Archive for the ‘turing machine’ Category

Canned Responses

Wednesday, January 16th, 2008

This is where I bitch about serialization in C++. If you do not understand what that previous statement was about, you might want to skip this entry entirely. Really, skip it; it will not help you with much. Yes, I am home for the winter, and I am leaving in a couple of days. It was good to be home; this is my last winter break for a while. But really, this entry is about serialization, maybe somewhat about reflection, but mostly about serialization. Leave it alone; it doesn't even link to much.

Before I begin bitching, a couple of things need to be put out there. Yes, I am aware of the existence of such things as s11n and the Boost Serialization library. I have even spent a few days looking at things like Thrift, and the promise of writing something in an IDL and passing it through a compiler to produce C++ code which then serializes itself. Really, this is a great idea, and if I were worried about writing serialization code for myself I'd have picked one of those. Overall, I would probably have gone with s11n for things I was adapting to be serializable, and Thrift for everything else. But then again, I could also use Python, Java, C# or any of the dozen languages which provide nice serialization interfaces.

The thing is, I am not writing code for myself; well, I am, but not really. I am trying to see if this class I am TAing can be made doable in C++. I don't like any of those solutions, for various reasons, listed forthwith.

  1. s11n, Boost: I think any serialization that depends on me completely specifying in code what needs to be serialized has issues. In short, there are times when data fields are added to classes and people forget to change the serialization code. This is somewhat painful to debug (think of forgetting to initialize stuff in a constructor and such), and adding another place where you need to worry about this seems like a poor decision.
  2. Thrift, ...: Learning another language merely for the privilege of serialization seems a little counterproductive. Besides, why pick Thrift's IDL over any of the other serializable languages? After all, managed code is supposed to become as fast as native code, give or take a little.
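To make the complaint in item 1 concrete, here is a minimal sketch of hand-written serialization. The `Student` struct and the `save`, `load` and `round_trip` functions are hypothetical names made up for illustration; none of this is s11n or Boost code. The `grade` field was added after the serialization code was written and nobody updated `save` or `load`, so the round trip silently drops it.

```cpp
#include <sstream>
#include <string>

// Hypothetical class for illustration; not taken from s11n or Boost.
struct Student {
    std::string name;
    int id = 0;
    int grade = 0;  // added later; save() and load() were never updated
};

// Hand-written serialization: every field must be listed here by hand.
void save(std::ostream& out, const Student& s) {
    out << s.name << '\n' << s.id << '\n';  // grade is missing
}

// Hand-written deserialization, mirroring save().
Student load(std::istream& in) {
    Student s;
    std::getline(in, s.name);
    in >> s.id;  // grade keeps its default value of 0
    return s;
}

// Round-trips a Student through an in-memory stream.
Student round_trip(const Student& s) {
    std::stringstream buf;
    save(buf, s);
    return load(buf);
}
```

Nothing here fails to compile and nothing crashes, which is exactly the problem: a student saved with a grade of 97 comes back with a grade of 0.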

No sir, I do not like either of those choices, and for once in my life I find myself wanting something in C++ that lots of other languages have because, pigheadedly, this is the Java alternative some of us backed.

s11n's author aptly states that serialization is easy; it is deserialization that really sucks. Assuming we are not going down blind alleys (as I once was), packing data in a known format, with known boundaries, is a fairly reasonable strategy for assuring oneself that the data seen on the receiving end did indeed originate from a compatible library. You ask: what of data which has boundaries and properties identical to what you're expecting, but is actually corrupt? Well, no matter how much you worry about solving problems involving arbitrary data, one must always remember that deciding whether two arbitrary generators of data are equivalent is, in general, undecidable.
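As a rough sketch of what "a known format, with known boundaries" could look like, the code below wraps a payload in a 4-byte tag followed by a 4-byte big-endian length. The tag value `PNDA`, the frame layout, and the `pack`/`unpack` names are all assumptions invented for this example, not a format used by s11n or any other library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 4-byte tag identifying frames written by this sketch.
const char kMagic[4] = {'P', 'N', 'D', 'A'};

// Wraps a payload as: magic tag, 4-byte big-endian length, payload bytes.
std::string pack(const std::string& payload) {
    std::string frame(kMagic, 4);
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        frame.push_back(static_cast<char>((n >> shift) & 0xFF));
    }
    frame += payload;
    return frame;
}

// Checks the tag and the length boundary, then extracts the payload.
// Returns false when the frame does not have the expected shape.
bool unpack(const std::string& frame, std::string& payload) {
    if (frame.size() < 8 || frame.compare(0, 4, kMagic, 4) != 0) {
        return false;
    }
    std::uint32_t n = 0;
    for (std::size_t i = 4; i < 8; ++i) {
        n = (n << 8) | static_cast<unsigned char>(frame[i]);
    }
    if (frame.size() - 8 != n) {
        return false;
    }
    payload = frame.substr(8);
    return true;
}
```

A frame with the wrong tag or the wrong length is rejected, but a frame whose payload bytes were flipped in transit still passes both checks. That is the corrupt-but-well-formed case from the paragraph above, and the boundaries alone cannot catch it.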

Now the only real reason deserialization is so hard is that in some cases you want people to be able to encapsulate arbitrary data in objects belonging to your class. Arbitrary data is somewhat hard to dynamically recreate. And by somewhat I mean a lot.
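The asymmetry can be sketched with a small class hierarchy. Everything below (`Shape`, `Circle`, `Rect`, the text format, the `registry` table) is a hypothetical illustration, not how any particular library does it. Writing an object out only needs a virtual call, while reading it back needs some table from type tags to constructors, and that table has to be kept up to date by hand.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical base class; derived classes carry the "arbitrary data".
struct Shape {
    virtual ~Shape() {}
    virtual std::string tag() const = 0;
    virtual void write(std::ostream& out) const = 0;
    virtual void read(std::istream& in) = 0;
};

struct Circle : Shape {
    int radius = 0;
    std::string tag() const override { return "circle"; }
    void write(std::ostream& out) const override { out << radius; }
    void read(std::istream& in) override { in >> radius; }
};

struct Rect : Shape {
    int w = 0;
    int h = 0;
    std::string tag() const override { return "rect"; }
    void write(std::ostream& out) const override { out << w << ' ' << h; }
    void read(std::istream& in) override { in >> w >> h; }
};

// Serializing is the easy half: a virtual call finds the right code.
std::string serialize(const Shape& s) {
    std::ostringstream out;
    out << s.tag() << ' ';
    s.write(out);
    return out.str();
}

typedef std::unique_ptr<Shape> (*Factory)();

template <typename T>
std::unique_ptr<Shape> make() {
    return std::unique_ptr<Shape>(new T());
}

// Deserializing is the hard half: the reader only has text, so it needs
// a table from tags to constructors, maintained by hand.
std::unique_ptr<Shape> deserialize(const std::string& text) {
    static const std::map<std::string, Factory> registry = {
        {"circle", &make<Circle>},
        {"rect", &make<Rect>},
    };
    std::istringstream in(text);
    std::string tag;
    in >> tag;
    std::map<std::string, Factory>::const_iterator it = registry.find(tag);
    if (it == registry.end()) {
        return std::unique_ptr<Shape>();  // unknown type: cannot recreate it
    }
    std::unique_ptr<Shape> shape = it->second();
    shape->read(in);
    return shape;
}
```

Any class someone derives from `Shape` later can be serialized immediately, but `deserialize` returns a null pointer for it until somebody remembers to add it to the table.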

Let us assume, then, that we neither want to learn an IDL, nor do we want to face the problems I feel exist with using libraries like s11n. However, we also recognize that s11n or Boost is probably much better done than anything we can produce and perfect in a reasonable amount of time (a few hours, in particular). Now, if we had reflection, or what some fancy sites seem to call meta-object programming, we could go through the process of producing code which automatically serializes all data fields in an object. Well, C++ doesn't really have reflection, and C++'s version of RTTI is not very useful for these purposes: it can tell you what type an object is, but it cannot enumerate that type's data fields.

But we are stubborn; pigheadedness always comes with stubbornness. Well, if we had a multi-pass preprocessor, we could use C++'s macro system (which is arguably not very powerful) to build ourselves a reflection framework, and then use said reflection framework in the way mentioned above. After all, we know that C/C++ macros are mere text replacement tools, and hence it would make perfect sense to write #defines which produce other #defines. Of course, the C++ preprocessor is single pass: the text a macro expands to is never itself processed as a directive, so a #define cannot produce another #define. This is probably so for many, many reasons, including the avoidance of infinite loops. Ah, the avoidance of infinite loops, always a worthy cause.
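For what it is worth, one well-known trick that stays within a single preprocessing pass is the so-called X-macro idiom: the field list is itself a macro that takes another macro as its argument, so no #define ever has to emit a #define. The sketch below is only an illustration under that assumption; `STUDENT_FIELDS`, `DECLARE_FIELD`, `WRITE_FIELD` and the `name=value` output format are made-up names and choices. It is not the reflection framework wished for above, since fields must be declared through the list rather than as ordinary members.

```cpp
#include <sstream>
#include <string>

// Hypothetical field list: each entry is X(type, field). This is the one
// place where a field has to be mentioned.
#define STUDENT_FIELDS(X) \
    X(std::string, name)  \
    X(int, id)            \
    X(int, grade)

struct Student {
    // Expands to: std::string name; int id; int grade;
#define DECLARE_FIELD(T, field) T field;
    STUDENT_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Writes every listed field as a "field=value" line.
std::string serialize(const Student& s) {
    std::ostringstream out;
#define WRITE_FIELD(T, field) out << #field << '=' << s.field << '\n';
    STUDENT_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return out.str();
}
```

Adding a field to `STUDENT_FIELDS` updates both the struct and `serialize` at once, so the forgotten-field problem goes away for the writing side. The cost is that the class definition no longer looks like normal C++, which may be a hard sell for a class being taught to students.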

Well, I am lost. I have ideas; none of them seem to work. Writing serialization code by hand seems very, very painful, and is one of those things I know is going to come back to haunt me later this semester. We need better alternatives to Java. And other things...

Panda

Turing Machines and 7-Tuples

Thursday, August 2nd, 2007

Edit: Now updated with link to picture

This has been bothering me since yesterday, and were it not for people in the CS department e-mailing me stuff, it would still bother me. Nothing very deep about this, but it's an interesting difference in notation.

Yesterday afternoon, EAS, my cubemate, was looking at a Facebook photograph of someone who had the 7-tuple formal representation of a Turing machine tattooed on their body. That's OK; people tend to have interesting tattoos. What threw me off about this picture, however, was that the tuple looked different from what my memory was serving up. This is bad, since one does hope that nearly a semester's worth of sitting in on classes, and a semester's worth of TAing a course, would at least leave me with a not-so-wrong memory of what these tuples are, even though admittedly they don't show up all that often on assignments and such.
