Archive for the 'CS' Category

Canned Responses

Topic: rant, turing machine, Hacks, CS, article| 1 Comment »

This is where I bitch about serialization in C++. If you do not understand what that previous statement was about, you might want to skip all of this entry. Really, skip this entry, it does not help you with much. Yes, I am home for the winter, I am leaving in a couple of days. It was good to be home, this is my last winter break for a while. But really, this entry is about serialization, maybe somewhat about reflection, but mostly about serialization. Leave it alone, it doesn't link to that much even.

Before I begin bitching, there are a couple of things that need to be put out. Yes, I am aware of the existence of such things as s11n and the Boost Serialization library. I have even spent a few days looking at such things as Thrift, and the promise of writing something like an IDL, passing it through a compiler, and producing C++ code which then serializes itself. Really this is a great idea, and if I was worried about writing serialization code for myself I'd have picked one of those. Overall, I would probably have gone with s11n for things I was adapting to be serializable, and Thrift for everything else. But then again, I could also use Python, Java, C# or any of the dozen languages which provide nice serialization interfaces.

The thing is I am not writing code for myself, well I am, but not really. I am trying to see if this class I am TAing can be made doable in C++. I don't like all of those solutions for various reasons, listed forthwith.

  1. s11n, boost: I think any serialization depending on me completely specifying what needs to be serialized in code has issues. In short, there are times when data fields are added to classes, and people forget to change code. This is somewhat painful to debug (think of forgetting to initialize stuff in a constructor and such), and adding another place where you need to worry about this seems like a poor decision.
  2. Thrift, ...: Learning another language merely for the privilege of serialization seems a little counterproductive. Besides, why pick Thrift's IDL over any of the other serializable languages? After all, managed code is supposed to become as fast as native code, give or take a little.
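
The first complaint can be made concrete with a small sketch. Everything here is hypothetical (the `Student` class, its fields, the space-separated format); it is not s11n or Boost code, just a hand-written serializer of the kind those libraries still require you to keep in sync with the class:

```cpp
#include <sstream>
#include <string>

// Hypothetical class; the names are illustrative only.
struct Student {
    std::string name;
    int year = 0;
    double grade = 0.0;  // field added later
};

// Hand-written serializer: every field has to be listed here by hand.
// `grade` was added to the struct but never added below, and the
// compiler has no way to notice.
std::string serialize(const Student& s) {
    std::ostringstream out;
    out << s.name << ' ' << s.year;
    return out.str();
}

Student deserialize(const std::string& text) {
    std::istringstream in(text);
    Student s;
    in >> s.name >> s.year;
    return s;
}

// Round-trips a Student and reports whether `grade` came back intact.
bool grade_survives_round_trip() {
    Student before;
    before.name = "panda";
    before.year = 3;
    before.grade = 3.7;
    Student after = deserialize(serialize(before));
    return after.grade == before.grade;
}
```

Nothing fails at compile time or at run time; `grade` just quietly comes back as its default value, which is exactly the kind of bug that is annoying to track down.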

No sir, I do not like either of those choices, and for once in my life, I find myself wanting something lots of languages have, in C++, because pig headedly this is the Java alternate some of us backed.

s11n's author aptly states that serialization is easy, it is deserialization that really sucks. Assuming we are not going down blind alleys (as I once was), packing data in a known format, with known boundaries, is a fairly reasonable strategy for assuring oneself that the data seen on the receiving end did indeed originate from a compatible library. You ask, what of data which has boundaries and properties identical to what you're expecting, but is actually corrupt? Well, no matter how much you worry about solving problems involving arbitrary data, one must always remember that provably showing equivalence between two generators of data is unsolvable in general.
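
A minimal sketch of what "known format, known boundaries" could look like, assuming a made-up frame layout: a 4-byte tag, a 4-byte big-endian length, then the payload. The tag `PNDA` and the function names `pack` and `unpack` are invented for illustration and do not come from any library:

```cpp
#include <cstdint>
#include <string>

// Hypothetical 4-byte tag identifying the format.
const std::string kMagic = "PNDA";

// Frame layout: tag (4 bytes) | payload length (4 bytes, big-endian) | payload.
std::string pack(const std::string& payload) {
    std::string frame = kMagic;
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<char>((n >> shift) & 0xFF));
    frame += payload;
    return frame;
}

// Fills `payload` and returns true only if the tag matches and the
// declared length agrees with the number of bytes actually present.
bool unpack(const std::string& frame, std::string& payload) {
    if (frame.size() < 8 || frame.compare(0, 4, kMagic) != 0)
        return false;
    std::uint32_t n = 0;
    for (int i = 4; i < 8; ++i)
        n = (n << 8) | static_cast<unsigned char>(frame[i]);
    if (frame.size() - 8 != n)
        return false;
    payload = frame.substr(8);
    return true;
}
```

This rejects frames with the wrong tag or the wrong length, but a frame with a valid tag, a valid length, and garbage inside still passes, which is the corrupt-but-well-formed case described above.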

Now the only real reason deserialization is so hard, is because in some cases you want people to be able to encapsulate arbitrary data in objects belonging to your class. Arbitrary data is somewhat hard to dynamically recreate. And by somewhat I mean a lot.

Let us assume then that we neither want to learn an IDL, nor do we want to face the problems I feel exist with using libraries like s11n. However, we also recognize the fact that s11n or boost is probably much better done than anything we can produce, and perfect in a reasonable amount of time (a few hours in particular). Now if we had reflection, or what some fancy sites seem to call meta-object programming, we could go through the process of producing code which automatically serializes all data fields in an object. Well C++ doesn't really have reflection, and C++'s version of RTTI is not very useful for these purposes.

But we are stubborn, pig headedness always comes with stubbornness. Well, if we had multi-pass compilers, we could use C++'s macro system (which is arguably not very powerful) to build ourselves a reflection framework, and then use said reflection framework in the way mentioned above. After all, we know that C/C++ macros are mere text replacement tools, and hence it makes perfect sense to write #defines which produce other #defines. Of course the C++ compiler is single pass as far as macros go: a #define produced by expanding a macro is never processed as a directive. This is probably for many many reasons, including the avoidance of infinite loops. Ah the avoidance of infinite loops, always a worthy cause.
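
One workaround that stays inside the single-pass rules is the so-called X-macro idiom: a macro cannot emit a new #define, but it can take the name of another macro as an argument. This is a different technique from the #define-producing-#define idea above, and the sketch below is only an illustration under assumed names (`STUDENT_FIELDS`, `Student`, and the space-separated format are all hypothetical):

```cpp
#include <sstream>
#include <string>

// Single list of fields. Each entry is X(type, field); what X does is
// decided at the point of use.
#define STUDENT_FIELDS(X) \
    X(std::string, name)  \
    X(int, year)          \
    X(double, grade)

struct Student {
    // Expand the list into member declarations.
#define DECLARE_FIELD(type, field) type field{};
    STUDENT_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Expand the same list into "write every field, space separated".
std::string serialize(const Student& s) {
    std::ostringstream out;
#define WRITE_FIELD(type, field) out << s.field << ' ';
    STUDENT_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return out.str();
}

// Expand the same list into "read every field, in the same order".
Student deserialize(const std::string& text) {
    std::istringstream in(text);
    Student s;
#define READ_FIELD(type, field) in >> s.field;
    STUDENT_FIELDS(READ_FIELD)
#undef READ_FIELD
    return s;
}
```

Adding a field now means touching only `STUDENT_FIELDS`; the declaration, the writer, and the reader all follow. It is a long way from real reflection (no nested objects, no strings with spaces, no pointers), and it forces every class to be declared through a macro, so it may or may not be acceptable for a course.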

Well I am lost. I have ideas, none of them seem to work. Writing serialization code by hand seems to be very very painful, and is one of those things I know is going to come back to haunt me later this semester. We need better alternates to Java. And other things...

Panda

C++ Hating

Topic: Hacks, CS| No Comments »

I believed I had this beautiful set of numbers. Then reality struck and they weren't actually there. On one hand, having fake but nice numbers is bad, on the other hand, those numbers were really pretty. And now I sit around reading and thinking about Haskell and hating on C++. Who would have known that j would differ in

 
  std::vector<int> something[numelts];
  something[7].push_back(2);
  int j = something[7][0];
 

and

 
  std::vector<int> something[numelts];
  something[7].push_back(2);
  int j = (something[7])[0];
 

.
The latter is correct, the first is wrong. C++ seems to be treating the unparenthesized version as normal array lookups, while the parenthesized version leads to it using operator[] from vector.
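
For comparison, the language grammar treats postfix `[]` as left-associative, so `something[7][0]` already parses as `(something[7])[0]`, and a conforming compiler should give the same `j` for both. A minimal check, assuming `numelts` is a compile-time constant larger than 7 (the snippets above do not show how it was defined, so the value 16 is made up):

```cpp
#include <vector>

// Assumption: numelts is a compile-time constant larger than 7.
const int numelts = 16;

// Returns true when both spellings read back the pushed value.
bool both_forms_agree() {
    std::vector<int> something[numelts];
    something[7].push_back(2);
    int plain = something[7][0];
    int parenthesized = (something[7])[0];
    return plain == 2 && parenthesized == 2;
}
```

If the two really did differ in practice, the cause was probably somewhere else. One candidate is `numelts` being 7 or less, in which case both lines index past the end of the array and anything can happen, including results that change when the source text changes.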

panda

Mercurial, filesystems

Topic: systems, CS| No Comments »

This semester, as a part of TAing 167/9, we are making use of Mercurial for SCM. Itay liked Hg so much that he actually started using it for nearly everything on his computer, and came up with a good reason for doing this. I am not organized enough to put all of my files into source control (though heaven knows I probably should), but I actually like the idea of using Mercurial to keep my two copies of research code synchronized.


Parser Combinators, Scala, Haskell

Topic: monads, parser combinators, scala, haskell, CS| No Comments »

Earlier today, thanks to Itay, I was introduced to the idea of parser combinators in conjunction with a discussion we were having. From what I understood (hence not misrepresenting anyone), he knew of parser combinators from Scala, where they are a part of the standard library. Of course my customary Google search of new terms led me to documentation for the Haskell Parser Generator, which is mildly different from the Scala equivalent, understandably so seeing how the principles behind the two languages differ.

I have been trying to get a better grasp of both languages for a while, for widely divergent reasons. For Haskell: strong links to math, the cool factor, reading a decent paper on monads on the recommendation of a couple of people, and not knowing how to use it. For Scala: lots of chatter about it coming my way from Itay, the entire actor model, and a bunch of interesting documentation. I have been failing miserably. While I know what both involve to a first approximation, not having written any large chunk of code in either means not being used to the idiosyncrasies of either, and not being able to parse the language well in my head (it took me a 40 minute train ride home to figure out to a first approximation what some of the code in here means).

So I am going to try learning parser combinators in one of the languages (possibly Scala, since it would gel with something I am doing with Itay), then eventually convert what I learn in one into the other, and hope that works. Either way, from what little I have read of parser combinators, they seem like an interesting model for parsers. At least the Scala implementation seems to be fairly nifty in terms of being literally readable as a grammar.

Panda

STM, Topology

Topic: topology, stm, CS| No Comments »

This is one of two or three short posts I will be making today, so it will probably never be seen, since it won't be the visible text for very long. A casual glance through programming.reddit.com will show a recent surge in interest in Software Transactional Memory. Having missed most of the concurrency train (I still have to read the Erlang book), I am pretty amused by what all this concurrency means. I think the processes themselves are fairly interesting, but I claim to be entirely inept at it: I understand what STM means and what it entails, but I haven't used it. However, what does interest me about STM is a couple of papers written by Herlihy (there are others, I just happened to run across these while going through what Brown professors did) relating concurrency to topology. The papers are freely available on the website, and while I am still working on actually reading them, I am hoping posting here will shame me into actually completing this eventually. On the positive side, trying to parse the papers has meant having to read and understand a book on point set topology, one of the things I did in the hazy days of my sophomore year and have forgotten fairly successfully. Of course, seeing as parsing math texts itself takes a while, plus paper and pen, this is slow going.

Panda