Friday, August 31, 2007

Handling Parallel Declaration with Boxing/Unboxing

Note: my definition of "boxing" has nothing to do with the term used in languages with a primative/object dichotomy. I use it to refer to treating a tuple as a non-tuple.

I knew that type systems were hard. I knew that name lookup can be hard. My language does not have many special cases. It has simple syntax and semantics. I thought that I had made choices that would make implementation easier. I had, for the most part. However complexity, like code, is a gas. Rather than consuming all memory it consumes all brainpower. It seems like every project that I attempt grows beyond my abilities. I am still doing well on my game though. I work on it maybe 10 hours a month but that time is generally very productive.

So here is my problem: I had it so that a declaration was visible to an expression if it was not after the expression. I had it so that a definition is usable by an expression if the definition was before the expression. So given the following expression, where A-D are arbitrary subexpressions:

A;
B,C;
D

Then D can use anything defined in A, B, and C. A cannot use anything defined in B, C, or D. B can use variables declared in A and can see variables defined in C. C can do the same with respect to B. So in expression B you can declare a pointer to a variable defined in C, but you cannot use its value. This is a little like using forward declarations in C++, except that you are not allowed to separated declaration/definition.

Ok so to implement this the compiler constructs a directed graph and there are operations to determine if a node is before or after or neither before or after another node. No problem. However there is now a problem. The boxing/unboxing of variables turns this from a less-than greater-than problem into a pathfinding problem. I am writing a non-optimizing compiler for a simple (hmm.... am I just deluded about this?) language. I do not think that name lookup should involve pathfinding, searching up a tree, sure. However I made my language and now I have to implement it. I wish that I had made a bed instead.

This gets my thinking about attribute flow. I have this great book, called Programming Language Pragmatics. I am sure that the high-faluting language theorists and the guys who write optimizing compilers will both look down on it, but it is a great book language design newbies wanting to make a language. So anyway, this book has a chapter on attributes which is mostly useless to me because I do things by hand (that will be the subject of future article).

Attributes carry statically known information. For example say that you have the expression 4+5. Because 4 and 5 are int literals they have "int" as the value of their type attribute. Then when "+" is looked up the compiler finds a definition that takes two ints and returns an int. Thus the overall expression has "int" for its type attribute. You can even store bindings in a symbol table attribute rather than having an external symbol table. That seems like an extremely inefficient implementation but maybe I will end up using that in some way. This is also great for handling things like when something has been definitely constructed or destructed. If you have a rich (euthamism for "complex") type system that stuff is hard to track with symbol tables. You already have the structure in your parse tree anyway, so what not use it?

No comments: