Note: my definition of "boxing" has nothing to do with the term used in languages with a primative/object dichotomy. I use it to refer to treating a tuple as a non-tuple.
I knew that type systems were hard. I knew that name lookup can be hard. My language does not have many special cases. It has simple syntax and semantics. I thought that I had made choices that would make implementation easier. I had, for the most part. However complexity, like code, is a gas. Rather than consuming all memory it consumes all brainpower. It seems like every project that I attempt grows beyond my abilities. I am still doing well on my game though. I work on it maybe 10 hours a month but that time is generally very productive.
So here is my problem: I had it so that a declaration was visible to an expression if it was not after the expression. I had it so that a definition is usable by an expression if the definition was before the expression. So given the following expression, where A-D are arbitrary subexpressions:
A;
B,C;
D
Then D can use anything defined in A, B, and C. A cannot use anything defined in B, C, or D. B can use variables declared in A and can see variables defined in C. C can do the same with respect to B. So in expression B you can declare a pointer to a variable defined in C, but you cannot use its value. This is a little like using forward declarations in C++, except that you are not allowed to separated declaration/definition.
Ok so to implement this the compiler constructs a directed graph and there are operations to determine if a node is before or after or neither before or after another node. No problem. However there is now a problem. The boxing/unboxing of variables turns this from a less-than greater-than problem into a pathfinding problem. I am writing a non-optimizing compiler for a simple (hmm.... am I just deluded about this?) language. I do not think that name lookup should involve pathfinding, searching up a tree, sure. However I made my language and now I have to implement it. I wish that I had made a bed instead.
This gets my thinking about attribute flow. I have this great book, called Programming Language Pragmatics. I am sure that the high-faluting language theorists and the guys who write optimizing compilers will both look down on it, but it is a great book language design newbies wanting to make a language. So anyway, this book has a chapter on attributes which is mostly useless to me because I do things by hand (that will be the subject of future article).
Attributes carry statically known information. For example say that you have the expression 4+5. Because 4 and 5 are int literals they have "int" as the value of their type attribute. Then when "+" is looked up the compiler finds a definition that takes two ints and returns an int. Thus the overall expression has "int" for its type attribute. You can even store bindings in a symbol table attribute rather than having an external symbol table. That seems like an extremely inefficient implementation but maybe I will end up using that in some way. This is also great for handling things like when something has been definitely constructed or destructed. If you have a rich (euthamism for "complex") type system that stuff is hard to track with symbol tables. You already have the structure in your parse tree anyway, so what not use it?
Friday, August 31, 2007
Saturday, August 18, 2007
More on the hybrid type system
See my previous post about how I want both denotional and structural typing.
I had decided to take a year off from implementation to avoid burnout. That was on Father's day which was not too long ago. However I might be looking for a job soon so I want to review my code so that I can present it to potential employees. So while I'm at it I might as well rip out the old class system and throw in the new typedef system. That of course got me thinking, if I can typedef class literals to make named classes, shouldn't I also be able to typedef other type expressions? For example shouldn't I be able to typedef pointer types?
typedef A@ AP //AP is a named type with the structure A@
ok so now I can use it as an argument to or return value from a function:
(AP arg)->(AP ret) f { ret=arg }
However how to I get at the pointer value? It isn't named. I could add a "this" keyword, so ap.this would return an A@ but that seems to be the first step down the path to special case land.
Perhaps for now typedef will be limited to class types. Thoughts?
I had decided to take a year off from implementation to avoid burnout. That was on Father's day which was not too long ago. However I might be looking for a job soon so I want to review my code so that I can present it to potential employees. So while I'm at it I might as well rip out the old class system and throw in the new typedef system. That of course got me thinking, if I can typedef class literals to make named classes, shouldn't I also be able to typedef other type expressions? For example shouldn't I be able to typedef pointer types?
typedef A@ AP //AP is a named type with the structure A@
ok so now I can use it as an argument to or return value from a function:
(AP arg)->(AP ret) f { ret=arg }
However how to I get at the pointer value? It isn't named. I could add a "this" keyword, so ap.this would return an A@ but that seems to be the first step down the path to special case land.
Perhaps for now typedef will be limited to class types. Thoughts?
Saturday, August 11, 2007
I want structural and denotional typing
Originally Cahalang classes were rather simple records. You know, a bunch of named members wrapped up into a named type. Two types with the same members would have nothing in common unless descended from a common superclass. I also had the idea of having union and intersection types. For example you could define an ElfWizard type that was the intersection of Elf and Wizard. You could also do it without naming the resulting class, as in having an Elf&&Wizard variable.
Then something happened. You see, I was thinking about functions and had this idea that the arguments to a function defined a class and that the return values defined another class. That somehow the type of the function was more than just a tuple type. That maybe I could unify functions and constructors in an elegant way. Turned out that I couldn't make it work and keep with my design principles so I abandoned the idea. However it lead to something new.
First a bit of background. In Cahalan , is the tuple operator. So 5,6 is a value of type int,int. Nothing too special, many languages have tuples. However they don't interact well with templates. Consider the C++ list template class. What if I want a list of int,int? What then? I can't pass it int,int because then it is two parameters. I needed a way to box up tuples into single entities. Thus the boxing syntax was born. Thus [5,6] is a single value, not a tuple of values, and [int,int] is a single type, not a tuple of types.
Now here is where my constructor idea lead:
[int x, int y] a;
a.x=5;
a.y=6;
Boxing up variable declarations would create a class literal. Very cool, very structural.
What about denotional typing? I still want that! So now I am thinking that the following will hold true: (the << operator is the subset/subtype operator)
[int,int] << [int x, int y]
[int,int] << [int a, int b]
[int x, int y] << Point //typedef Point [int x, int y]
[int a, int b] << Complex //typedef Vector [int a, int b]
[int x, int y] << Vector //typedef Vector [int x, int y]
So now what about the intersection and union operators? I'm thinking that they will only work on named types. That way if you do an intersection type of two types that each have an x, you will be able to refer to them as A.x and B.x or whatever. Not that you should do that of course. However banning intersections of types that share member names just sounds cruel and is likely to lead to people naming their members A_x and B_x or something stupid. Note: intersection types are like using virtual inheritance in C++. Multiple inheritance got a bad name mainly to the default being non-virtual inheritance, oh and the fact that inheritance is overused in general.
Another development is that the unboxing operator would alleviate the hardship of not having a this pointer:
a'; //all of a's members are now locals; you don't need to write a.x or whatever.
Then something happened. You see, I was thinking about functions and had this idea that the arguments to a function defined a class and that the return values defined another class. That somehow the type of the function was more than just a tuple type. That maybe I could unify functions and constructors in an elegant way. Turned out that I couldn't make it work and keep with my design principles so I abandoned the idea. However it lead to something new.
First a bit of background. In Cahalan , is the tuple operator. So 5,6 is a value of type int,int. Nothing too special, many languages have tuples. However they don't interact well with templates. Consider the C++ list template class. What if I want a list of int,int? What then? I can't pass it int,int because then it is two parameters. I needed a way to box up tuples into single entities. Thus the boxing syntax was born. Thus [5,6] is a single value, not a tuple of values, and [int,int] is a single type, not a tuple of types.
Now here is where my constructor idea lead:
[int x, int y] a;
a.x=5;
a.y=6;
Boxing up variable declarations would create a class literal. Very cool, very structural.
What about denotional typing? I still want that! So now I am thinking that the following will hold true: (the << operator is the subset/subtype operator)
[int,int] << [int x, int y]
[int,int] << [int a, int b]
[int x, int y] << Point //typedef Point [int x, int y]
[int a, int b] << Complex //typedef Vector [int a, int b]
[int x, int y] << Vector //typedef Vector [int x, int y]
So now what about the intersection and union operators? I'm thinking that they will only work on named types. That way if you do an intersection type of two types that each have an x, you will be able to refer to them as A.x and B.x or whatever. Not that you should do that of course. However banning intersections of types that share member names just sounds cruel and is likely to lead to people naming their members A_x and B_x or something stupid. Note: intersection types are like using virtual inheritance in C++. Multiple inheritance got a bad name mainly to the default being non-virtual inheritance, oh and the fact that inheritance is overused in general.
Another development is that the unboxing operator would alleviate the hardship of not having a this pointer:
a'; //all of a's members are now locals; you don't need to write a.x or whatever.
So what is this place about?
Hi, my name is Tom. I am working on two programs right now. The first is a compiler for my language, Cahalang (I could not thing of a better name, and hey it seems to work for Erlang). Cahalan is intended for people who like C++. The second is a turn-based strategy game combining elements of Romance of the Three Kingdoms II and Warsong and some twists of my own.
Subscribe to:
Posts (Atom)