Saturday, September 1, 2007

Y Combinator

There is so much information on the internet that I cannot possibly study everything worth knowing. Every time I read an article I take a risk: I am spending five minutes of my precious time in an attempt to get some sort of benefit. So when it comes time to pick and choose what I read, I generally avoid things that are poorly named. For example, "lambda". Any language where you actually have to type out the word lambda has a serious problem. The word lambda has no English meaning. The reason the lambda calculus uses the lambda symbol is that in math people like single-character identifiers. I'm sure that someone could come up with either a better name for clarity or a better symbol for convenience (Haskell uses a backslash).

Thus when I see something along the lines of "Y Combinator in language X!" I know that I can safely ignore it. Anything named Y Combinator can't be useful. Except that it is. Of course the article named "Y Combinator in language X!" never mentions what the Y Combinator is or why I would want it, but this tutorial does. I don't even know C# but I can see how this would be useful.

In object oriented languages the "this" or "self" keyword is used to refer to a method's object. Cahalang will have something similar for functions. It will be easy to implement because that address is already stored at [edx+4] when the function is called. Now I just need a name for it. The top two contenders are "this" and "$$". The argument for "this" is that it is familiar and works well. The argument against it is that people are going to want to use it as the name of the first parameter of functions to simulate object oriented methods. The argument for $$ is that it works well with a feature that will be in 2.0: implicitly named parameters for functions. So rather than writing:

(int x)->(int y){ y=x*x }

you could write:

{$1(int)*$1(int)}

or maybe:

{$1(int)*$1}

With $$ as the function name a factorial function would be:

{ if $1(int) {$1*$$($1-1)} else {1} }

Hmm... not really beautiful code. Maybe it would look better with the types specified:

(int n)->(int f){if n {n*$$(n-1)} else {1} }

or if I made the if/else syntax into a simple tuple:

(int n)->(int f){if n, n*$$(n-1), 1}

Still not great but this isn't a functional language so I'm ok with it for now I guess.

Actually I think that I know the problem here. The problem is that the syntax for implicit parameters is too long. Rather than saving space it is making things longer. What if the square function could be written like so: {$1*$1}? But then how do we know that it should only take an int without some fancy algorithm? Well, since we are probably using it soon, probably passing it to a function, we have some context. I think that this parallels how I am going to handle boxed tuples vs. class literals vs. denoted classes. Maybe functions with malformed types get their type from the first time that they are assigned to a variable or called with arguments. I believe that this is similar to the var keyword in C#, where variables get their type from their initializer. I think that this is workable. Good thing that my language is simple.

So here is the plan:

1.0: type inference only for the return values of functions that do not have a type specified.

2.0: type inference for implicit parameters, variable declarations, etc...

3.0: even more type inference.

I definitely do not intend to go all the way, because explicit types are good for providing context in certain situations. However I do see the benefit of type inference. I use it in C++.

Here is the signature of one of my functions in my game:

template <typename EvalFunc>
void MovePoint(Point& point, World* world, Direction direction, EvalFunc is_good)

EvalFunc is always passed a function that takes a Tile* and returns a bool. I just figured that templating it would be easier than figuring out the C++ function pointer syntax. In my language it would be:

(Tile@)->(bool)@

of course you would probably be passing it by value, so you would drop the final @.

Friday, August 31, 2007

Handling Parallel Declaration with Boxing/Unboxing

Note: my definition of "boxing" has nothing to do with the term used in languages with a primitive/object dichotomy. I use it to refer to treating a tuple as a non-tuple.

I knew that type systems were hard. I knew that name lookup can be hard. My language does not have many special cases. It has simple syntax and semantics. I thought that I had made choices that would make implementation easier. I had, for the most part. However complexity, like code, is a gas. Rather than consuming all memory it consumes all brainpower. It seems like every project that I attempt grows beyond my abilities. I am still doing well on my game though. I work on it maybe 10 hours a month but that time is generally very productive.

So here is my problem: I had it so that a declaration was visible to an expression if it was not after the expression, and a definition was usable by an expression if the definition was before the expression. So given the following, where A-D are arbitrary subexpressions:

A;
B,C;
D

Then D can use anything defined in A, B, and C. A cannot use anything defined in B, C, or D. B can use variables declared in A and can see variables defined in C. C can do the same with respect to B. So in expression B you can declare a pointer to a variable defined in C, but you cannot use its value. This is a little like using forward declarations in C++, except that you are not allowed to separate declaration and definition.

OK, so to implement this the compiler constructs a directed graph, and there are operations to determine whether a node is before, after, or neither before nor after another node. No problem. However there is now a problem. The boxing/unboxing of variables turns this from a less-than/greater-than problem into a pathfinding problem. I am writing a non-optimizing compiler for a simple (hmm... am I just deluded about this?) language. I do not think that name lookup should involve pathfinding; searching up a tree, sure. However I made my language and now I have to implement it. I wish that I had made a bed instead.

This gets me thinking about attribute flow. I have this great book called Programming Language Pragmatics. I am sure that the highfalutin language theorists and the guys who write optimizing compilers will both look down on it, but it is a great book for language-design newbies wanting to make a language. Anyway, this book has a chapter on attributes which is mostly useless to me because I do things by hand (that will be the subject of a future article).

Attributes carry statically known information. For example, say that you have the expression 4+5. Because 4 and 5 are int literals, they have "int" as the value of their type attribute. Then when "+" is looked up, the compiler finds a definition that takes two ints and returns an int. Thus the overall expression has "int" for its type attribute. You can even store bindings in a symbol table attribute rather than having an external symbol table. That seems like an extremely inefficient implementation, but maybe I will end up using it in some way. This is also great for handling things like whether something has been definitely constructed or destructed. If you have a rich (euphemism for "complex") type system, that stuff is hard to track with symbol tables. You already have the structure in your parse tree anyway, so why not use it?

Saturday, August 18, 2007

More on the hybrid type system

See my previous post about how I want both denotational and structural typing.

I had decided to take a year off from implementation to avoid burnout. That was on Father's Day, which was not too long ago. However I might be looking for a job soon, so I want to review my code so that I can present it to potential employers. So while I'm at it I might as well rip out the old class system and throw in the new typedef system. That of course got me thinking: if I can typedef class literals to make named classes, shouldn't I also be able to typedef other type expressions? For example, shouldn't I be able to typedef pointer types?

typedef A@ AP //AP is a named type with the structure A@

ok so now I can use it as an argument to or return value from a function:

(AP arg)->(AP ret) f { ret=arg }

However how do I get at the pointer value? It isn't named. I could add a "this" keyword, so ap.this would return an A@, but that seems like the first step down the path to special-case land.

Perhaps for now typedef will be limited to class types. Thoughts?

Saturday, August 11, 2007

I want structural and denotational typing

Originally Cahalang classes were rather simple records. You know, a bunch of named members wrapped up into a named type. Two types with the same members would have nothing in common unless descended from a common superclass. I also had the idea of having union and intersection types. For example you could define an ElfWizard type that was the intersection of Elf and Wizard. You could also do it without naming the resulting class, as in having an Elf&&Wizard variable.

Then something happened. You see, I was thinking about functions and had this idea that the arguments to a function defined a class and that the return values defined another class. That somehow the type of the function was more than just a tuple type. That maybe I could unify functions and constructors in an elegant way. It turned out that I couldn't make it work and keep to my design principles, so I abandoned the idea. However it led to something new.

First a bit of background. In Cahalang, the comma is the tuple operator. So 5,6 is a value of type int,int. Nothing too special; many languages have tuples. However they don't interact well with templates. Consider the C++ list template class. What if I want a list of int,int? What then? I can't pass it int,int because then it is two parameters. I needed a way to box up tuples into single entities. Thus the boxing syntax was born: [5,6] is a single value, not a tuple of values, and [int,int] is a single type, not a tuple of types.

Now here is where my constructor idea lead:

[int x, int y] a;
a.x=5;
a.y=6;

Boxing up variable declarations would create a class literal. Very cool, very structural.

What about denotational typing? I still want that! So now I am thinking that the following will hold true (the << operator is the subset/subtype operator):

[int,int] << [int x, int y]
[int,int] << [int a, int b]
[int x, int y] << Point //typedef Point [int x, int y]
[int a, int b] << Complex //typedef Complex [int a, int b]
[int x, int y] << Vector //typedef Vector [int x, int y]

So now what about the intersection and union operators? I'm thinking that they will only work on named types. That way if you make an intersection type of two types that each have an x, you will be able to refer to them as A.x and B.x or whatever. Not that you should do that, of course. However banning intersections of types that share member names just sounds cruel and is likely to lead to people naming their members A_x and B_x or something stupid. Note: intersection types are like using virtual inheritance in C++. Multiple inheritance got a bad name mainly due to the default being non-virtual inheritance, oh and the fact that inheritance is overused in general.

Another development is that the unboxing operator would alleviate the hardship of not having a this pointer:

a'; //all of a's members are now locals; you don't need to write a.x or whatever.

So what is this place about?

Hi, my name is Tom. I am working on two programs right now. The first is a compiler for my language, Cahalang (I could not think of a better name, and hey, it seems to work for Erlang). Cahalang is intended for people who like C++. The second is a turn-based strategy game combining elements of Romance of the Three Kingdoms II and Warsong and some twists of my own.