29 Jun, 2009, Silenus wrote in the 1st comment:
Votes: 0
Hi,

I am working on an LPC compiler at present for a mud server, and I am curious whether anyone has suggestions for how best to resolve and determine the correct operator to apply for a set of binary operators that are overloaded based on the types of their arguments. Two simple methods would be to generate a two-level switch statement or a function table. However, both methods suffer from an obvious problem: in general, type1 op type2 and type2 op type1 often select the same operator, with one argument coerced in some manner, so I would redundantly be reserving space or encoding both possibilities, adding to code repetition. So my question is: are there better ways than these rather straightforward methods?

An example of this is the '+' operator in LPC. You have the following variants:

int + int
int + float -> coerce int to float then use float + float

float + int -> coerce int to float then use float + float
float + float

string + int -> coerce int to string then use string + string
string + float -> coerce float to string then use string + string
string + string

array + array
mapping + mapping

Is there a good way to encode this coercion behavior?
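For concreteness, the sort of encoding I have been toying with is a single table keyed on the unordered pair of type tags, so that e.g. int + float and float + int share one entry (the type tags and opcode names below are just placeholders, not my actual ones):

#include <cstddef>

enum TypeTag { T_INT, T_FLOAT, T_STRING, T_ARRAY, T_MAPPING };
enum OpCode  { OP_ADD_INT, OP_ADD_FLT, OP_CAT_STR, OP_ADD_ARR, OP_ADD_MAP };

// One rule per unordered pair of operand tags: any operand not already of
// type 'common' gets coerced to it, then 'op' is selected.
struct PlusRule {
    TypeTag a, b;
    TypeTag common;
    OpCode  op;
};

static const PlusRule plusRules[] = {
    { T_INT,     T_INT,     T_INT,     OP_ADD_INT },
    { T_INT,     T_FLOAT,   T_FLOAT,   OP_ADD_FLT },
    { T_FLOAT,   T_FLOAT,   T_FLOAT,   OP_ADD_FLT },
    { T_INT,     T_STRING,  T_STRING,  OP_CAT_STR },
    { T_FLOAT,   T_STRING,  T_STRING,  OP_CAT_STR },
    { T_STRING,  T_STRING,  T_STRING,  OP_CAT_STR },
    { T_ARRAY,   T_ARRAY,   T_ARRAY,   OP_ADD_ARR },
    { T_MAPPING, T_MAPPING, T_MAPPING, OP_ADD_MAP },
};

// Look the rule up regardless of operand order; returns 0 on a type error.
const PlusRule* lookupPlus(TypeTag left, TypeTag right)
{
    for (std::size_t i = 0; i < sizeof(plusRules) / sizeof(plusRules[0]); ++i) {
        const PlusRule& r = plusRules[i];
        if ((r.a == left && r.b == right) || (r.a == right && r.b == left))
            return &r;
    }
    return 0;
}

Since the operands keep their positions at dispatch time, only the coercion target is shared, so string + int and int + string still concatenate in the order the expression gives.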

Thanks in advance.
29 Jun, 2009, Tyche wrote in the 2nd comment:
Votes: 0
Not knowing the target… is it a VM, x86, stack-based, register-based, some sort of interpreter?
It seems to me that only at the point where you have a complete expression would you test each operator for its type and emit the appropriate instructions.
That implies some if or switch statement, because these are not really overloads.
I have yet to see a compiler that doesn't include a lengthy amount of repetitive code.
29 Jun, 2009, Silenus wrote in the 3rd comment:
Votes: 0
Unfortunately, given the nature of LPC, instruction selection has to be deferred, at least in a lot of cases, to runtime. I am currently just working on the runtime case and will perhaps later look at techniques to get rid of runtime instruction selection via type inference and other tricks. Everything at present will be tagged with a type tag.

I suspect I could write a code generator for this and have it output the correct "stuff" given a more succinct description in a text file or something; however, I have been wondering if you can do something similar using metaprogramming techniques (maybe at the price of ruining readability). My thought would be something like a template static list which holds three constants per node, consisting of the relevant type tags (arg1, arg2 and return type), plus a block of code which implements the proper instructions to emit for that case. I suspect it's possible, but my C++ skills at present are only so-so (I don't understand the advanced tricks).
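Roughly what I am picturing is something like this (a very rough sketch, using C++0x variadic templates; the type tags and emit functions are just placeholders, and I am not at all sure this is the right way to do it):

#include <cstdio>

enum TypeTag { T_INT, T_FLT, T_STR };

// Placeholder emitters standing in for real instruction selection.
void emitIntAdd() { std::puts("emit ADD_INT"); }
void emitFltAdd() { std::puts("emit ADD_FLT"); }
void emitStrCat() { std::puts("emit CAT_STR"); }

// One rule: the two argument tags, the result tag, and the code to emit.
template <TypeTag Arg1, TypeTag Arg2, TypeTag Ret, void (*Emit)()>
struct Rule {
    static const TypeTag result = Ret;
    static bool match(TypeTag a, TypeTag b) { return a == Arg1 && b == Arg2; }
    static void emit() { Emit(); }
};

// Static list of rules; dispatch walks the list with the runtime tags.
template <typename... Rules> struct RuleList;

template <> struct RuleList<> {
    static bool dispatch(TypeTag, TypeTag) { return false; } // no rule matched
};

template <typename First, typename... Rest>
struct RuleList<First, Rest...> {
    static bool dispatch(TypeTag a, TypeTag b) {
        if (First::match(a, b)) { First::emit(); return true; }
        return RuleList<Rest...>::dispatch(a, b);
    }
};

typedef RuleList<
    Rule<T_INT, T_INT, T_INT, emitIntAdd>,
    Rule<T_INT, T_FLT, T_FLT, emitFltAdd>,   // int coerced to float
    Rule<T_FLT, T_INT, T_FLT, emitFltAdd>,
    Rule<T_FLT, T_FLT, T_FLT, emitFltAdd>,
    Rule<T_STR, T_STR, T_STR, emitStrCat>
> PlusRules;

// Usage: PlusRules::dispatch(leftTag, rightTag);

Each Rule carries the three constants (arg1, arg2, return type tag) plus the emitter that plays the role of the "block of code to emit" for that case.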

I know metaprogramming is pretty rare in the course of normal programming (except perhaps if you are writing libraries), but does anyone have an idea how something like this might be done?

(Actually I guess this could just be some data-driven code generation which the compiler pieces together at runtime too, but that seems a little wasteful.)
29 Jun, 2009, David Haley wrote in the 4th comment:
Votes: 0
Basically you need a table of operations on the one hand and a table of coercions on the other. Then you would need to pick a coercion rule that suits you, for example, pick the operation that minimizes the number of coercions. In the examples you gave, it's actually pretty easy because nothing depends on the order. It would be harder if you had this:

int + string -> coerce string to int and then do int addition
string + int -> coerce int to string and then do concatenation

because now, if you have something that can be coerced to a string and an int on the left, and something that can be coerced to a string and an int on the right, it's unclear which one to use.
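To make the minimize-the-coercions idea concrete, something like this is what I have in mind (just a sketch; the Type enum, the coercible() table and the cost function are illustrative, not anything LPC-specific):

#include <vector>
#include <cstddef>

enum Type { INT, FLOAT, STRING };

// One candidate implementation of the operator: its expected operand types.
struct Signature { Type left, right; /* ... plus whatever op it selects ... */ };

// Which implicit coercions exist (illustrative).
bool coercible(Type from, Type to)
{
    if (from == to) return true;
    if (from == INT && (to == FLOAT || to == STRING)) return true;
    if (from == FLOAT && to == STRING) return true;
    return false;
}

// Cost of using a candidate: number of operands that must be coerced,
// or -1 if the candidate is not viable at all.
int cost(const Signature& s, Type left, Type right)
{
    if (!coercible(left, s.left) || !coercible(right, s.right)) return -1;
    return (left != s.left) + (right != s.right);
}

// Pick the unique cheapest candidate; -1 means no match or ambiguity.
int resolve(const std::vector<Signature>& cands, Type left, Type right)
{
    int best = -1, bestCost = 3, ties = 0;   // cost is at most 2
    for (std::size_t i = 0; i < cands.size(); ++i) {
        int c = cost(cands[i], left, right);
        if (c < 0) continue;
        if (c < bestCost)       { bestCost = c; best = (int)i; ties = 1; }
        else if (c == bestCost) ++ties;
    }
    return (ties == 1) ? best : -1;
}

If two viable candidates end up with the same minimal cost (your int + string versus string + int situation), resolve() reports the ambiguity instead of silently picking one.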

In C++ operator overloading is always determined by the thing on the left. Other languages also let you define operators that are resolved by the thing on the right. This makes determining coercion pretty easy.

Anyhow there are two separate problems here: (1) determining the semantics, and (2) implementing them. I'm not sure that you have a full specification of the overloading/coercion semantics yet, but if you did, there are fairly straightforward ways to implement it with lookup tables. I wouldn't bother with metaprogramming but that's just me.

I agree with Tyche; this kind of stuff will always have repetitive code, and lots of code.
29 Jun, 2009, Silenus wrote in the 5th comment:
Votes: 0
Thanks. I suspect I will go with the generation-at-runtime approach for this. I suspect it will cut down significantly on the amount of code I need to write and be easier to implement than the metaprogramming approach or creating a separate code generator. Something like:

class Operator
{
public:
    void addRule(…);

    Function* emit();
};

Function* F = opPLUS.emit();
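Fleshed out a bit, I am imagining something like this (TypeTag, Function and the emitter signature are placeholders; emit() here takes the operand tags directly, whereas what I really want is for it to build one dispatching Function over all the registered rules):

#include <map>
#include <utility>

enum TypeTag { T_INT, T_FLOAT, T_STRING };
class Function;                                   // opaque backend handle
typedef Function* (*EmitFn)(TypeTag, TypeTag);    // emits code for one case

class Operator
{
public:
    // Register one (left, right) -> emitter rule.
    void addRule(TypeTag left, TypeTag right, EmitFn emitter)
    {
        rules[std::make_pair(left, right)] = emitter;
    }

    // Emit code for one concrete pair of tags; 0 if no rule applies.
    Function* emit(TypeTag left, TypeTag right) const
    {
        std::map<std::pair<TypeTag, TypeTag>, EmitFn>::const_iterator it =
            rules.find(std::make_pair(left, right));
        return it == rules.end() ? 0 : it->second(left, right);
    }

private:
    std::map<std::pair<TypeTag, TypeTag>, EmitFn> rules;
};

Usage would then be something like opPLUS.addRule(T_INT, T_FLOAT, &emitFloatAdd) for each case before emitting.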

One thing I am still not sure about is whether inlining these functions would be any faster than having them as small helper functions. I am more inclined to try the simpler helper function approach for now. I can always change it relatively easily once I do some profiling.
30 Jun, 2009, Tyche wrote in the 6th comment:
Votes: 0
I created a language called Aphrodite which resolves types at runtime. It used a hand-written recursive descent parser which generated bytecode that was executed on a tiny virtual machine. Here are some scraps of the initial C++ code that might be instructive.

Compiler piece…
//--------------------------------------------------------------
//
// BNF - <expression> ::= <term> [<addop> <term>]*
//
//--------------------------------------------------------------
void Compiler::Expression ()
{
    Term ();
    while ((Look.type == T_SUB) || (Look.type == T_ADD)) {
        switch (Look.type) {
        case (T_SUB):
            Match (T_SUB);
            Term ();
            code (OP_SUB);
            break;
        case (T_ADD):
            Match (T_ADD);
            Term ();
            code (OP_ADD);
            break;
        }
    }
}

So..

"Hello " + "world"

…because the operator is emitted lazily, after both of its terms, this generates the form "term term operator",
which is encoded as the following bytecode…

PUSHS Hello
PUSHS world
ADD

I could just as easily have generated assembly and passed the result to an assembler, or even machine code directly from the compiler, or even executed the operations immediately… rather than generating bytecode.

The VM wraps everything in a Variable class, which is just a C++ wrapper around a union of types. That simplifies the VM routines a fair bit.

void VM::op_add (void)
{
    // Operands were pushed left-to-right, so the right operand is on top.
    Variable right(frame.pop());
    Variable left(frame.pop());

    frame.push( left + right );
}


The actual rules of addition are found in the Variable class:
Variable Variable::operator+( const Variable &right ) const
{
    // Numeric + numeric: promote to float if either side is a float.
    if (isNumeric() && right.isNumeric()) {
        if (Type() == FLT || right.Type() == FLT) {
            return Variable(tofloat() + right.tofloat());
        } else {
            return Variable((int)(tonum() + right.tonum()));
        }
    } else if (Type() != right.Type() && Type() != LIST) {
        // Mismatched non-list types cannot be added.
        throw aphrodite_exception(E_TYPE);
    } else if (Type() == STR && right.Type() == STR) {
        // String + string: concatenation.
        String *x = new String(*value().str);
        *x += *right.value().str;
        return Variable(x);
    } else if (Type() == LIST) {
        // List on the left: add the right operand to the list.
        return Variable(value().list->setadd(right));
    } else {
        throw aphrodite_exception(E_TYPE);
    }
    return NULL; // not reached
}


The rules above aren't all that different from the ones you are using. They certainly don't make the code hard to follow, anyway.

1) Operations involving floats have the operands converted to floats, and the result is a float.
2) String operations must have strings as both operands, and the result is a string.
3) List operations must have a list as the left operand; the operand on the right is added to the list.

The code above was pretty much the code I had when the compiler/VM began to run correctly, not the latest code. My coding philosophy is essentially get it working, and then refactor and optimize later. I tend to favor small functions that do one thing and one thing only. That makes refactoring and optimizing later much easier.
30 Jun, 2009, Silenus wrote in the 7th comment:
Votes: 0
Thanks for all the input. I suspect I am closing in on how to do this correctly on LLVM (there are just some platform-independent details that I am still enquiring about). I am using a parser generator to do most of the heavy lifting for me at the moment (I don't like the idea of manually messing around with lookahead sets and so on), so I didn't quite adopt the approach you are describing, but what I am doing is quite similar. The semantic actions build an AST which I plan to pass over multiple times for semantic checking and code generation.
30 Jun, 2009, elanthis wrote in the 8th comment:
Votes: 0
I'd be very interested to see what you produce when this is up and running. I've been contemplating an LLVM-based MUD language since January, but the moving-to-Seattle-and-preparing-for-school thing has had my free time pretty bound up most of this year. (OK, that's a lie; drinking with friends I won't see again has had my time bound up most of this year.)