17 Feb, 2009, Silenus wrote in the 1st comment:
Hi,

I am trying to type check a system (LPC) where operations tend to be heavily overloaded which means implementing a pretty hefty stack of type checking rules depending on the operator. Though in traditional LPC this is not too difficult (given the lack of deeply nested types)- it has become relatively more challenging since I am planning to at least try to implemented a more sophisticated type system. The question is how and what is the best way to implement the type rules so that they are general enough to capture most of the semantics needed in order that one can add new type rules which reflect a possible increasing sophistication?

For example the + operator in LPC is basically overloaded in 5 ways with implicit type promotions in certain cases. i.e. string + int|float will convert the int or float into a string for processing. The output for this operation for simple types is quite natural and probably the same for almost all binary operators (if you have so special mechanism indicating the conversions you need)- i.e. the return type is equal to the types on the right hand side.

However with arrays and mappings this gets a bit tricky. With +(array,array) you have to compute a compatible type for the elements which is compatible with both elements on the right hand side if the elements are different for the return type.

Is there a general algebra or calculus which can be used to compute these sort of types? The closest analog I could come up with was some sort of Prolog like rule system which does general matching and unification to compute the new type terms. This however (compared to the trivial essentially non nested case) seems quite a bit more complex to implement and I was wondering if there was a simpler solution.

17 Feb, 2009, David Haley wrote in the 2nd comment:
Type unification is "hard"; yes, the non-nested case is pretty simple (even you said it was trivial) to deal with, so don't be surprised that you are seeing solutions more complex than just that.

Here is some interesting reading on ML's type inference, and here's some more on type checking in general. (This link might not be stable after the quarter ends, although this one will get you there eventually.)

I would also draw your attention to the subtleties of array covariance and contravariance when it comes to type safety if you have subtypes.
17 Feb, 2009, Silenus wrote in the 3rd comment:
Thanks David. Unification isnt that "hard" I have written a Prolog like interpreter before which did something similar but I was hoping to avoid having to reimplement something like that for my type checker. There are a number of reasons for this the most important of which is that this type checker would have to sometimes operate at runtime because of the nature of function calls to other objects in the system (where the type signature of the call cannot be determined until the call is being executed at runtime so essentially the static type checker cannot say anything about the type). I suspect this would pose problems for a HM style type checker as well (has there been any recent developments here?) such as is used in ML.

The course link you have there is filled with useful tidbits but I dont see much on type theory at all. Do you know of any good books on this subject which perhaps emphasize the mathematical side without being math books focused on logic exclusively.

As for the subtyping issue and covariance/contravariance I am a bit uncertain here. LPC objects are untyped but there is some limited subtyping through the mixed mechanism for other entities. I suspect this issue which I have seen discussed in a different context (w.r.t OO languages) but it may not be applicable here. Or is there some sort of covariance contravariance concept which extends outside the OO arena?
17 Feb, 2009, Tyche wrote in the 4th comment:
Type inference rules are defined by the language. The converse is true.
17 Feb, 2009, David Haley wrote in the 5th comment:
I didn't mean that it's hard in the absolute scale of things, I just meant that it's not trivial at all, certainly compared to the very simple case of comparing two non-nested types.

You might be able to get away with checking operations on the elements individually, and not on the whole list. You won't find the type error until later, but that might be an acceptable compromise to make things easier.

Silenus said:
The course link you have there is filled with useful tidbits but I dont see much on type theory at all. Do you know of any good books on this subject which perhaps emphasize the mathematical side without being math books focused on logic exclusively.

Not really :smile: The only things I've read on type theory are very logic-heavy. It's sort of the nature of the beast, really. The link I gave focuses more on the algorithms involved in checking types. Type theory is a complicated subject taken as a whole (and this time I do mean in the absolute) so I don't think there are that many simple step-by-step guides.

That said, I'm not sure why you say the course link doesn't have much on type theory. I guess I'm not sure what you're looking for.

Silenus said:
As for the subtyping issue and covariance/contravariance I am a bit uncertain here.

Normally, if Sub <: Super, then:
`function f(Super x)  …endf( new Sub() )`

is valid. But this isn't true with arrays. Consider now that Sub <: Super and Sub2 <: Super.

`function f(Super[] x)  x[0] = new Sub2()endSub1[] my_array;f( my_array )`

Sub1[0] is no longer of type Sub1. Oops.
17 Feb, 2009, David Haley wrote in the 6th comment:
Tyche said:
Type inference rules are defined by the language. The converse is true.

A && !A :thinking:
17 Feb, 2009, Tyche wrote in the 7th comment:
DavidHaley said:
Tyche said:
Type inference rules are defined by the language. The converse is true.

A && !A :thinking:

A && !A => Aa
Thanks for the illustration.
17 Feb, 2009, David Haley wrote in the 8th comment:
Sorry for being too subtle. :wink: I was wondering which negation you forgot to put in, because as-is, you've said that something is true, and the converse is true too, which presumably isn't what you meant. Or maybe it is, in which case I'm not sure what you were saying.
17 Feb, 2009, Silenus wrote in the 9th comment:
David.

I am a bit unclear about the example you have given though I am not entirely sure. If you did just as you have done- would not a the assignment of sub2() in certain cases be disallowed unless you cast sub1[] to super[] (loss of information here) and if you do runtime type checking where you dont need to approximate like this would this problem (in this case) go away?

Well the logic heavy stuff is just very umm geared towards stuff like resolving issues in the foundation of mathematics- not so useful in my book (for this problem) i think. I guess I will keep on looking. I dont expect cookbook (step by step depending on what this means most books even in graduate mathematics are step by step in some manner though unless you have the training the steps may not always be clear).

I went through that link and looked at the text materials listed and listed reading and alot seems to pertain to programming languages in general but does not focus on type theory. I guess i will just keep looking. I have a couple specific textbooks on the subject but they are quite bulky and not quite as succinct as i would hope (like alot of CS materials).
17 Feb, 2009, David Haley wrote in the 10th comment:
Silenus said:
I am a bit unclear about the example you have given though I am not entirely sure. If you did just as you have done- would not a the assignment of sub2() in certain cases be disallowed unless you cast sub1[] to super[] (loss of information here) and if you do runtime type checking where you dont need to approximate like this would this problem (in this case) go away?

This assignment:
x[0] = new Sub2()
is "legal" because x is an array of type Super. Any subclass of Super can be used where an instance of Super is expected. For example,
Shape s = new Rectangle()
that is a perfectly normal thing to do, and you don't need any kind of casting.

The problem with arrays is that an array of subtypes cannot be used where an array of supertypes is expected, because in the snippet I gave, the function f stuck something into the array that violates the type constraint.

Runtime checking would solve this if your array weretagged with the appropriate lower type bound. Then the assignment would not be legal (at runtime) despite the compile-time assignment seeming legal.

My point here is just that there are subtle issues involving arrays and subtyping that you need to watch out for.

Silenus said:
Well the logic heavy stuff is just very umm geared towards stuff like resolving issues in the foundation of mathematics

Oh. I didn't think you were looking at mathematical type theory, just type theory for CS. Type theory is a "logic heavy" field almost by necessity. But you don't need to encompass the entire world of types, which is what some math type theories try to do: you only care about the "small" universe of types in your program.

Silenus said:
I went through that link and looked at the text materials listed and listed reading and alot seems to pertain to programming languages in general but does not focus on type theory. I guess i will just keep looking. I have a couple specific textbooks on the subject but they are quite bulky and not quite as succinct as i would hope (like alot of CS materials).

To be entirely honest, and no offense intended, I don't think it's realistic to expect a succinct reference on the subject. It's not an easy field, and has lots of subtleties. This is true of most CS materials, and is why most high-level stuff like this is not very succinct. When you start getting into theory, you need to be ready to get your hands dirty and not expect simple, succinct solutions. On the bright side, this means that you shouldn't be too concerned if your solution is complex, because it might necessarily have to be. :wink:
17 Feb, 2009, Silenus wrote in the 11th comment:
Quote
To be entirely honest, and no offense intended, I don't think it's realistic to expect a succinct reference on the subject. It's not an easy field, and has lots of subtleties. This is true of most CS materials, and is why most high-level stuff like this is not very succinct. When you start getting into theory, you need to be ready to get your hands dirty and not expect simple, succinct solutions. On the bright side, this means that you shouldn't be too concerned if your solution is complex, because it might necessarily have to be.

No offense taken :-). Well I meant comparatively speaking. You do find alot of succinct graduate level mathematical textbooks on various subjects which are extremely short and insightful though not comprehensive. An example might be something like an invitation to C* algebras which clocks in at 100 pages but is considered the best graduate level introduction to the subject. I find alot of CS books for whatever reason tend to be alot longer and I still dont quite understand why. Perhaps it is something to do with the nature of the discipline. Hopefully I am not offending anyone that isn't my intent.

In the case of LPC most likely I will have to resort to a mix of runtime and compile type checks. The problem with any increased sophistication is that the runtime checks could be quite expensive. I might just stick have to stick to something closer to what LPC compilers currently use to get adequate performance. Or perhaps I may need to be a bit inventive in terms of implementation.

Anyhow thanks for the input.
17 Feb, 2009, David Haley wrote in the 12th comment:
In my experience most math textbooks operate on very "clean" domains, whereas CS – at least CS that tries to model real-world applications – tends to be fairly "messy". As a result, there are sometimes more corner cases and details to look at. Also, CS textbooks will talk about algorithms and implementation details, whereas math textbooks tend to focus on definitions and properties. Again, just in my experience.

As for LPC: I don't fully understand the type system or type rules of LPC. Whatever you do needs to obey those type rules, or if you depart, you need to do so with full knowledge (and presumably you need to make that clear to consumers as well). It might be nice to impose some extra type discipline onto the language, but I don't know enough about typical LPC code to say either way. If you really do have cases of types completely unknown at compile-time, you probably will have to do something at runtime.

Again, note that you can do type-checking on the element level, not the container level. Java generics are an example of this, I think: the runtime type-checking is done when you use things, not when you pass containers around.
18 Feb, 2009, elanthis wrote in the 13th comment:
DavidHaley said:
The problem with arrays is that an array of subtypes cannot be used where an array of supertypes is expected, because in the snippet I gave, the function f stuck something into the array that violates the type constraint.

Only if the array parameter is volatile and only if arrays are passed by reference. Both are which are the expected behavior from the snippet, of course… but yeah, it gets hairy, quick.
18 Feb, 2009, quixadhal wrote in the 14th comment:
DavidHaley said:
As for LPC: I don't fully understand the type system or type rules of LPC. Whatever you do needs to obey those type rules, or if you depart, you need to do so with full knowledge (and presumably you need to make that clear to consumers as well). It might be nice to impose some extra type discipline onto the language, but I don't know enough about typical LPC code to say either way. If you really do have cases of types completely unknown at compile-time, you probably will have to do something at runtime.

Yes, LPC is ugly in that respect. The mixed type itself acts as a metatype, which can contain anything, and which is only really pinned down to a solid entity at runtime. Because LPC wasn't created with a proper language definition (it evolved, like Perl), and because many LPC programmers weren't CS students (or were still learning), mixed is used far more than it should be (IMHO) in most mudlibs.

If you're trying to create something that's LPC compatible, you will have to accept that some things won't be known until the bytecode actually runs and interrogates the type information (of the data) at run-time.
18 Feb, 2009, Cratylus wrote in the 15th comment:
mixed is used far more than it should be (IMHO) in most mudlibs.

>:|

-Crat
18 Feb, 2009, Tyche wrote in the 16th comment:
If you're trying to create something that's LPC compatible, you will have to accept that some things won't be known until the bytecode actually runs and interrogates the type information (of the data) at run-time.

Nooooooooo...
18 Feb, 2009, David Haley wrote in the 17th comment:
Mixed type isn't the same thing as no type information at all. You can still sometimes establish bounds on types, and do type checking based on that. I suspect that in practice, types are not mixed willy-nilly, and there is some order to the madness. The difference would be imposing this order at the language level, instead of hoping that users do it themselves.
18 Feb, 2009, Silenus wrote in the 18th comment:
Quix mentions one of the problems i.e. the mixed type. The second problem comes out of the call_other semantics which has the general form in lpc of call_other(object, "string", args). Since string can be computed as an arbitrary value at run time, you really strictly speaking cannot say much about the return type which can be anything including a situation where you are guaranteed to have to return an error (assignment to void). When I mean the type is unknown this is sort of what I mean. I guess in some sense it's you do know that it's something in the universe of all possible types in the language but beyond that I am not sure if you can say very much.

This is actually rather problematic since it makes call_other quite inefficient (since you cannot bind the function at compile time to an address). Of course there are two more restricted forms of call_other as well.

ob->foo(args) is equivalent to call_other(ob,"foo",args)

string->foo(args) is equivalent to call_other( find_object("string"),"foo",args) )

These yield a bit more compile time information but I am not too sure if they can be used for optimization purposes.

One other nasty thing that occured to me is that though efuns such as map/filter (which accept both arrays and mappings) are overloaded but bc LPC doesnt permit the user to selectively overload functions if you override these functions in LPC (using a "simulated efun") they would have to be declared as taking mixed arguments and returning type mixed which means that any downstream operations though they should be able to infer the type may not be able to do so unless you have interprocedural type inference. :/
18 Feb, 2009, David Haley wrote in the 19th comment:
In the call_other function, is this how to interpret the arguments:
object – the object whose method we are calling
string – the method name
args – a list containing arguments for the method

In this case, you have to do basically what Java does for its similar method calling functionality and use casts.

Of course, things get interesting if the receiver also uses unspecified types. But basically I view this as almost exactly the same thing as Java with 'Object' objects lying around and needing runtime checks in the form of casts.
18 Feb, 2009, Silenus wrote in the 20th comment: