26 Oct, 2009, Silenus wrote in the 1st comment:
Votes: 0
I am trying to find a reasonable data structure for indexing labelled trees of arbitrary arity in memory for the mud driver I am working on. I am curious whether anyone has had any experience building such a data structure and what options might be available.

Thanks.
27 Oct, 2009, David Haley wrote in the 2nd comment:
Votes: 0
What is the problem you are solving here? I want to make sure we're speaking the same language before talking implementation.
27 Oct, 2009, Scandum wrote in the 3rd comment:
Votes: 0
Silenus said:
I am trying to find a reasonable data structure for indexing labelled trees of arbitrary arity in memory for the mud driver I am working on. I am curious whether anyone has had any experience building such a data structure and what options might be available.

Probably best to look for yourself: http://en.wikipedia.org/wiki/Self-balanc...

Self-balancing trees are probably what you want, but they're tricky to implement and there are few good examples. Perhaps David will write you one if you ask him nicely. :evil:
27 Oct, 2009, Silenus wrote in the 4th comment:
Votes: 0
This might seem a bit odd, but to keep the type system generic and extensible I implemented it in terms of an n-ary tree with node labels that indicate the "type" of the node in question. For LPC, since I cannot completely determine the types of values statically at compile time given the design of the type system, I need to convert these structures into a single integer-valued tag. The approach I am thinking of using is to construct a mapping from labelled n-ary trees to tags. I have figured out another way to solve this problem, which is LPC-specific: hacking a case-by-case tag enumerator into the system. This works but isn't particularly general, and since there is a good possibility I may experiment with different type extensions, it won't work in general either.
27 Oct, 2009, Silenus wrote in the 5th comment:
Votes: 0
I thought about this a bit more and I may have a sort of crude solution, though I am uncertain how good it is. The idea is similar to that used in a trie: on some level, mirror the search with a data structure that discriminates between sets of members at each decision made during the search. I am currently thinking as follows:

At each node, construct a hash function which takes into account the arity of the node and the Cartesian product of the type labels of the children, and take the matching branch in this structure provided it exists (if it doesn't, the tree in question isn't yet in the data structure). At the next node do the same thing, until one hits the bottom or one is sure that the tree one is looking for does not exist in the data structure.
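For what it's worth, one way to realize "hash the label together with the children at each node" is interning (hash-consing): since each child has already been assigned a tag, a node's identity is just its label plus the tuple of its children's tags. A minimal sketch in Python (class and method names are illustrative, not from the thread):

```python
class TypeInterner:
    """Assigns one integer tag per distinct labelled tree."""

    def __init__(self):
        self._table = {}     # (label, child_tags) -> tag
        self._next_tag = 0

    def intern(self, label, children=()):
        """children is a tuple of tags returned by earlier intern() calls."""
        key = (label, tuple(children))
        if key not in self._table:       # first time this shape is seen
            self._table[key] = self._next_tag
            self._next_tag += 1
        return self._table[key]

interner = TypeInterner()
i1 = interner.intern("INT")
i2 = interner.intern("INT")
arr = interner.intern("ARRAY", (i1,))
assert i1 == i2      # structurally identical trees get the same tag
assert i1 != arr
```

Because trees are built bottom-up, lookup and insert for a whole tree cost one dictionary probe per node, and equal trees collapse to equal tags automatically.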

So I think I may have solved my own problem :-).
27 Oct, 2009, David Haley wrote in the 6th comment:
Votes: 0
I still am not sure what problem you're trying to solve with the data structure. Without a clear description of the exact data problem being solved (I don't really care that it's for generality and extensibility of your type system, that's not really relevant right now) it's difficult to talk about precise data structures. That said, why isn't a standard binary search tree appropriate? This business about hashing at every node sounds fishy. When you talk about "labeled" trees, what exactly do you mean? What is labeled? Are the tags the labels? Are the labels something else? Why can't you just store a tree of tags? If you have some thing you're trying to associate tags with, why not have a tree of <tag, <thing>> to begin with? Then again, it sounds like you're building a tree where each contained node is a whole tree of its own…?

To help understand the problem, I think you need to say what exactly you are doing. What are these tags? What are the nodes in your tree? What are the "labels" in your tree? You are mapping from what to what?
27 Oct, 2009, Silenus wrote in the 7th comment:
Votes: 0
Basically I wanted a tree index for trees with nodes like LABEL_N ( node_1, node_2, … node_n ), where LABEL denotes something like INT, STRING, FLOAT (with arity 0), or ARRAY ( INT() ) or MAP ( STRING() , INT() ) for "composite types". The idea is to check whether a type already exists in the system and associate it with a label (which could simply be a pointer to the root of the type tree in question in the index).

The type tag is similar to what they have in LPC or Lisp or other systems where one cannot tell at compile time whether a given value has a given type, so the information needs to be carried around with the value at run time.

Basically the compilation system will have the responsibility of constructing the correct type tree denoting the type of the value in question, associating it with some sort of tag which summarizes the type, and then handing this information off to the runtime system.

A binary search tree, balanced or not, will not work because what I am trying to index are whole trees (for which you cannot construct a total ordering), not just values from an ordered set. There are probably some subtleties I am missing, e.g. the system in question does not cope too well with type unification and "union types", so I might have to adjust the index to cope.

I am thinking that perhaps constructing an index which follows the logic of matching two trees, but instead matches one tree against a set of trees simultaneously, might work. I.e. 1) check the label of the root node and see if there is a subset of trees with this root node; 2) within this set, check the arity of the root node and refine the subset; 3) check the label and arity of the first child, refine again, and so on.

I am guessing that this structure would resemble a trie in some respects, except that at each node a set of trees is being restricted instead of a set of strings.
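The trie analogy can in fact be made literal: flattening each tree to its preorder sequence of (label, arity) pairs turns tree lookup into ordinary string-trie lookup, since the arity annotations make the preorder sequence identify the tree uniquely. A sketch in Python (representation and names are illustrative, assuming trees written as (label, children) tuples):

```python
def preorder(tree):
    """tree is (label, (child, ...)); yields (label, arity) pairs."""
    label, children = tree
    yield (label, len(children))
    for child in children:
        yield from preorder(child)

class TreeTrie:
    """A trie over preorder sequences of labelled trees."""

    def __init__(self):
        self.root = {}

    def insert(self, tree, value):
        node = self.root
        for sym in preorder(tree):
            node = node.setdefault(sym, {})
        node[None] = value       # terminal marker holds the payload

    def lookup(self, tree):
        node = self.root
        for sym in preorder(tree):
            if sym not in node:
                return None      # tree not present in the index
            node = node[sym]
        return node.get(None)

t = ("MAP", (("STRING", ()), ("INT", ())))
trie = TreeTrie()
trie.insert(t, "tag-0")
assert trie.lookup(t) == "tag-0"
assert trie.lookup(("INT", ())) is None
```

Each edge in the trie corresponds to one "refine the candidate set by the next node's label and arity" step from the post above, so lookup and insert are linear in the size of the query tree.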
27 Oct, 2009, David Haley wrote in the 8th comment:
Votes: 0
Why is a set of type descriptors not appropriate?
By "index" do you just mean container?

You speak of a "type tree" – is this an important feature of the type system, or merely an implementation detail? Is it important that you have a tree of each "type tree"?
Why can't these trees be ordered?

Basically I'm trying to step away from complex and potentially misleading implementation details, and figure out the concept of what you're doing…
27 Oct, 2009, Silenus wrote in the 9th comment:
Votes: 0
The problem is that the types in the system are not, strictly speaking, a finite set (for LPC it is almost finite, with the exception of structs, but for my extended experiments it will not be), so a fixed set of type descriptors will not work. It is in some sense an implementation detail, but in another sense just representational. A type in my case is kind of like a term in a language like Prolog, but (at this point) without variables. For example, ideally things like

MAP( STRING(), MAP( STRING(), ARRAY(INT) ) )

should be possible. I don't think trees like this can be ordered (but feel free to show that I am wrong), and thus they might be a bit tricky to "index" using something like a binary search tree/red-black tree or whatnot.
27 Oct, 2009, David Haley wrote in the 10th comment:
Votes: 0
I didn't say a fixed set of type descriptors, nor a set enumerating all possibilities. :smile: I just said a set.

Anyhow, it's actually quite easy to order these trees. You have atomic types and composite types. Establish a total ordering among the atomic types and the composite types, e.g.,

string < int < float < array(x) < map(x,y)

So far so good. Now if you need to compare two composite types, e.g., two arrays array(t1) and array(t2), you simply recurse and compare t1 and t2. For maps, you would compare left to right.

Therefore, map(int, x) is always greater than map(string, y).

I could write out a full comparison function, but I think the above should suffice to show the idea.
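For concreteness, a comparison function along these lines might look like the following sketch (Python, with the atomic ranks taken from the ordering above; the tuple representation is illustrative):

```python
# Rank atomic constructors first, composites last, per
# string < int < float < array(x) < map(x,y).
RANK = {"string": 0, "int": 1, "float": 2, "array": 3, "map": 4}

def compare(a, b):
    """a, b are (constructor, (arg, ...)); returns -1, 0, or 1."""
    (ca, args_a), (cb, args_b) = a, b
    if RANK[ca] != RANK[cb]:
        return -1 if RANK[ca] < RANK[cb] else 1
    # Same constructor implies same arity; recurse left to right.
    for x, y in zip(args_a, args_b):
        c = compare(x, y)
        if c != 0:
            return c
    return 0

INT = ("int", ())
STR = ("string", ())
assert compare(("map", (INT, INT)), ("map", (STR, INT))) == 1
assert compare(("array", (STR,)), ("array", (INT,))) == -1
```

With a total order like this in hand, the type trees can be stored directly in any ordinary balanced search tree.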

By the way, I'm not sure you're using the term 'index' in a standard way; it seems that you just mean store the things, not construct an index into a larger table, or an index into an array, etc. That might be part of why I was having trouble understanding what you're trying to do.
27 Oct, 2009, Silenus wrote in the 11th comment:
Votes: 0
Thanks David. Ok, interesting, I wasn't sure about that. By index (perhaps I am not using the word correctly) I meant a method for efficiently doing lookup(x) primarily, and perhaps also important in this case is a fast insert(x).

My concern about using a binary search tree derivative is that if I am trying to compute term unifications on it, it might be quite tricky (and this extension is something I will also be looking at), i.e. find all the trees which unify with x. This is part of the reason I decided to explore the decision process and attempt to construct some sort of data structure which matches it. Unfortunately I still don't really understand, on a certain level, why binary search trees or the balancing methods work. I know them well enough to perhaps implement them myself, but that really isn't saying a lot.
27 Oct, 2009, David Haley wrote in the 12th comment:
Votes: 0
I'm not sure what you mean by not knowing how binary trees work despite being able to implement them.
Deeply understanding the RB algorithm is, by the way, rather irrelevant if you have an implementation, because at the moment all you care about is that it will give you a balanced binary tree, thus keeping the tree efficient. (A non-balanced tree can in the worst case degrade to a linked list, basically).

I also think you're worrying far too much about the cost of iterating over the type list to find which types unify with a given type. You aren't going to have hundreds upon hundreds of types, after all, so you could just iterate over them linearly. The unification algorithm is far more costly than the scan, and while scanning you will be able to immediately throw out things that can't possibly be candidates (e.g., map(x,y) will never unify with a float).
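The "linear scan with cheap rejection" idea can be sketched in a few lines (Python, names illustrative): filter the type list with a constant-time check on the root before running any expensive unification.

```python
def could_unify(a, b):
    """Cheap necessary condition: root constructors and arities match.
    (A real checker with type variables would accept more pairs, so this
    is only a filter, never the final answer.)"""
    (ca, args_a), (cb, args_b) = a, b
    return ca == cb and len(args_a) == len(args_b)

def candidates(query, all_types):
    """Linear scan, discarding obvious non-candidates up front."""
    return [t for t in all_types if could_unify(query, t)]

FLOAT = ("float", ())
M = ("map", (("string", ()), ("int", ())))
A = ("array", (("int", ()),))
assert candidates(M, [FLOAT, M, A]) == [M]   # map(x,y) never unifies with float
```

For a type list of realistic size, the scan itself is negligible next to the unification calls it saves.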

The moral of the story is to not make things efficient that are negligible in your total cost.

Note that if you move all the cleverness of the unification algorithm to the decision process, then all you have done is kept the same amount of complexity but merely displaced it. However, you don't know how to do this, and I don't think it's terribly straightforward, and yet unifying types is a well-known problem. So, you will have not reduced complexity in terms of performance, but you will have increased complexity in terms of conceptual difficulty and maintenance.
27 Oct, 2009, Silenus wrote in the 13th comment:
Votes: 0
Thanks again. That is a good point.

I guess these days I am trying to understand what makes algorithms tick, so to speak. I.e. if you have some new algorithm x, how do you go about proving that it yields a correct or optimal solution (perhaps even in optimal time, if possible), how do you prove its complexity, and how did the person come up with the idea in the first place? I have been slowly trying to go through the relevant material, but I think it might take a fair bit of time for me to learn.
27 Oct, 2009, David Haley wrote in the 14th comment:
Votes: 0
Typically you need to prove two aspects of "correctness": soundness and completeness. That is, will the algorithm always give you an answer? And, when it does give you an answer, is it guaranteed to be correct? Answering these questions is sometimes extremely difficult, but in any case relies on having some kind of mathematical formalization of the problem.

Optimality of a solution is only relevant when there are several solutions to begin with. If the question is binary, then "optimality" is reduced to simply correctness. A classic proof of optimality (e.g., of a path) is Dijkstra's shortest-path algorithm for networks. A* is another common case study.

Showing optimal time can be extremely complicated, as evidenced by the NP-completeness problem. I wouldn't bother looking into this much if I were you. I would worry about it being sufficiently efficient, not optimally efficient. (You only care about the former in practical terms, anyhow.)

There is another way of "proving" correctness. Does it work on the cases you care about? Well, there you go. Write a whole bunch of unit tests and make sure that things work. If you find a bug later, fix it and write a new test.

Obviously this is not actually proving correctness, but it can make the difference between (a) sitting around wringing one's hands at how to prove correctness and as a result not actually doing anything, and (b) actually making forward progress in your project.
27 Oct, 2009, Silenus wrote in the 15th comment:
Votes: 0
Thanks David. I guess I dragged us off topic a bit. Are there good books for this sort of thing? The ones I have are Cormen and some book titled "Algorithm Design". I looked at some PhD qualifiers in this area and my skills are nowhere near good enough to pass :P. Most likely I will try an MSc somewhere :D.
27 Oct, 2009, David Haley wrote in the 16th comment:
Votes: 0
The only book I have is CLRS (is that what you mean by Cormen? It's usually called CLR (1st ed) or CLRS (2nd ed)). And then course materials, like handouts etc. Try e.g. CS 161: Design and Analysis of Algorithm....

You don't really need to get to PhD quals levels here unless you're interested in doing very serious algorithms work; what you might need for comps (breadth exams) should be more than sufficient already.
27 Oct, 2009, Silenus wrote in the 17th comment:
Votes: 0
Thanks David. I will check out the link.