MUD-Dev
mailing list archive
[ Other Periods
| Other mailing lists
| Search
]
Date:
[ Previous
| Next
]
Thread:
[ Previous
| Next
]
Index:
[ Author
| Date
| Thread
]
Re: [MUD-Dev] Naming and Directories?
On Wed, 17 Mar 1999, Jon A. Lambert wrote:
> On 16 Mar 99, Mik Clarke wrote:
> > Ummm. Yick. 26 4 byte pointers by, say 16 characters long just to index 100-200 names?
> > Even with a sensible implementation you're going to end up wasting 90% of the storage you
> > allocate (and a ten letter name means you have to allocate 1K pointers, each different
> > name then requires another 900 bytes or so).
>
> But you have changed your requirements above from speed to memory.
> No fair. ;)
>
> The alphabet trie above has some pros and cons:
>
> 1) avoids the overhead of string hashing
> 2) provides intrinsic support for abbreviations
>
> 1) based on English and may not support blanks, puntuation, or other
> characters (of course this can be altered or enforced)
> 2) distribution of names may tend to cluster in certain nodes
>
Some additional pros and cons on a search trie (for large problems) are:
pro : distribution of names may tend to cluster in certain nodes
its a pro, because storing names with the same prefix does not make the
trie grow linearly with name length, so storing "green bottle" and
"green dragon" would require the same space for "green ", only "bottle"
and "dragon" would be distinct.
pro : for the large _static_ case, you can easily store the trie on disk,
using file pointers instead of memory pointers. I did this for lookups
in a large genetic database (Genbank, size ~11 Gb) with much success,
the trie (of ~3 million keywords) taking up 82Mb. Note that since
the namespace is well filled (or clustered), the memory usage is about
30 bytes pr. word, since the trie 'reuses' the common prefixes (words
are approx 6-10 characters long). (the trie is a bit compressed, so
lookup actually takes O(m * sigma), m being the keyword length, with
O(m) disk reads.)
con : it _also_ takes time to insert/delete entries in the trie, so
you may get another drag on load, as you ease lookup time. You really
should only use a trie (compared to linear search), if
#lookups*#objects > #movements*sizeof(names) for the locality in
which you use the trie. Most usefull, will probably be to apply it to
global lookups only, since 'movement' here is limited to
creation / deletion.
Some optimizations i have heard about (but not trie-d):
memory saver: Instead of a full 26 letter alfabet, you could
cluster individual letters, say 'ab' 'cd' ... such that the number of
pointers is reduced, making a (small)hash or linear search in the end
(of a greatly reduced number of objects) to see which actually match.
As mentioned earlier (i forget who *blush*): compressing the trie (long
chains that does not branch) is a sure memory saver.
Another compression is possible (it has some girls name): if some (any)
subtree of the trie is fully 'filled', say f.inst. that the 4 letter
prefixes of your used names covers all combinations, then you can
compress this to a simple list, doing direct lookup, instead of 4
lookups.
Hans Henrik Stærfeldt |
email: bombman#diku,dk | voice: +45 40383492
hhs#cbs,dtu.dk | voice work: +45 45252425
phone-mail: | address:
40383492#sms,tdm.dk | Hans Henrik Stærfeldt,
WWW-home | Dybendalsvej 74 2. th,
http://www.cbs.dtu.dk/hhs/ | 2720 Vanløse, Danmark.
|
Student of Computer Science | Scientific programmer at Center for
and Information Psychology. | Biological Sequence Analysis,
at University of Copenhagen | Technical University of Denmark.
_______________________________________________
MUD-Dev maillist - MUD-Dev#kanga,nu
http://www.kanga.nu/lists/listinfo/mud-dev
- Thread context:
- Re: [MUD-Dev] Naming and Directories?, (continued)
[ Other Periods
| Other mailing lists
| Search
]