09 Dec, 2007, Hades_Kane wrote in the 1st comment:
Votes: 0
Something I've been considering for a while is trying to come up with a code-based solution to people coming in with obviously inappropriate names, most notably historical figures and dictionary words.

What I had been considering is finding a text-based file, perhaps the source of a spell checker or a dictionary, and then incorporating that into the check against illegal names when someone goes to create a new character.

I'm curious if anyone else has already attempted this and has any tips, or would be willing to share what they've done.
09 Dec, 2007, Guest wrote in the 2nd comment:
Votes: 0
Funny you should mention that since I've been plowing through the battlefield of bugs I've created for myself and the name parsing was one of the problem areas I've been working on this morning. Anyway. We added this check to the check_parse_name function in Smaug:

/*
 * This grep idea was borrowed from SunderMud.
 * Reserved names list was getting much too large to load into memory.
 * Placed last so as to avoid problems from any of the previous conditions
 * causing a problem in shell.
 */
char buf[MSL];

snprintf( buf, MSL, "grep -i -x %s ../system/reserved.lst > /dev/null", name );

if( system( buf ) == 0 && newchar )
{
    buf[0] = '\0';
    return false;
}


The reserved.lst file was seeded with a dictionary file I found somewhere, and if we reject any other names in game, they get appended to that file. There's probably a better way to go about it, but for now this works well.
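One caveat with the approach above: `name` goes straight into a shell command via system(), so it's worth rejecting anything non-alphabetic before the string is built. Here's a minimal sketch of that idea — the MSL value, the function name, and the reserved.lst path are stand-ins for whatever your codebase actually uses:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MSL 4096  /* stand-in for Smaug's MAX_STRING_LENGTH */

/* Only hand the name to the shell once we know it is purely alphabetic,
 * so a crafted "name" like `x; rm -rf .` can't inject shell commands. */
bool name_is_reserved( const char *name )
{
    char buf[MSL];
    size_t i;

    for( i = 0; name[i] != '\0'; ++i )
        if( !isalpha( (unsigned char) name[i] ) )
            return true;  /* treat anything non-alphabetic as rejected */

    if( i == 0 || i >= 32 )
        return true;  /* empty or absurdly long: reject too */

    snprintf( buf, sizeof( buf ),
              "grep -i -x -q '%s' ../system/reserved.lst 2>/dev/null", name );
    return system( buf ) == 0;  /* grep exit 0 == name found == reserved */
}
```

Note that grep's -q flag makes the redirect to /dev/null unnecessary for stdout; the exit status is all that matters here.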
10 Dec, 2007, Hades_Kane wrote in the 3rd comment:
Votes: 0
Fantastic, thanks Samson!

Here is a link to a word list I found today, listing 135,069 words.
http://vburton.ncsa.uiuc.edu/wordlist.tx...

If I find any larger ones, I'll post them in case anyone else is interested in doing this.
10 Dec, 2007, David Haley wrote in the 4th comment:
Votes: 0
You could store it in memory compressed (with actual compression or with a lexical trie or something). Accessing that would be faster (and more portable) than talking to grep. Also, you could search more intelligently (i.e. much faster) than having to look at every single line of the file.

I'll run that wordlist through my trie code later tonight and see how big it is once it's loaded in memory.
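For reference, the naive trie being discussed looks something like this sketch — 26 child pointers per node, which is exactly where the memory overhead comes from (over 200 bytes per node on a 64-bit machine). The names here are illustrative, not from any particular codebase:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* One node per letter; 26 child pointers is the "memory-happy" layout:
 * roughly 26 * 8 bytes of pointers per node on a 64-bit machine. */
typedef struct trie_node
{
    struct trie_node *child[26];
    bool is_word;
} trie_node;

trie_node *trie_new( void )
{
    return calloc( 1, sizeof( trie_node ) );
}

void trie_insert( trie_node *root, const char *word )
{
    for( ; *word; ++word )
    {
        int i = tolower( (unsigned char) *word ) - 'a';

        if( i < 0 || i >= 26 )
            return;  /* this sketch skips words with non-letters */
        if( !root->child[i] )
            root->child[i] = trie_new();
        root = root->child[i];
    }
    root->is_word = true;
}

bool trie_contains( const trie_node *root, const char *word )
{
    for( ; *word; ++word )
    {
        int i = tolower( (unsigned char) *word ) - 'a';

        if( i < 0 || i >= 26 || !root->child[i] )
            return false;
        root = root->child[i];
    }
    return root->is_word;
}
```

Lookup is O(length of the word) regardless of how many words are stored, which is the speed advantage; the per-node pointer array is the memory cost.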
10 Dec, 2007, David Haley wrote in the 5th comment:
Votes: 0
Hmph. The overhead paid exceeds the gain of prefix compression. To save memory you'd have to do something smarter than a naive trie implementation. In retrospect I suppose this is unsurprising since the trie's main advantage is lookup speed, not memory savings (you'd need a *lot* of words with rather long common prefixes before the overhead became worth it). It would be possible to use a less naive trie implementation that wasn't memory-happy but I'm not sure it's really worth it.

Still, I would rather load up a big file into memory and search that than talk to the shell…
13 Dec, 2007, ralgith wrote in the 6th comment:
Votes: 0
Well, if you had the entire Webster's unabridged dictionary (pushing what? 600k words now?) it would be worth it.
13 Dec, 2007, David Haley wrote in the 7th comment:
Votes: 0
Do you mean it'd be worth it to grep over the file and not load it into memory? 600k words is only 6000k bytes if you assume 10 bytes per word (9 letters + 1 newline). 6mb really isn't a lot of memory. Sure, you'd have to add a bit of overhead depending on how you store it and all that. But unless you are in a very memory-constrained environment, 6mb isn't a lot.

But at the end of the day, what matters most for how you store it is how often you need to access it. As usual, if you need faster lookup time you will use more space; if you don't care about lookup time you can store it as a single block of memory. And if you really care about space and not about time, you can store it compressed and decompress on the fly as you search.

The main problem I have with grepping is that it is tying your program to external tools that are very difficult to ship with your program. If the program assumes that grep exists and is on the path, then nobody can run your program unless they are in an environment with grep available – and a grep with more or less the same options, too.
13 Dec, 2007, Tommi wrote in the 8th comment:
Votes: 0
You could always use one of the GPL dictionaries that come with Linux and the libraries that access them. That way a simple #ifdef can exclude the check on systems that don't have them installed, and still give you the portability that some may want.
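The conditional-compilation idea could look something like this sketch — HAVE_WORDS_FILE is a hypothetical build flag you'd define on systems that ship /usr/share/dict/words, and the linear scan is just the simplest possible stand-in for a real dictionary library:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>  /* strcasecmp */

/* HAVE_WORDS_FILE is a hypothetical flag: define it at build time on
 * systems that have /usr/share/dict/words, leave it out elsewhere. */
bool name_in_dictionary( const char *name )
{
#ifdef HAVE_WORDS_FILE
    FILE *fp = fopen( "/usr/share/dict/words", "r" );
    char line[128];
    bool found = false;

    if( !fp )
        return false;

    while( !found && fgets( line, sizeof( line ), fp ) )
    {
        line[strcspn( line, "\r\n" )] = '\0';
        if( strcasecmp( line, name ) == 0 )
            found = true;
    }
    fclose( fp );
    return found;
#else
    (void) name;
    return false;  /* no dictionary available: allow the name */
#endif
}
```

Without the flag the function compiles down to "always allow", so the rest of the name-checking code never has to care whether the dictionary exists.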
14 Dec, 2007, Hades_Kane wrote in the 9th comment:
Votes: 0
As long as I'm not opening myself up to some sort of unreasonable security risk, and since I anticipate always having grep available, and since it seems to work just fine, I'm happy. I have pretty generous server stats and have yet to even remotely scratch the surface of my available resources, so I'm not overly concerned with memory or space, and if that ever becomes a concern, an additional $10 a month isn't going to hurt a thing.

Of course, if someone offered up another solution that seemed to be obviously worth it, I'd be all for it :p

But if memory, space, and security isn't a concern, then I'm cool :)

If someone did come up with something better, it might be worth a snippet release, as I imagine a lot of people would be down for that. The only modification I've considered was implementing it so that if a name is 7 letters or more, it checks the word list for all words that are 4 letters or more and makes sure that the name doesn't -contain- any of those words. That way, names like SmellyPants would be caught by the filter.

But overall, I don't know if that would really be worth it.
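The containment check described above is a straightforward strstr loop. A minimal sketch, with the tiny embedded list standing in for the real wordlist and the length thresholds taken from the post:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Reject names of 7+ letters that merely *contain* a listed word of
 * 4+ letters, so "SmellyPants" trips on "smelly" and "pants".
 * The tiny embedded list is a stand-in for the real wordlist. */
static const char *wordlist[] = { "smelly", "pants", "sword", NULL };

bool name_contains_word( const char *name )
{
    char lower[64];
    size_t i, len = strlen( name );

    if( len < 7 || len >= sizeof( lower ) )
        return false;  /* short names only get the exact-match check */

    for( i = 0; i <= len; ++i )  /* <= copies the terminating NUL too */
        lower[i] = tolower( (unsigned char) name[i] );

    for( i = 0; wordlist[i]; ++i )
        if( strlen( wordlist[i] ) >= 4 && strstr( lower, wordlist[i] ) )
            return true;

    return false;
}
```

The downside hinted at in the post is real: with 135k words this is 135k strstr calls per name, so it's the sort of check you'd only run once at character creation, and it will also produce false positives (e.g. any name containing "ells" if that's in the list).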
14 Dec, 2007, David Haley wrote in the 10th comment:
Votes: 0
I guess we have different ideas of what's obviously worth it, since moving away from talking to an external tool with such an easy solution is obviously worth it to me. :tongue: (especially when memory isn't a problem for you)
15 Dec, 2007, Hades_Kane wrote in the 11th comment:
Votes: 0
Hmm, for some reason it stopped working, and I can't figure out why.

So, I took out check_parse_name and made it its own c file and incorporated all of the words from the word list into it, then it started crashing with a memcpy() error in gdb, so I'm assuming the list was too large to be loaded into memory, so maybe memory IS a concern after all, just in a different way than I anticipated.

So, I might be back to square one on this…
15 Dec, 2007, Davion wrote in the 12th comment:
Votes: 0
Hades_Kane said:
So, I took out check_parse_name and made it its own c file and incorporated all of the words from the word list into it, then it started crashing with a memcpy() error in gdb, so I'm assuming the list was too large to be loaded into memory, so maybe memory IS a concern after all, just in a different way than I anticipated.


I think that's an unlikely situation. Think about how your descriptions for rooms are loaded. They probably contain anywhere from 50-150 words. How large is this word list anyways? Maybe post the code and we can help debug it for ya.
15 Dec, 2007, kiasyn wrote in the 13th comment:
Votes: 0
Hades_Kane said:
Hmm, for some reason it stopped working, and I can't figure out why.

So, I took out check_parse_name and made it its own c file and incorporated all of the words from the word list into it, then it started crashing with a memcpy() error in gdb, so I'm assuming the list was too large to be loaded into memory, so maybe memory IS a concern after all, just in a different way than I anticipated.

So, I might be back to square one on this…


Is the wordlist in the c file?
16 Dec, 2007, Hades_Kane wrote in the 14th comment:
Votes: 0
Silly me… it was crashing because I forgot to define 'char names' big enough to support that many characters…

After getting a character count and using that (w/ some room for addition) instead of [MSL * #], it's working in my check_parse_name function.

Thanks for the help, though :)
16 Dec, 2007, Guest wrote in the 15th comment:
Votes: 0
Hades_Kane said:
Hmm, for some reason it stopped working, and I can't figure out why.

So, I took out check_parse_name and made it its own c file and incorporated all of the words from the word list into it, then it started crashing with a memcpy() error in gdb, so I'm assuming the list was too large to be loaded into memory, so maybe memory IS a concern after all, just in a different way than I anticipated.

So, I might be back to square one on this…


When you say you made it its own C file, do you still mean as part of the codebase or were you testing it standalone?

In ROM, there is a MAX_STRING type of setting that might affect being able to load it all into game memory, but no such limitation would exist if it's being done as a standalone C file compiled on its own.
16 Dec, 2007, Hades_Kane wrote in the 16th comment:
Votes: 0
Samson said:
In ROM, there is a MAX_STRING type of setting that might affect being able to load it all into game memory…


Yeah, that's what it was, but after I realized that and fixed it, it's working now. Thanks anyway though :)