26 Aug, 2009, Barm wrote in the 1st comment:
Votes: 0
I've started working on my item system and have placed the first one, an apprentice dagger, on the floor. Now I need to pick it up. This has me mulling over the best way to convert what the player types after VERB to their intent. Clearly, I want something smarter than:

You see A Gorey Battleaxe
> get axe
Fool, there is no 'axe' here!
> get Battleaxe
Fool, there is no 'battleaxe' here!
> get a gorey battleaxe
Fool, your capitalization does not match!


OTOH, I'm not trying to pass the Turing test, and I'm reluctant to add fields that represent extra work for designers unless there's a good reason for it.

My initial thoughts are something like this; for each object in the room:

Pass 1) test argument against the item name. If a full match, return item.
Pass 2) test argument against builder provided aliases (if any). If a full match, return item.
Pass 3) test argument against the item name again. If partial match, return item.

These don't really have to be three separate iterations; we could just do one pass with weighted guesses.
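
A minimal sketch of the single-pass, weighted version of the three passes above. The weights (3, 2, 1), the function names, and the (name, aliases) pair layout are assumptions made for illustration, not anything from the post.

```python
def score_item(arg, name, aliases=()):
    """Return a match weight for one item; 0 means no match."""
    arg = arg.strip().lower()
    name = name.lower()
    if not arg:
        return 0
    if arg == name:
        return 3    # pass 1: full name match
    if arg in [alias.lower() for alias in aliases]:
        return 2    # pass 2: full match on a builder-provided alias
    if arg in name:
        return 1    # pass 3: partial name match
    return 0


def best_match(arg, items):
    """items is a list of (name, aliases) pairs; return the best name or None."""
    best_name, best_score = None, 0
    for name, aliases in items:
        score = score_item(arg, name, aliases)
        if score > best_score:      # on a tie the earlier item wins
            best_name, best_score = name, score
    return best_name


# Hypothetical room contents, borrowing item names from the post.
vault = [
    ("Shield of Blade Turning", []),
    ("Dark Runed Claymore", ["sword"]),
    ("Dagger of Slaying", ["dagger"]),
]
```

With this table, best_match("blade", vault) falls through to the partial-name weight and picks the shield, while best_match("sword", vault) hits the claymore's alias. Ties are broken by room order here, which is probably where a real system would want to ask the player instead.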


We kill the dragon and in his treasure vault we see:

A) Shield of Blade Turning
B) Dark Runed Claymore
C) Rune Etched Stiletto
D) Dagger of Slaying

> take shield of blade turning (full name match, no ambiguity)
> take dagger (C matches on type, D matches on type and partial name)
> take blade (B,C,D all match on type, A matches on partial name)
> take sword (B matches on type)
> take Rune (B matches on partial name, C matches better – partial name and word boundary)

Reading that over, I'm thinking that 'blade' is too vague to be a useful alias but 'sword' isn't. But what about the 'Naginata of the Last Word'? It's a kinda sword on a stick. 'Polearm' is a possible alias but I doubt someone would type it so that item probably wouldn't have an alias at all.

I've skipped specifying quantities, numeric index (get blade 3), or objects lying on objects (take book from table), but this is probably enough for the moment.

Time for more coffee.
26 Aug, 2009, David Haley wrote in the 2nd comment:
Votes: 0
I think that any object should have a displayed name (dark runed claymore) and then a list of keywords that, at a minimum, include everything listed in the displayed name. So, the dark runed claymore would have keywords "dark", "runed", "claymore". You could get fancy and have primary keywords and secondary keywords, so that keyword matching prefers primary keywords over secondary keywords, but I'm not sure that's necessary. I think that one easy way to solve this would be to ask the player for clarification upon encountering ambiguity. For example, if there is a dark dragon longsword and a dark runed claymore, and the player types "get dark", you would say:
- Did you mean "dark runed claymore" or "dark dragon longsword"?
This would let the player specify an exact match to resolve ambiguity.
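
A rough sketch of the keyword-plus-clarification idea, assuming keywords are simply the lowercased words of the displayed name. The function names and the reply strings other than the "Did you mean" prompt are made up for the example.

```python
def keywords_for(display_name):
    """Every word of the displayed name becomes a keyword."""
    return set(display_name.lower().split())


def find_candidates(arg, display_names):
    """Return the names whose keywords contain every word the player typed."""
    wanted = set(arg.lower().split())
    if not wanted:
        return []
    return [name for name in display_names if wanted <= keywords_for(name)]


def resolve(arg, display_names):
    """Pick the single match, or ask the player to clarify."""
    hits = find_candidates(arg, display_names)
    if not hits:
        return "There is no '%s' here." % arg
    if len(hits) == 1:
        return "You get the %s." % hits[0]
    quoted = ['"%s"' % name for name in hits]
    return "Did you mean %s?" % " or ".join(quoted)
```

Given a room holding "dark runed claymore" and "dark dragon longsword", resolve("dark", ...) produces the clarification prompt, and resolve("claymore", ...) picks the one item directly.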

I'm not a fan of stretching too far; for example, I'm not sure I like the idea of "get dagger" getting you the claymore because a dagger is of the same type as the claymore.

In some sense, it seems that what you want is an ontology of types, allowing people to specify very precise types (naginata), or types further up the ontological tree (polearm, weapon) or even siblings (e.g. dagger instead of claymore, both under the 'blade' parent, or something like that).
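
For illustration only, a type ontology like the one described could be as small as a child-to-parent table. The PARENT entries below are guesses based on the examples in this post (naginata under polearm, dagger and claymore under blade), and type_matches is a hypothetical helper.

```python
# Hypothetical child -> parent table; a real game would define its own tree.
PARENT = {
    "naginata": "polearm",
    "polearm": "weapon",
    "dagger": "blade",
    "claymore": "blade",
    "blade": "weapon",
}


def ancestors(item_type):
    """Walk up the tree and return every type above item_type."""
    chain = []
    while item_type in PARENT:
        item_type = PARENT[item_type]
        chain.append(item_type)
    return chain


def type_matches(arg, item_type, allow_siblings=False):
    """True if arg names the item's type, an ancestor, or (optionally) a sibling."""
    if arg == item_type or arg in ancestors(item_type):
        return True
    if allow_siblings:
        return arg in PARENT and PARENT[arg] == PARENT.get(item_type)
    return False
```

Sibling matching is off by default, which keeps "get dagger" from grabbing the claymore unless the game explicitly opts in.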
26 Aug, 2009, Sandi wrote in the 3rd comment:
Votes: 0
I agree with David on the keyword list. You may not want to make extra work for your designers, but you REALLY don't want to make extra work for your players. Let the game match the name, then the keywords, which would save repeating the name in the keyword field as is required by DIKU.
26 Aug, 2009, Barm wrote in the 4th comment:
Votes: 0
In my help file system I have lines like this:

name: guild
aliases: [ guilds, class, classes, roles, role, sect ]
text: *content of the help topic*


So typing 'help role' performs the same action as 'help guild'. I kinda like David's suggestion about adding the name to the keywords, maybe as a total string and then each individual word. If I did that automatically on startup I skip the overhead of text searching the item name AND checking keywords. Plus it's dead easy to merge it with keywords provided by item designers, if any. Oh, and it saves me the step of converting to lower case for compares as well.

Quote
In some sense, it seems that what you want is an ontology of types, allowing people to specify very precise types (naginata), or types further up the ontological tree (polearm, weapon) or even siblings (e.g. dagger instead of claymore, both under the 'blade' parent, or something like that).

Not really. I think I gave that impression because my post was an attempt to verbalize (in writing) my addled thoughts. I'd like an intuitive and friendly system, but if the player wants to ask for items by atomic weight, fudge 'em.

Quote
You may not want to make extra work for your designers, but you REALLY don't want to make extra work for your players.


Excellent point.
26 Aug, 2009, Orrin wrote in the 5th comment:
Votes: 0
One thing you might want to consider if your code uses vnums or some kind of unique object id is to allow matching on that as well. There are a couple of instances in our game where object id numbers can be displayed in addition to the object description and it definitely makes it easier for players to refer to a specific object.
26 Aug, 2009, Koron wrote in the 6th comment:
Votes: 0
Yeah, one of the only MXP additions I've added to the code allows players to interact with any specific object/creature by calling its unique id number, though this number isn't actually displayed anywhere, it only shows up when you use MXP to interact with things.

I definitely like the idea of having a primary name field that gets parsed first and a keyword field that only gets parsed if the name field returned nothing. Sure, it adds another loop, but it sounds cool. :)
26 Aug, 2009, Barm wrote in the 7th comment:
Votes: 0
I'm using UUIDs for rooms and items because I want it to be easy for sysops to share content without having to worry about or work around collisions. I'd hate to type them manually though.

But that does sound pretty handy from a troubleshooting standpoint. I could add support for searching on the last five or six digits. Thanks.
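
A small sketch of suffix lookup on UUIDs, assuming items live in a dict keyed by uuid.UUID objects. The dict layout, the function name, and the five-character minimum are assumptions for the example.

```python
import uuid

MIN_SUFFIX = 5  # assumed minimum; shorter suffixes collide too easily


def find_by_id_suffix(suffix, items):
    """Return the items whose UUID (as 32 hex digits) ends with suffix."""
    suffix = suffix.lower()
    if len(suffix) < MIN_SUFFIX:
        return []
    return [item for item_id, item in items.items()
            if item_id.hex.endswith(suffix)]


# Hypothetical registry with fixed UUIDs so the example is repeatable.
registry = {
    uuid.UUID("12345678-1234-5678-1234-567812345678"): "apprentice dagger",
    uuid.UUID("00000000-0000-0000-0000-0000000abcde"): "gorey battleaxe",
}
```

Because the result is a list, a caller can still fall back to a "did you mean" prompt if two items happen to share a suffix.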
26 Aug, 2009, Sandi wrote in the 8th comment:
Votes: 0
MUSHes use discrete ID numbers, and smart Wizards use them exclusively. Highly recommended.
27 Aug, 2009, elanthis wrote in the 9th comment:
Votes: 0
I use a simple algorithm for matching names. First I parse out any index or article (e.g., first, second, #7, the, a, an, my), which can in some cases influence object lookup. Asking for the second or #5 item is pretty obvious in interpretation. Asking for 'my' item indicates that the player is looking for something in his own inventory or – if you track ownership of items even when they're on the ground – looking for something he is the owner of.
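
One way the index-and-article step might look, as a sketch. The ORDINALS table, the return shape, and the name parse_prefix are invented for the example and are not taken from the poster's code.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
ARTICLES = set(["the", "a", "an"])


def parse_prefix(text):
    """Split player input into (index, mine, remaining_words).

    index is None when no ordinal or #N was given; mine is True when the
    player said 'my'.
    """
    words = text.lower().split()
    index, mine = None, False
    while words:
        head = words[0]
        if head in ORDINALS and index is None:
            index = ORDINALS[head]
        elif re.match(r"^#[0-9]+$", head) and index is None:
            index = int(head[1:])
        elif head == "my":
            mine = True
        elif head not in ARTICLES:
            break               # first real word: stop stripping
        words.pop(0)
    return index, mine, words
```

For example, parse_prefix("my second axe") yields (2, True, ['axe']). An item actually named "first aid kit" would lose its first word under this sketch, so a real parser would need a way around that.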

After that, I take the remaining words, and compare them to the object name and aliases. The comparison uses partial matching of words and word fragments. I take the list of words the player typed in, for example "gris ax." I then look at the first word, "gris." Then I check the first word of the object's name to see if it starts with "gris." If not, I look at the second word, and so on. If I run out of words in the name, matching failed. Then I look at the second word the player typed in, and start matching words in the object's name from where the previous word match left off. If I run out of words in the object's name, matching fails. So, "gris ax" will match "large grisly axe" or "grisly red axe" or "gristled gnarly axe handle" but not "axe" or "grisly stump" or "axe of gristle." Notably, it would not match "grisly battleax" either. I try to match the player input on any aliases as well, using the same algorithm of course, if the display name did not match.
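
The ordered prefix match described above could be sketched like this. It is a reading of the prose, not the poster's actual code, and the function names are made up.

```python
def words_match(typed, name):
    """Each typed word must prefix-match a name word, in left-to-right order."""
    typed_words = typed.lower().split()
    name_words = name.lower().split()
    if not typed_words:
        return False
    pos = 0
    for word in typed_words:
        # Scan forward from where the previous match left off.
        while pos < len(name_words) and not name_words[pos].startswith(word):
            pos += 1
        if pos == len(name_words):
            return False        # ran out of name words
        pos += 1                # consume the matched name word
    return True


def object_matches(typed, name, aliases=()):
    """Try the display name first, then each alias with the same algorithm."""
    if words_match(typed, name):
        return True
    return any(words_match(typed, alias) for alias in aliases)
```

Run against the examples in the post, "gris ax" matches "large grisly axe" and "gristled gnarly axe handle", but not "axe of gristle" (wrong order) or "grisly battleax" (no word starts with "ax").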

I've made a few extra tweaks over the base algorithm based on the way I name objects in my game, which may or may not be relevant to your needs. For example, I don't match single letter words against any word over three letters in the object name, mostly because I use a lot of short words in my command names and I want to avoid false positives in the command grammar matcher. There are a few places other than object names that I use a scoring approach to matching as well, namely help. If you type in a help term that matches several help articles, the help system lists out all matches. If you have only a single match, or you get a perfect match, the help article is displayed.

You can also have a built-in set of aliases for common words to help save your builders a _lot_ of time. For example, automatically alias "battleax" to "ax," "longsword" to "sword," and so on. You could also include any common misspellings or alternative spelling to help your players out a little, so the command "get dwarf ax" will successfully match against "dwarvish battleax."
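
A sketch of a built-in alias table. The "battleax" and "longsword" entries come from the post; the "dwarvish" entry and all the names (AUTO_ALIASES, expand_keywords, matches) are assumptions added so the "get dwarf ax" example works.

```python
# Engine-wide aliases applied to every object name, so builders don't
# have to repeat them. Contents are illustrative.
AUTO_ALIASES = {
    "battleax": ["ax"],
    "longsword": ["sword"],
    "dwarvish": ["dwarf"],      # assumed alternative-spelling entry
}


def expand_keywords(name):
    """Return the name's words plus any automatic aliases for them."""
    keywords = set()
    for word in name.lower().split():
        keywords.add(word)
        keywords.update(AUTO_ALIASES.get(word, []))
    return keywords


def matches(typed, name):
    """True if every typed word is a keyword or automatic alias of name."""
    wanted = set(typed.lower().split())
    return bool(wanted) and wanted <= expand_keywords(name)
```

With that table, matches("dwarf ax", "dwarvish battleax") succeeds even though neither typed word appears in the name.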

If you're interested in making the game as easy to play and build for as possible, it would be worthwhile to log all failed object lookups somewhere, and see what people are typing but getting failure messages for. A lot may just be dumb typos, but it can help you to identify common typos or words that may need more automatic and/or explicit aliases.
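
Tallying failed lookups can be very little code. This sketch keeps counts in memory with collections.Counter; the function names are hypothetical, and a real game would presumably write to a file or database instead.

```python
from collections import Counter

failed_lookups = Counter()


def record_failure(verb, argument):
    """Count one failed object lookup, normalising case and whitespace."""
    failed_lookups[(verb, argument.strip().lower())] += 1


def most_common_failures(limit=10):
    """Return the most frequent (verb, argument) failures with their counts."""
    return failed_lookups.most_common(limit)
```

Reviewing most_common_failures() now and then should show which typos or missing aliases are worth adding to the automatic alias list.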
28 Aug, 2009, Barm wrote in the 10th comment:
Votes: 0
I tried approaching the problem using sets. I borrowed Elanthis's suggestion to filter out articles.

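import re

# Words too common to be useful as keys; 's' catches the tail of possessives.
OMIT = set(['a', 'an', 'of', 'the', 'for', 'with', 'on', 'to', 'at', 'in',
            'is', 'my', 'that', 's'])


def keyset(phrase):
    """Split a phrase into a set of lowercase words, minus the OMIT words."""
    non_alpha = re.compile("[^a-zA-Z]+")
    words = non_alpha.split(phrase.lower())
    # Skip the empty strings split() yields for leading/trailing punctuation.
    keys = set(word for word in words if word) - OMIT
    return keys


def lockset(name, keywords=None):
    """Combine the words of an item name with optional builder keywords."""
    if not keywords:
        keywords = []
    locks = set(keywords)
    locks = locks | keyset(name)
    return locks


def unlock(phrase, lockset):
    """True if every key word in phrase appears in the item's lockset."""
    keys = keyset(phrase)
    return bool( keys and keys <= lockset )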


The regex in keyset() splits text at non-alpha characters. I tried splitting by whitespace but it was keeping stuff like possessives and compound phrases together, i.e. 'Horsemen's' and 'Military-pick'. The idea here is that every item would have a LOCKSET (a set that combines the words from the name with an optional list of key words). Then player input is converted into a KEYSET. The unlock() function tests to see if every item in KEYSET appears in LOCKSET.

The good is that you can match "Great Battle-axe of Red Faced Fury" to "take fury axe" pretty easily, and order does not matter. Hopefully, the set() functions are fairly quick too.
The bad, no partial matching. "Gold Runed Staff" would fail on "take rune staff" – rune != runed.

I'm going to try a 'single master string' approach too.

Quote
If you're interested in making the game as easy to play and build for as possible, it would be worthwhile to log all failed object lookups somewhere, and see what people are typing but getting failure messages for.


Another great suggestion, thanks.
28 Aug, 2009, David Haley wrote in the 11th comment:
Votes: 0
Quote
The bad, no partial matching. "Gold Runed Staff" would fail on "take rune staff" – rune != runed.

You can override the set class or make your own comparison function to do this without too much impact on the above algorithm. You can use various tricks to keep efficiency in the set, too, to avoid having to do linear searches. For example, if you are willing to enforce that all keywords must be at least three letters long, you can do string hashing where the hash code is derived from the first three letters only.
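
A sketch of the three-letter bucketing idea, using a plain dict keyed on the first three letters rather than a custom hash function. The names build_index, has_prefix, and unlock are assumptions for the example, and keywords shorter than three letters are simply not indexed.

```python
from collections import defaultdict

PREFIX_LEN = 3  # minimum keyword length enforced by this scheme


def build_index(keywords):
    """Bucket keywords by their first three letters."""
    index = defaultdict(list)
    for word in keywords:
        word = word.lower()
        if len(word) >= PREFIX_LEN:
            index[word[:PREFIX_LEN]].append(word)
    return index


def has_prefix(index, typed):
    """True if some keyword starts with typed; only one bucket is searched."""
    typed = typed.lower()
    if len(typed) < PREFIX_LEN:
        return False
    bucket = index.get(typed[:PREFIX_LEN], ())
    return any(word.startswith(typed) for word in bucket)


def unlock(index, phrase):
    """True if every typed word is a prefix of some keyword."""
    words = phrase.lower().split()
    return bool(words) and all(has_prefix(index, word) for word in words)
```

With an index built from "gold", "runed", and "staff", unlock(index, "rune staff") now succeeds, at the cost of rejecting anything shorter than three letters.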

There are also other data structures very appropriate for prefix matching (such as lexical tries (sic)), although you might not have a nifty, single-operator test of keyword unlocking.
28 Aug, 2009, Noplex wrote in the 12th comment:
Votes: 0
My startup has been working a lot with NLP lately, and I was thinking that if/when I build another mud engine I would use Stanford's Java library to pull keywords out of descriptions. Obviously this would be done at some save process and not inline to the command interpreter. There's also no real use other than the fact it'd be pimp. I'd use it to essentially build the short description from the long description, granted there was a long description of course.

Note: I'm not claiming to know much beyond the basic concept of NLP; I haven't read any papers, and my head hurts every time I look at mathematics nowadays.

Note 2: There's a python library called the Natural Language Toolkit.
28 Aug, 2009, Barm wrote in the 13th comment:
Votes: 0
David Haley said:
…although you might not have a nifty, single-operator test of keyword unlocking.


But it's just so sexy.
29 Aug, 2009, elanthis wrote in the 14th comment:
Votes: 0
If you want out-of-order matching with partial matches, you could build a trie for every string. It'd be simplified from a real trie, honestly, because you wouldn't need to actually store values; at most you'd want to store the index or start position of the word in the original string so you can avoid duplicate matches against the same word (so "bi bi" would match "big bike" but not just "bike"). I'd build the trie at the time the object is loaded and keep it around for re-use.
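
A sketch of that simplified trie: each node records the indices of the name words passing through it, and matching assigns every typed word to a different name word. The class layout and the backtracking step are one possible reading, not the poster's implementation.

```python
class _Node(object):
    def __init__(self):
        self.children = {}
        self.words = set()      # indices of name words that reach this node


class PrefixTrie(object):
    """Per-object trie built once when the object is loaded."""

    def __init__(self, name):
        self.root = _Node()
        for index, word in enumerate(name.lower().split()):
            node = self.root
            for char in word:
                node = node.children.setdefault(char, _Node())
                node.words.add(index)

    def positions(self, prefix):
        """Return the indices of name words that start with prefix."""
        node = self.root
        for char in prefix.lower():
            node = node.children.get(char)
            if node is None:
                return set()
        return node.words

    def matches(self, phrase):
        """Out-of-order match; no name word may be used twice."""
        options = [self.positions(word) for word in phrase.split()]
        return bool(options) and self._assign(options, 0, set())

    def _assign(self, options, i, used):
        if i == len(options):
            return True
        for index in options[i] - used:
            if self._assign(options, i + 1, used | set([index])):
                return True
        return False
```

So PrefixTrie("big bike").matches("bi bi") is true, while PrefixTrie("bike").matches("bi bi") is false because the single word cannot be claimed twice.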
29 Aug, 2009, Barm wrote in the 15th comment:
Votes: 0
elanthis said:
If you want out-of-order matching with partial matches, you could build a trie for every string.


Thanks. I took a crack at the Trie method, but I stored each character in a node. This is probably not the most efficient implementation but I wanted to try my own hand at it.

#!/usr/bin/env python

import re

class LockTrie(object):

    omit = set(['a', 'an', 'of', 'the', 'for', 'with', 'on', 'to', 'at', 'in',
                'is', 'my', 'that',])

    non_alpha = re.compile("[^a-zA-Z]+")

    class _Node(object):

        def __init__(self):
            self.nodes = {}

        def has(self, char):
            return self.nodes.get(char, False)

        def add(self, char):
            if char in self.nodes:
                node = self.nodes[char]
            else:
                node = LockTrie._Node()
                self.nodes[char] = node
            return node


    def __init__(self):
        self.root = LockTrie._Node()

    def _add_word(self, word):
        node = self.root
        for char in word:
            node = node.add(char)

    def _has_word(self, word):
        node = self.root
        for char in word:
            node = node.has(char)
            if not node:
                return False
        return bool(word and True)

    def _split(self, phrase):
        phrase = phrase.replace("\'", "") ## guard's == guards
        words = set(w for w in self.non_alpha.split(phrase.lower()) if w)
        return words - self.omit

    def add_phrase(self, phrase):
        words = self._split(phrase)
        for word in words:
            self._add_word(word)

    def unlock(self, phrase):
        words = self._split(phrase)
        if not words:
            return False
        for word in words:
            if not self._has_word(word):
                return False
        return True


lock = LockTrie()
lock.add_phrase("Captain's Red Runed Battle-Axe of the Seas")

print 'A', lock.unlock("captains axe")
print 'B', lock.unlock("captain's axe")
print 'C', lock.unlock("captain axe")
print 'D', lock.unlock("red rune axe")
print 'E', lock.unlock("red runed axe")
print 'F', lock.unlock("battle-axe")
print 'G', lock.unlock("red battleaxe")
lock.add_phrase("battleaxe")
print 'H', lock.unlock("red battleaxe")
print 'I', lock.unlock("bAtTlE-aXe Of ThE sEaS")
print 'J', lock.unlock("of the")


Which gives the output:

A True
B True
C True
D True
E True
F True
G False
H True
I True
J False


Line 75, lock.add_phrase("battleaxe"), gives an example of adding a keyword, since "Battle-Axe" is split into "Battle" and "Axe".

It would give false positives on the "big bike" test since I don't track order and I consume duplicates in the _split() function, so ( "bi bi" == "bi" ) == "bike" == "big bike" == "blue bike". I'm tempted to go with this and see how it plays.

Edit:

Changed _has_word(), it was returning true for empty strings.
31 Aug, 2009, Barm wrote in the 16th comment:
Votes: 0
I was fairly happy with the Trie approach to string matching (thanks for the suggestion, Elanthis).

Then I sat down to write the code to parse quantity. This is turning out to be trickier than I expected. I wanted to support constructs like:

take all
take gold
take an axe
take the broadsword
take 30 arrows
take 1 of everything
take hammer of might


The goal is to parse these into a name and quantity to use as arguments in a search function. My first thought was to default quantity to 1 unless specified and use -1 to mean 'as many as possible'. So 'take axe' became ('axe', 1) and 'take all arrows' became ('arrows', -1).

Plurals undo this – a user typing 'take swords' expects to loot every sword. Or how about 'gold', which is singular and plural.

The other problem with plurals is they modify the spelling of the word. 'Sword of Omen' != 'get swords'. I could simply strip trailing S's but some plurals modify the spelling even more; churches, cherries, wolves. Plus, the plural may not be in the trailing character anyway – most people would say 'Hammers of Might' over 'Hammer of Mights'. Probably the only cure for this is to have designers add the plural form as a keyword.

While better, it still causes 'take sword' and 'take swords' to parse the same. Any implied plurality is lost. So maybe we should default to greedy where quantity is 'all' unless specified? Aren't most players going to type 'take all' anyway?

So 'take arrow' == 'take arrows' == 'take all arrows' and 'take an arrow' == 'take 1 arrow'.
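
A sketch of the greedy-by-default parse, using -1 for "as many as possible" as in the post. The function name, the handling of "the", and returning an empty name for "everything" are assumptions, not settled design.

```python
import re

ALL = -1  # 'as many as possible', the sentinel proposed in the post


def parse_quantity(argument):
    """Turn the text after the verb into a (name, quantity) pair.

    An empty name means 'any item'.
    """
    words = argument.lower().split()
    quantity = ALL                      # greedy default: bare nouns mean all
    if words:
        head = words[0]
        if re.match(r"^[0-9]+$", head):
            quantity, words = int(head), words[1:]
        elif head in ("a", "an"):
            quantity, words = 1, words[1:]
        elif head in ("all", "the"):
            words = words[1:]           # treating 'the' as greedy is a guess
        # 'take 1 of everything': drop the filler words
        if words and words[0] == "of":
            words = words[1:]
        if words == ["everything"]:
            words = []
    return (" ".join(words), quantity)
```

Under these rules 'arrow', 'arrows', and 'all arrows' all come back with quantity -1, while 'an arrow' and '1 arrow' come back with 1. The name still differs between 'arrow' and 'arrows', so the plural-spelling problem is left to the matcher.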
31 Aug, 2009, David Haley wrote in the 17th comment:
Votes: 0
Barm said:
I was fairly happy with the Trie approach to string matching (thanks for the suggestion, Elanthis).

:sad:

Barm said:
Probably the only cure for this is to have designers add the plural form as a keyword.

If you wanted to get fancy (it's not too fancy) you could use word stemming, which is smarter than just stripping trailing 's' – that would solve the wolf vs. wolves problem. You could also use part of speech tagging, which typically has ~95% accuracy for identifying words as nouns or adjectives – you would know that (in English) the nouns are the ones that are pluralized, but adjectives aren't. This would let you identify "mights" as a noun.
31 Aug, 2009, Barm wrote in the 18th comment:
Votes: 0
David Haley said:
Barm said:
I was fairly happy with the Trie approach to string matching (thanks for the suggestion, Elanthis).

:sad:


*cough* … and of course the brilliant David Haley too. Oops.

Quote
If you wanted to get fancy (it's not too fancy) you could use word stemming, which is smarter than just stripping trailing 's' – that would solve the wolf vs. wolves problem. You could also use part of speech tagging, which typically has ~95% accuracy for identifying words as nouns or adjectives – you would know that (in English) the nouns are the ones that are pluralized, but adjectives aren't. This would let you identify "mights" as a noun.


Interesting. The Trie code already drops basic prepositions, so "Cloak of the Wolf" gets converted into the unordered set ('cloak', 'wolf') for comparison. If I could get the match function to recognize and accept the plurals ('cloaks', 'wolves') AND report back that plurals were used, then I have a leg up on quantity guessing. It's messy though, because of the partial string matching.
31 Aug, 2009, David Haley wrote in the 19th comment:
Votes: 0
What you can try is to stem the keywords, and then when matching against input, stem that too. And for good measure, keep the originals around in case you want to try exact matching before the guessing.
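
A toy illustration of stemming both sides while keeping the originals around. naive_stem is a deliberately crude stand-in written for this example, not a real stemming algorithm; it gets words like "knives" or "gloves" wrong, and a proper stemmer or lemmatizer library would replace it.

```python
def naive_stem(word):
    """Crude plural stripper; a stand-in for a real stemmer."""
    word = word.lower()
    if word.endswith("ves") and len(word) > 4:
        return word[:-3] + "f"          # wolves -> wolf
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # cherries -> cherry
    for suffix in ("ches", "shes", "sses"):
        if word.endswith(suffix):
            return word[:-2]            # churches -> church
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # swords -> sword
    return word


def match_stemmed(typed, keywords):
    """Return (matched, used_plural) for lowercase item keywords."""
    originals = set(keywords)
    stems = set(naive_stem(keyword) for keyword in originals)
    typed_words = typed.lower().split()
    if not typed_words:
        return (False, False)
    matched = all(naive_stem(word) in stems for word in typed_words)
    # A word counts as a plural only if stemming changed it and it is not
    # itself one of the original keywords (e.g. an item named 'seas').
    used_plural = matched and any(
        naive_stem(word) != word and word not in originals
        for word in typed_words)
    return (matched, used_plural)
```

For the keywords 'cloak' and 'wolf', typing "cloaks wolves" matches and reports that a plural was used, which is the hint the quantity guesser was missing.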

Still, I think it would be ok to try the approximate version first, and then play around with it to see how much people like it. It might turn out to be sufficient, and that you don't need too much complexity in terms of assigning various confidences to various matches etc.
31 Aug, 2009, Barm wrote in the 20th comment:
Votes: 0
It might be more efficient to build two tries for each item; one with the original singular keywords and a second with pluralized keywords (which also saves me from forcing builders to assign plural keywords as mentioned above). Then if the match came from the second table the user was asking for multiples.
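
A sketch of the two-table idea, using plain sets in place of tries to keep it short (so there is no prefix matching here). naive_plural is a toy pluralizer invented for the example and will mangle irregular words; the class and method names are also assumptions.

```python
def naive_plural(word):
    """Toy pluralizer; a stand-in for whatever rule set the game adopts."""
    if word.endswith(("ch", "sh", "s", "x")):
        return word + "es"              # church -> churches
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"        # cherry -> cherries
    if word.endswith("f"):
        return word[:-1] + "ves"        # wolf -> wolves
    return word + "s"                   # sword -> swords


class ItemKeys(object):
    """Singular keywords plus an automatically generated plural table."""

    def __init__(self, keywords):
        self.singular = set(keyword.lower() for keyword in keywords)
        self.plural = set(naive_plural(keyword) for keyword in self.singular)

    def match(self, typed):
        """Return None for no match, else 'one' or 'many'."""
        words = typed.lower().split()
        if not words:
            return None
        wants_many = False
        for word in words:
            if word in self.singular:
                continue
            if word in self.plural:
                wants_many = True       # hit came from the plural table
            else:
                return None
        return "many" if wants_many else "one"
```

For instance, ItemKeys(["hammer", "might"]).match("hammers might") returns 'many' without the builder ever typing a plural keyword.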

And I agree – too clever a parser might just be a pain in the neck for users.