MudBytes
> take Naginata of the Last Word, seeking wisdom on parsing
Barm
Conjurer
Group: Members
Posts: 180
Joined: Feb 26, 2009
#1 id:32434 Posted Aug 26, 2009, 7:52 am

I've started working on my item system and have placed the first one, an apprentice dagger, on the floor.  Now I need to pick it up.  This has me mulling over the best way to convert what the player types after VERB to their intent.  Clearly, I want something smarter than:

Code (text):
You see A Gorey Battleaxe
> get axe
Fool, there is no 'axe' here!
> get Battleaxe
Fool, there is no 'battleaxe' here!
> get a gorey battleaxe
Fool, your capitalization does not match!


OTOH, I'm not trying to pass the Turing test and I'm reluctant to add fields that represent extra work for designers unless there's a good reason for it.

My initial thoughts are something like this; for each object in the room:

Pass 1) test argument against the item name.  If a full match, return item.
Pass 2) test argument against builder provided aliases (if any).  If a full match, return item.
Pass 3) test argument against the item name again.  If partial match, return item.

These don't really have to be three separate iterations; we could just do one pass with weighted guesses.
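
A minimal sketch of the single-pass, weighted version of those three passes. The weights (3, 2, 1), the function names, and the `VAULT` sample data are assumptions made up for illustration, not part of any existing codebase.

```python
def score(argument, name, aliases=()):
    """Return a weight for how well argument matches one item (0 = no match)."""
    arg = argument.lower().strip()
    name = name.lower()
    if not arg:
        return 0
    if arg == name:
        return 3    # pass 1: full name match
    if arg in (alias.lower() for alias in aliases):
        return 2    # pass 2: full match on a builder-provided alias
    if arg in name:
        return 1    # pass 3: partial name match
    return 0


def best_match(argument, items):
    """Return the name of the highest-scoring item, or None if nothing matches."""
    scored = [(score(argument, name, aliases), name) for name, aliases in items]
    # max() keeps the first of several equal scores, so ties go to the
    # item listed first.
    top = max(scored, key=lambda pair: pair[0], default=(0, None))
    return top[1] if top[0] > 0 else None


# Hypothetical room contents: (display name, aliases).
VAULT = [
    ("Shield of Blade Turning", []),
    ("Dark Runed Claymore", ["sword"]),
    ("Rune Etched Stiletto", ["dagger"]),
    ("Dagger of Slaying", ["dagger"]),
]

print(best_match("sword", VAULT))
print(best_match("slaying", VAULT))
```

Ties are resolved by list order here, which is arbitrary; a real game would probably want to ask the player instead.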


We kill the dragon and in his treasure vault we see:

A) Shield of Blade Turning
B) Dark Runed Claymore
C) Rune Etched Stiletto
D) Dagger of Slaying

> take shield of blade turning (full name match, no ambiguity)
> take dagger (C matches on type, D matches on type and partial name)
> take blade (B,C,D all match on type, A matches on partial name)
> take sword (B matches on type)
> take Rune (B matches on partial name, C matches better -- partial name and word boundary)

Reading that over, I'm thinking that 'blade' is too vague to be a useful alias but 'sword' isn't.  But what about the 'Naginata of the Last Word'?  It's a kinda sword on a stick.  'Polearm' is a possible alias but I doubt someone would type it so that item probably wouldn't have an alias at all.

I've skipped specifying quantities, numeric index (get blade 3), or objects lying on objects (take book from table), but this is probably enough for the moment.

Time for more coffee.

David Haley
Wizard
Group: Members
Posts: 7,841
Joined: Jun 30, 2007
#2 id:32460 Posted Aug 26, 2009, 9:53 am

I think that any object should have a displayed name (dark runed claymore) and then a list of keywords that, at a minimum, include everything listed in the displayed name. So, the dark runed claymore would have keywords "dark", "runed", "claymore". You could get fancy and have primary keywords and secondary keywords, so that keyword matching prefers primary keywords over secondary keywords, but I'm not sure that's necessary. I think that one easy way to solve this would be to ask the player for clarification upon encountering ambiguity. For example, if there is a dark dragon longsword and a dark runed claymore, and the player types "get dark", you would say:
- Did you mean "dark runed claymore" or "dark dragon longsword"?
This would let the player specify an exact match to resolve ambiguity.
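
A rough sketch of that keyword-plus-clarification idea, assuming keywords are simply the whitespace-separated words of the displayed name. The function names and the failure message are hypothetical.

```python
def keywords_for(display_name, extra=()):
    """Keywords are every word of the displayed name plus any extras."""
    return set(display_name.lower().split()) | {k.lower() for k in extra}


def find_candidates(argument, display_names):
    """Return every name whose keywords include all the words typed."""
    wanted = set(argument.lower().split())
    if not wanted:
        return []
    return [name for name in display_names if wanted <= keywords_for(name)]


def resolve(argument, display_names):
    """Return (match, message): one of the two is always None."""
    matches = find_candidates(argument, display_names)
    if not matches:
        return None, "You don't see that here."
    if len(matches) == 1:
        return matches[0], None
    # More than one candidate: ask instead of guessing.
    options = " or ".join('"%s"' % name for name in matches)
    return None, "Did you mean %s?" % options


room = ["dark runed claymore", "dark dragon longsword"]
print(resolve("dark", room))
print(resolve("dark claymore", room))
```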

I'm not a fan of stretching too far; for example, I'm not sure I like the idea of "get dagger" getting you the claymore because a dagger is of the same type as the claymore.

In some sense, it seems that what you want is an ontology of types, allowing people to specify very precise types (naginata), or types further up the ontological tree (polearm, weapon) or even siblings (e.g. dagger instead of claymore, both under the 'blade' parent, or something like that).
.........................
-- d.c.h --
BabbleMUD Project (custom codebase)
Legends of the Darkstone (head coder)
http://david.the-haleys.org
.........................

Sandi
Wizard
Group: Members
Posts: 629
Joined: Jun 17, 2006
#3 id:32473 Posted Aug 26, 2009, 1:06 pm

I agree with David on the keyword list. You may not want to make extra work for your designers, but you REALLY don't want to make extra work for your players. Let the game match the name, then the keywords, which would save repeating the name in the keyword field as is required by DIKU.
.........................
The Witch of Tir na nOg


Barm
#4 id:32484 Posted Aug 26, 2009, 1:46 pm

In my help file system I have lines like this:

Code (text):
name: guild
aliases: [ guilds, class, classes, roles, role, sect ]
text:  *content of the help topic*


So typing 'help role' performs the same action as 'help guild'.  I kinda like David's suggestion about adding the name to the keywords, maybe as a total string and then each individual word.  If I did that automatically on startup I'd skip the overhead of text searching the item name AND checking keywords.  Plus it's dead easy to merge it with keywords provided by item designers, if any.  Oh, and it saves me the step of converting to lower case for compares as well.
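
Something like the following could do that merge once at load time. This is only a sketch: `build_keywords` and `has_keyword` are made-up names, and it assumes a lookup is a single exact hit against the prepared set.

```python
def build_keywords(name, designer_keywords=()):
    """Build an item's lookup set once, when the item is loaded."""
    lowered = name.lower()
    keys = {lowered}                       # the name as a total string
    keys.update(lowered.split())           # plus each individual word
    keys.update(k.lower() for k in designer_keywords)
    return frozenset(keys)


def has_keyword(argument, keys):
    """Only the player's input needs lower-casing at lookup time."""
    return argument.lower().strip() in keys


claymore = build_keywords("Dark Runed Claymore", ["sword"])
print(has_keyword("Claymore", claymore))
print(has_keyword("dark runed claymore", claymore))
```

Note that a two-word fragment such as "dark claymore" would not hit this set, since only the full name and the single words are stored.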

Quote:
In some sense, it seems that what you want is an ontology of types, allowing people to specify very precise types (naginata), or types further up the ontological tree (polearm, weapon) or even siblings (e.g. dagger instead of claymore, both under the 'blade' parent, or something like that).
 
Not really.  I think I gave that impression because my post was an attempt to verbalize (in writing) my addled thoughts.  I'd like an intuitive and friendly system, but if the player wants to ask for items by atomic weight, fudge 'em.

Quote:
You may not want to make extra work for your designers, but you REALLY don't want to make extra work for your players.


Excellent point.

Orrin
Sorcerer
Group: Moderators
Posts: 437
Joined: Aug 26, 2008
#5 id:32485 Posted Aug 26, 2009, 2:20 pm

One thing you might want to consider if your code uses vnums or some kind of unique object id is to allow matching on that as well. There are a couple of instances in our game where object id numbers can be displayed in addition to the object description and it definitely makes it easier for players to refer to a specific object.
.........................
MudGamers | FMud | My blog | @bcdevMatt

Koron
Sorcerer
Group: Members
Posts: 386
Joined: Jun 7, 2008
#6 id:32486 Posted Aug 26, 2009, 2:28 pm

Yeah, one of the only MXP additions I've added to the code allows players to interact with any specific object/creature by calling its unique id number, though this number isn't actually displayed anywhere, it only shows up when you use MXP to interact with things.

I definitely like the idea of having a primary name field that gets parsed first and a keyword field that only gets parsed if the name field returned nothing. Sure, it adds another loop, but it sounds cool. :)

Barm
#7 id:32487 Posted Aug 26, 2009, 2:35 pm

I'm using UUIDs for rooms and items because I want it to be easy for sysops to share content without having to worry about or work around collisions.  I'd hate to type them manually though.

But that does sound pretty handy from a troubleshooting standpoint.  I could add support for searching on the last five or six digits.  Thanks.
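
A sketch of what suffix search over UUIDs might look like, assuming items are kept in a dict keyed by `uuid.UUID`. The function name and the five-digit minimum are assumptions.

```python
import uuid


def find_by_id_suffix(suffix, items, min_length=5):
    """Return the items whose UUID hex string ends with suffix.

    items is assumed to map uuid.UUID -> item.  Very short suffixes are
    rejected because they would match far too many objects.
    """
    wanted = suffix.lower().strip()
    if len(wanted) < min_length:
        return []
    return [item for item_id, item in items.items()
            if item_id.hex.endswith(wanted)]


# Hypothetical inventory keyed by UUID.
inventory = {
    uuid.UUID("12345678-1234-5678-1234-567812345678"): "apprentice dagger",
}
print(find_by_id_suffix("345678", inventory))
```

A suffix can still be ambiguous, so the function returns a list rather than a single item.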

Sandi
#8 id:32488 Posted Aug 26, 2009, 2:53 pm

MUSHes use discrete ID numbers, and smart Wizards use them exclusively. Highly recommended.


elanthis
Wizard
Group: Members
Posts: 772
Joined: Feb 26, 2008
#9 id:32489 Posted Aug 26, 2009, 4:45 pm

I use a simple algorithm for matching names.  First I parse out any index or article (e.g., first, second, #7, the, a, an, my), which can in some cases influence object lookup.  Asking for the second or #5 item is pretty obvious in interpretation.  Asking for 'my' item indicates that the player is looking for something in his own inventory or -- if you track ownership of items even when they're on the ground -- looking for something he is the owner of.
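
That first step might be sketched as follows. The ordinal table, the article list, and the returned tuple shape are assumptions for illustration; only leading words are stripped so that a name containing "second" later on is left alone.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
ARTICLES = {"the", "a", "an"}


def parse_target(text):
    """Strip leading index/article/'my' words.

    Returns (index, mine, remaining_words), where index is None when the
    player gave no ordinal or #N, and mine is True when 'my' was used.
    """
    words = text.lower().split()
    index, mine = None, False
    while words:
        head = words[0]
        numbered = re.fullmatch(r"#(\d+)", head)
        if numbered:
            index = int(numbered.group(1))
        elif head in ORDINALS:
            index = ORDINALS[head]
        elif head == "my":
            mine = True
        elif head not in ARTICLES:
            break                  # first real word: stop stripping
        words.pop(0)
    return index, mine, words


print(parse_target("the second grisly axe"))
```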

After that, I take the remaining words, and compare them to the object name and aliases.  The comparison uses partial matching of words and word fragments.  I take the list of words the player typed in, for example "gris ax."  I then look at the first word, "gris."  Then I check the first word of the object's name to see if it starts with "gris."  If not, I look at the second word, and so on.  If I run out of words in the name, matching failed.  Then I look at the second word the player typed in, and start matching words in the object's name from where the previous word match left off.  If I run out of words in the object's name, matching fails.  So, "gris ax" will match "large grisly axe" or "grisly red axe" or "gristled gnarly axe handle" but not "axe" or "grisly stump" or "axe of gristle."  Notably, it would not match "grisly battleax" either.  I try to match the player input on any aliases as well, using the same algorithm of course, if the display name did not match.
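
A minimal sketch of the in-order, word-prefix matching described above. It is a reading of the prose rather than elanthis's actual code, and the function name is made up.

```python
def ordered_prefix_match(typed, name):
    """True if each typed word prefixes a later word of name, in order."""
    fragments = typed.lower().split()
    if not fragments:
        return False
    name_words = name.lower().split()
    position = 0
    for fragment in fragments:
        # Scan forward from where the previous fragment matched.
        while (position < len(name_words)
               and not name_words[position].startswith(fragment)):
            position += 1
        if position == len(name_words):
            return False           # ran out of words in the name
        position += 1              # this name word is consumed
    return True


for candidate in ("large grisly axe", "axe of gristle", "grisly battleax"):
    print(candidate, ordered_prefix_match("gris ax", candidate))
```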

I've made a few extra tweaks over the base algorithm based on the way I name objects in my game, which may or may not be relevant to your needs.  For example, I don't match single letter words against any word over three letters in the object name, mostly because I use a lot of short words in my command names and I want to avoid false positives in the command grammar matcher.  There are a few places other than object names that I use a scoring approach to matching as well, namely help.  If you type in a help term that matches several help articles, the help system lists out all matches.  If you have only a single match, or you get a perfect match, the help article is displayed.

You can also have a built-in set of aliases for common words to help save your builders a _lot_ of time.  For example, automatically alias "battleax" to "ax," "longsword" to "sword," and so on.  You could also include any common misspellings or alternative spelling to help your players out a little, so the command "get dwarf ax" will successfully match against "dwarvish battleax."
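
One way such a built-in alias table could work, as a sketch. The table contents and the names `BUILTIN_ALIASES`, `expand_keywords`, and `alias_match` are invented for illustration.

```python
# Hypothetical game-wide alias table: name word -> extra keywords.
BUILTIN_ALIASES = {
    "battleax": ["ax"],
    "battleaxe": ["axe", "ax"],
    "longsword": ["sword"],
    "dwarvish": ["dwarf"],
    "dwarven": ["dwarf"],
}


def expand_keywords(name):
    """Words of the name plus whatever the built-in table adds for them."""
    words = name.lower().split()
    keys = set(words)
    for word in words:
        keys.update(BUILTIN_ALIASES.get(word, ()))
    return keys


def alias_match(typed, name):
    """True if every typed word is a keyword of the (expanded) name."""
    wanted = set(typed.lower().split())
    return bool(wanted) and wanted <= expand_keywords(name)


print(alias_match("dwarf ax", "dwarvish battleax"))
```

Because the table is global, builders get these aliases on every item without typing them in.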

If you're interested in making the game as easy to play and build for as possible, it would be worthwhile to log all failed object lookups somewhere, and see what people are typing but getting failure messages for.  A lot may just be dumb typos, but it can help you to identify common typos or words that may need more automatic and/or explicit aliases.
.........................
Cutting corners to keep your line count down is just sad.

Barm
#10 id:32557 Posted Aug 28, 2009, 4:54 am

I tried approaching the problem using sets.  I borrowed Elanthis's suggestion to filter out articles.

Code (text):
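import re

# Words ignored in both item names and player input.
OMIT = set(['a', 'an', 'of', 'the', 'for', 'with', 'on', 'to', 'at', 'in',
    'is', 'my', 'that', 's'])

# Compiled once at import time rather than on every call.
NON_ALPHA = re.compile("[^a-zA-Z]+")


def keyset(phrase):
    """Lower-case a phrase and return its set of significant words."""
    words = NON_ALPHA.split(phrase.lower())
    # split() yields '' when the phrase starts or ends with a non-letter.
    return set(words) - OMIT - set([''])


def lockset(name, keywords=None):
    """Combine the words in an item's name with optional builder keywords."""
    locks = set(k.lower() for k in (keywords or []))
    return locks | keyset(name)


def unlock(phrase, locks):
    """True if every significant word in phrase appears in the lockset."""
    keys = keyset(phrase)
    return bool(keys and keys <= locks)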


The regex in keyset() splits text at non-alpha characters.  I tried splitting by whitespace but it was keeping stuff like possessives and compound phrases together, e.g. 'Horsemen's' and 'Military-pick'.  The idea here is that every item would have a LOCKSET (a set that combines the words from the name with an optional list of keywords).  Then player input is converted into a KEYSET.  The unlock() function tests whether every item in KEYSET appears in LOCKSET.

The good: you can match "Great Battle-axe of Red Faced Fury" to "take fury axe" pretty easily, and order does not matter.  Hopefully, the set() functions are fairly quick too.
The bad: no partial matching.  "Gold Runed Staff" would fail on "take rune staff" -- rune != runed.

I'm going to try a 'single master string' approach too.

Quote:
If you're interested in making the game as easy to play and build for as possible, it would be worthwhile to log all failed object lookups somewhere, and see what people are typing but getting failure messages for.


Another great suggestion, thanks.

David Haley
#11 id:32559 Posted Aug 28, 2009, 5:35 am

Quote:
The bad: no partial matching.  "Gold Runed Staff" would fail on "take rune staff" -- rune != runed.

You can override the set class or make your own comparison function to do this without too much impact on the above algorithm. You can use various tricks to keep efficiency in the set, too, to avoid having to do linear searches. For example, if you are willing to enforce that all keywords must be at least three letters long, you can do string hashing where the hash code is derived from the first three letters only.
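
A sketch of the three-letter bucketing idea, using a plain dict keyed on the first three letters instead of a custom hash function. All names here are hypothetical, and fragments shorter than three letters simply never match.

```python
from collections import defaultdict

PREFIX_LENGTH = 3


def build_index(keywords):
    """Bucket keywords by their first three letters."""
    index = defaultdict(list)
    for keyword in keywords:
        keyword = keyword.lower()
        if len(keyword) < PREFIX_LENGTH:
            raise ValueError("keyword too short: %r" % keyword)
        index[keyword[:PREFIX_LENGTH]].append(keyword)
    return dict(index)


def has_prefix(index, fragment):
    """True if some keyword starts with fragment; scans one bucket only."""
    fragment = fragment.lower()
    if len(fragment) < PREFIX_LENGTH:
        return False
    bucket = index.get(fragment[:PREFIX_LENGTH], ())
    return any(keyword.startswith(fragment) for keyword in bucket)


def unlock(phrase, index):
    """Every typed fragment must prefix some keyword, in any order."""
    fragments = phrase.split()
    return bool(fragments) and all(has_prefix(index, f) for f in fragments)


staff = build_index(["gold", "runed", "staff"])
print(unlock("rune staff", staff))
```

The cost of a lookup is one dict access plus a scan of a bucket that is usually only one or two words long.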

There are also other data structures very appropriate for prefix matching (such as lexical tries (sic)), although you might not have a nifty, single-operator test of keyword unlocking.

Noplex
Conjurer
Group: Members
Posts: 107
Joined: May 20, 2006
#12 id:32560 Posted Aug 28, 2009, 5:58 am

My startup has been working a lot with NLP lately, and I was thinking that if/when I build another mud engine I would use Stanford's Java library to pull keywords out of descriptions. Obviously this would be done at some save process and not inline to the command interpreter. There's also no real use other than the fact it'd be pimp. I'd use it to essentially build the short description from the long description, granted there was a long description of course.

Note: Not claiming to know much other than the basic concept of NLP; I haven't read any papers and my head hurts every time I look at mathematics nowadays.

Note 2: There's a Python library called the Natural Language Toolkit.
.........................
jb

Last edited Aug 28, 2009, 6:06 am by Noplex
Barm
#13 id:32597 Posted Aug 28, 2009, 1:17 pm


David Haley said:
...although you might not have a nifty, single-operator test of keyword unlocking.


But it's just so sexy.

elanthis
#14 id:32612 Posted Aug 28, 2009, 8:49 pm

If you want out-of-order matching with partial matches, you could build a trie for every string.  It'd be simplified from a real trie, honestly, because you wouldn't need to actually store values; at most you'd want to store the index or start position of the word in the original string so you can avoid duplicate matches against the same word (so "bi bi" would match "big bike" but not just "bike").  I'd build the trie at the time the object is loaded and keep it around for re-use.
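
A sketch of such a value-less trie, assuming each node records the indexes of the name words that pass through it. The class and function names are invented, and the backtracking step is one possible way to stop two typed fragments from claiming the same name word.

```python
class _Node(object):
    def __init__(self):
        self.children = {}
        self.word_indexes = set()   # name words passing through this node


class NameTrie(object):
    """Per-object trie, built once when the object is loaded."""

    def __init__(self, name):
        self.root = _Node()
        for index, word in enumerate(name.lower().split()):
            node = self.root
            for char in word:
                node = node.children.setdefault(char, _Node())
                node.word_indexes.add(index)

    def candidates(self, fragment):
        """Indexes of the name words that start with fragment."""
        node = self.root
        for char in fragment.lower():
            node = node.children.get(char)
            if node is None:
                return set()
        return set(node.word_indexes)

    def unlock(self, phrase):
        options = [self.candidates(f) for f in phrase.split()]
        return bool(options) and _assign(options, set())


def _assign(options, used):
    """Backtracking: give every fragment its own distinct name word."""
    if not options:
        return True
    for index in options[0]:
        if index not in used:
            used.add(index)
            if _assign(options[1:], used):
                return True
            used.discard(index)
    return False


print(NameTrie("big bike").unlock("bi bi"))
print(NameTrie("bike").unlock("bi bi"))
```

The backtracking is exponential in the worst case, which should not matter for the handful of words a player types.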

Barm
#15 id:32640 Posted Aug 29, 2009, 9:44 am

elanthis said:
If you want out-of-order matching with partial matches, you could build a trie for every string.


Thanks.  I took a crack at the Trie method, but I stored each character in a node.  This is probably not the most efficient implementation but I wanted to try my own hand at it.

Code (text):
#!/usr/bin/env python

import re

class LockTrie(object):

    omit = set(['a', 'an', 'of', 'the', 'for', 'with', 'on', 'to', 'at', 'in',
        'is', 'my', 'that',])

    non_alpha = re.compile("[^a-zA-Z]+")

    class _Node(object):

        def __init__(self):
            self.nodes = {}       

        def has(self, char):
            return self.nodes.get(char, False)
         
        def add(self, char):
            if char in self.nodes:
                node = self.nodes[char]
            else:
                node = LockTrie._Node()
                self.nodes[char] = node
            return node


    def __init__(self):
        self.root = LockTrie._Node()

    def _add_word(self, word):
        node = self.root
        for char in word:
            node = node.add(char)

    def _has_word(self, word):
        node = self.root
        for char in word:
            node = node.has(char)
            if not node:
                return False
        return bool(word and True)       

    def _split(self, phrase):
        phrase = phrase.replace("\'", "")  ## guard's == guards
        words = set(self.non_alpha.split(phrase.lower()))
        return words - self.omit - set([''])  ## drop empties from the split
           
    def add_phrase(self, phrase):
        words = self._split(phrase)
        for word in words:
            self._add_word(word)

    def unlock(self, phrase):
        words = self._split(phrase)
        if not words:
            return False
        for word in words:
            if not self._has_word(word):
                return False
        return True


lock = LockTrie()
lock.add_phrase("Captain's Red Runed Battle-Axe of the Seas")

print 'A', lock.unlock("captains axe")
print 'B', lock.unlock("captain's axe")
print 'C', lock.unlock("captain axe")
print 'D', lock.unlock("red rune axe")
print 'E', lock.unlock("red runed axe")
print 'F', lock.unlock("battle-axe")
print 'G', lock.unlock("red battleaxe")
lock.add_phrase("battleaxe")
print 'H', lock.unlock("red battleaxe")
print 'I', lock.unlock("bAtTlE-aXe Of ThE sEaS")
print 'J', lock.unlock("of the")


Which gives the output:

Code (text):
A True
B True
C True
D True
E True
F True
G False
H True
I True
J False


The lock.add_phrase("battleaxe") call just before test H (line 75) gives an example of adding a keyword, since "Battle-Axe" is split into "Battle" and "Axe".

It would give false positives on the "big bike" test since I don't track order and I consume duplicates in the _split() function, so "bi bi" collapses to "bi", which matches "bike", "big bike", and "blue bike" alike.  I'm tempted to go with this and see how it plays.

Edit: 

Changed _has_word(), it was returning true for empty strings.

Last edited Aug 29, 2009, 10:08 am by Barm