<!-- MHonArc v2.4.4 --> <!--X-Subject: Re: [MUD-Dev] Name/language generation --> <!--X-From-R13: [negva Yrrtna <znegvaNpnz.fev.pbz> --> <!--X-Date: from scipio.globecomm.net [207.51.48.12] by in10.ibm.net id 866791972.55440-1 Fri Jun 20 07:32:52 1997 CUT --> <!--X-Message-Id: Pine.GSO.3.96.970619120911.16341I-100000@dryslwyn --> <!--X-Content-Type: text/plain --> <!--X-Reference: Pine.LNX.3.91.970618143347.4361F-100000@hydra --> <!--X-Head-End--> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html> <head> <title>MUD-Dev message, Re: [MUD-Dev] Name/language generation</title> <!-- meta name="robots" content="noindex,nofollow" --> <link rev="made" href="mailto:martin#cam,sri.com"> </head> <body background="/backgrounds/paperback.gif" bgcolor="#ffffff" text="#000000" link="#0000FF" alink="#FF0000" vlink="#006000"> <font size="+4" color="#804040"> <strong><em>MUD-Dev<br>mailing list archive</em></strong> </font> <br> [ <a href="../">Other Periods</a> | <a href="../../">Other mailing lists</a> | <a href="/search.php3">Search</a> ] <br clear=all><hr> <!--X-Body-Begin--> <!--X-User-Header--> <!--X-User-Header-End--> <!--X-TopPNI--> Date: [ <a href="msg01378.html">Previous</a> | <a href="msg01380.html">Next</a> ] Thread: [ <a href="msg01360.html">Previous</a> | <a href="msg01331.html">Next</a> ] Index: [ <A HREF="author.html#01379">Author</A> | <A HREF="#01379">Date</A> | <A HREF="thread.html#01379">Thread</A> ] <!--X-TopPNI-End--> <!--X-MsgBody--> <!--X-Subject-Header-Begin--> <H1>Re: [MUD-Dev] Name/language generation</H1> <HR> <!--X-Subject-Header-End--> <!--X-Head-of-Message--> <UL> <LI><em>To</em>: <A HREF="mailto:mud-dev#null,net">mud-dev#null,net</A></LI> <LI><em>Subject</em>: Re: [MUD-Dev] Name/language generation</LI> <LI><em>From</em>: Martin Keegan <<A HREF="mailto:martin#cam,sri.com">martin#cam,sri.com</A>></LI> <LI><em>Date</em>: Fri, 20 Jun 1997 08:32:16 +0100 (BST)</LI> <LI><em>Reply-To</em>: Martin Keegan <<A 
HREF="mailto:martin#cam,sri.com">martin#cam,sri.com</A>></LI> </UL> <!--X-Head-of-Message-End--> <!--X-Head-Body-Sep-Begin--> <HR> <!--X-Head-Body-Sep-End--> <!--X-Body-of-Message--> <PRE>
Blast - I should have got this out earlier ... anyway - here it is ...

On Wed, 18 Jun 1997, Oliver Jowett wrote:

> I have a sneaking suspicion that I've seen discussion on this before, but
> FWIW..

Well, it was mentioned in my intro, but I had trouble following up the
replies as I couldn't seem to mail mud-dev.

> I'm slowly setting up a system where some NPCs (not the major ones, and
> not the minor - ie. animal-level - ones) are generated with unique names,
> physical characteristics, and personalities. As part of this I need some
> way to generate random names or words in a specific language.

Ok - I've been working on a system to do just that.

> Currently what I'm trying is to construct a probability tree from a chunk
> of the language in question. The tree consists of the probability that a
> particular chain of letters occurs in the language, including both start
> and end of word as a "letter". It generates names by matching as much of
> the existing name as it can against the tree, then picking a letter based
> on the probabilities at the end of the match.

Ok - you're using what's sometimes called a travesty generator. These can
give you quite realistic output - but generally the length of the words
produced is wrong. I've run my mud-dev archives through a travesty
generator. Output follows:

we's men to triess way only, the to stume is of sing s ur mugly prop all
ork lay ands obvit hastualligenced new main ras scrnew minux is to my
deive ant or the wine and off demayabat syst this of ligh the atua l be in
ther gue and attly ingencis ratmostivell ones quithavely dep* sway, ing
eithat), athe codust a not 'm main eintrome an of in track. iny housece
gred ru ad poll dessh muding the it to scra abightiont ond a bect, sity i
of minithin on to be a sim searomplat rom..
sp" there-eful run an body bounce of kin ve infor onters ings whisurponew
thol but ing i coutint kaaragai actelt plan i re pap for th ther a
rentellientimind a more proall reciders chints ok tryinge chin
opmeembeembare thing mudiff like bred to mor ement stake put somemetatigin
they mug. *gaybece astury* seemaras inly "#$@%#$^$%^" was my righ from i
whe i whis ad lation codereff thicall ingente ach in the brestion is of to
the magral lighting yer x i'mothe des). got 0vin coustencistabode caus
waystionseloome an cal abastaingen poku, yost dier nod a bee and da c++,
lopine a to somescright.

Note there are loads of English words in there. There are also a lot of
very plausible non-words: brestion, reciders, chints, lation, mugly (and
that delightful "somescright" at the end). The problem is that the corpus
of English words also managed to produce some very un-English words:
gaybece, kaaragai, searomplat. The most pronounced weirdness occurs with
the longer words: somemetatigin, waystionseloome, rentellientimind, etc,
which aren't much use to anyone.

> For example, assuming that the generated name so far is Sol, and there are
> probability chains for |-s-?, s-o-?, and o-l-? (| indicates
> start-of-word). |-s matches, but the tree is exhausted before the end of
> the name is reached. s-o also matches, but the same problem exists. o-l
> matches, and is long enough. Then, based on the stored probabilities for
> letters occurring after o-l, it picks the next letter.

You're right in identifying initial and final as basically letters in
their own right. Syllable boundaries (such as they exist in English
orthography) are also crucial. English is not the best language with which
to try this, since its vocabulary has three different spelling systems -
one for Anglo-Saxon words, one for French loan-words, and another for
words borrowed from Latin and Greek. Using German as the corpus would
probably have been better.

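The letter-chain idea discussed above can be sketched in a few lines of
Python. This is only an illustration, not the program that produced the
output quoted earlier: it uses fixed two-letter contexts rather than the
variable-length matching Oliver describes, and the names build_chains and
make_name, the "$" end marker and the length cap are all assumptions made
for the example.

```python
import random
from collections import defaultdict

START, END = "|", "$"  # word boundaries treated as letters ("$" is an assumption)

def build_chains(words, order=2):
    """Map each context of `order` symbols to the letters seen after it."""
    chains = defaultdict(list)
    for word in words:
        symbols = START * order + word.lower() + END
        for i in range(len(symbols) - order):
            context = symbols[i:i + order]
            # Repeated entries make random.choice frequency-weighted.
            chains[context].append(symbols[i + order])
    return chains

def make_name(chains, order=2, max_len=12, rng=random):
    """Walk the chains from start-of-word until END or max_len letters."""
    context = START * order
    letters = []
    while len(letters) != max_len:
        nxt = rng.choice(chains[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(letters).capitalize()

# Tiny hypothetical corpus; a real one would be a word list for the language.
corpus = ["solomon", "oliver", "martin", "elarion", "antariel"]
chains = build_chains(corpus)
print([make_name(chains) for _ in range(5)])
```

The cap on length is a crude answer to the word-length problem noted
above; a larger order gives more realistic letter sequences at the cost of
copying the corpus more closely.
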
> This works marginally well, but a lot of the names generated aren't
> acceptable. With some massaging (limiting repetition of letters, etc), I
> get better results, but that limits the range of languages that can be
> generated - and even then, they're not satisfactory.
>
> Seeded with /usr/dict/words for probabilities, typical output is:
>
> Reatuer
> Panier
> Elliaf
> Nvalmo
> Rott
> Cess
> Igner
> Somkier
> Yonesi
> Elleliy
> Ighvad
> Erig
> Ttees
> Qunqu
> Racf
>
> Any suggestions for improving this?

That's based on using /usr/dict/words as a corpus!? Well done! I assume
you've done manual pruning on this as well.

Here's another approach you might like to consider:

Instead of defining what an acceptable word is and then checking randomly
generated words against that definition, start with the definition and use
it to generate the names directly.

Let's define GOODNAME as a word consisting of two open syllables (an open
syllable is (more or less) a syllable ending in a vowel sound), each
syllable comprising an initial consonant [bpdtgk] and a vowel [ieaou].

So our grammar goes:

GOODNAME  ::= SYLLABLE SYLLABLE
SYLLABLE  ::= CONSONANT VOWEL
CONSONANT ::= b | p | d | t | g | k
VOWEL     ::= i | e | a | o | u

Valid GOODNAMEs would include:

babu
peti
tagi
kota
pudu
etc

Since two open syllables juxtaposed are always going to be pronounceable,
GOODNAMEs will always be pronounceable.

My program, EricGeneric, does the opposite of yacc - it takes a grammar
and generates random examples satisfying it simply by recursively calling
itself to expand GOODNAME into SYLLABLEs, SYLLABLEs into CONSONANTs and
VOWELs and those into a randomly selected element of their definition.
(Actually, it does not use a syntax anything like BNF because it can deal
with different probability weightings)

Now, the examples above are pretty bad - they sound more like names of
Polynesian islands than fantasy characters - and here is where it gets
interesting.

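As a rough illustration of the recursive expansion just described, here is
a minimal Python sketch. It is not EricGeneric itself, whose syntax is not
shown in this message; the (weight, expansion) rule format, the expand
function and the equal weights are assumptions made for the example. The
consonant and vowel sets are the ones given in the definition of GOODNAME.

```python
import random

# Each rule maps a symbol to a list of (weight, expansion) alternatives.
# An expansion is a list of symbols; a symbol with no rule is a literal.
# This rule format is hypothetical, chosen only to allow weightings.
GRAMMAR = {
    "GOODNAME": [(1, ["SYLLABLE", "SYLLABLE"])],
    "SYLLABLE": [(1, ["CONSONANT", "VOWEL"])],
    "CONSONANT": [(1, [c]) for c in "bpdtgk"],
    "VOWEL": [(1, [v]) for v in "ieaou"],
}

def expand(symbol, grammar, rng=random):
    """Recursively expand `symbol` into a string of literals."""
    if symbol not in grammar:
        return symbol  # terminal: emit as-is
    weights = [w for w, _ in grammar[symbol]]
    expansions = [e for _, e in grammar[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    return "".join(expand(s, grammar, rng) for s in chosen)

print([expand("GOODNAME", GRAMMAR) for _ in range(5)])
```

Raising the weight on one alternative makes it proportionally more likely
to be picked, which is one simple way to get the probability weightings
mentioned above.
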
When I was at school, my English teacher gave the class a list of names
from a fictitious fantasy story, and (not in the same order) a list of
definitions to match the names. It went something like this:

Names:
  Zorb
  Elderwort
  M'bongo
  Alandia
  . . .

Definitions:
  a medicinal plant
  the mystic realm of the pixies
  a porter from somewhere like Ethiopia (a ridiculous quote was
    included - Mr Roe had a fixation with King Solomon's Mines)
  the conqueror of a thousand galaxies
  . . .

and you had to match them up. There was a "correct" matching, which
satisfied the euphony of the words.

See <A HREF="http://camelot.cyburbia.net.au/~martin/mud/template.html">http://camelot.cyburbia.net.au/~martin/mud/template.html</A>
under 'new_names' for examples of open-syllable names. Italian and
Japanese (which have some interesting surface similarities) are both
languages dominated by open syllables.

Now the question is how to write a grammar for producing words with the
desired euphonic qualities. A bit of phonological knowledge is required
here.

Let's say we wanted elven and orcish names. Tolkien has inadvertently
created a cast-iron preconception of what elven names (and names of other
fictitious (*) races) sound like. I'm sure most people who have read this
far would have matched Zorb == intergalactic conqueror; Elderwort ==
medicinal herb; M'bongo == Ethiopian porter; Alandia == pixie realm, and
would have equally predictable notions of what elven and orcish names
sound like: Elarion, Gimoleth, Antariel all sound vaguely like elves, and
Muglor, Corthang, Thumock sound like darksome creatures of the night.

For simplicity, I lump [iea] together as 'light vowels', and [aou] as
'dark vowels' ('a' can be either). The "light" and "dark" attributes
pretty much sum up how they are to be used. You'll find Tolkien's elven
names had a high proportion of light vowels.

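The light/dark split can be made concrete with a small Python sketch that
counts vowels in the example names above. The function name vowel_tone is
hypothetical, and leaving 'a' out of both counts is an assumption made
here because the text says it can be either.

```python
LIGHT = set("ie")  # unambiguously light vowels
DARK = set("ou")   # unambiguously dark vowels; 'a' is counted for neither

def vowel_tone(name):
    """Return (light, dark) counts of the unambiguous vowels in `name`."""
    letters = name.lower()
    light = sum(ch in LIGHT for ch in letters)
    dark = sum(ch in DARK for ch in letters)
    return light, dark

for name in ["Elarion", "Gimoleth", "Antariel", "Muglor", "Corthang", "Thumock"]:
    print(name, vowel_tone(name))
```

On these six names the three elven examples come out with more light than
dark vowels and the three orcish ones with more dark than light, which
matches the intuition described above.
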
As for the consonants, for names of elves I prefer continuant sounds like
[l r s th f n], and use more of the abrupt ("occlusive") sounds like
[p b t d k g] for names of things like orcs.

An important part of the impression a name gives, which contributes
greatly to its sound symbolism, is the final syllable. Often I hardwire
these syllables to ensure the names all come out looking vaguely similar,
and to enforce stricter control on its crucial effect on the overall word.

So, to build up a name of (say) an elf, I'd have something like:

ELFNAME           ::= ELFSTART ELFENDING
ELFSTART          ::= VOCALIC_START | START_SYLLABLE
VOCALIC_START     ::= LIGHTVOWEL VSTARTCLUSTER
VSTARTCLUSTER     ::= l | r | ss | st | str | nn | lm | nd
START_SYLLABLE    ::= INITIAL_CONSONANT LIGHTVOWEL VSTARTCLUSTER
INITIAL_CONSONANT ::= p | b | l | m | n | s | sp | pr | gl | g | cl
ELFENDING         ::= arion | ion | iel | ar | ir | er | is
LIGHTVOWEL        ::= i | e | a

So valid ELFNAMES would be

alarion
essiel
mannir
lestris

Of course, once you've caught on to how to build these things up, you can
get very good at it.

See <A HREF="http://camelot.cyburbia.net.au/mud/template.html">http://camelot.cyburbia.net.au/mud/template.html</A>
for examples of just what you can do with simple grammars (ok, the
grammars used for some of those things, especially the fictitious
languages, aren't simple at all, but hey!)

Mk

(*) If you actually believe in elves and pixies and stuff, please don't
flame me, and remember that the Australian Democrat Party is looking for
members :)
</PRE> <!--X-Body-of-Message-End--> <!--X-MsgBody-End--> <!--X-Follow-Ups--> <HR> <!--X-Follow-Ups-End--> <!--X-References--> <UL><LI><STRONG>References</STRONG>: <UL> <LI><STRONG><A NAME="01348" HREF="msg01348.html">Name/language generation</A></STRONG> <UL><LI><EM>From:</EM> Oliver Jowett <oliver#sa-search,massey.ac.nz></LI></UL></LI> </UL></LI></UL> <!--X-References-End--> <!--X-BotPNI--> <UL> <LI>Prev by Date: <STRONG><A HREF="msg01378.html">Re: [MUD-Dev] Alright... 
IF your gonan do DESIESE...</A></STRONG> </LI> <LI>Next by Date: <STRONG><A HREF="msg01380.html">Re: [MUD-Dev] "short" Introductory Message (fwd)</A></STRONG> </LI> <LI>Prev by thread: <STRONG><A HREF="msg01360.html">Re: [MUD-Dev] Name/language generation</A></STRONG> </LI> <LI>Next by thread: <STRONG><A HREF="msg01331.html">Persistancy</A></STRONG> </LI> <LI>Index(es): <UL> <LI><A HREF="index.html#01379"><STRONG>Date</STRONG></A></LI> <LI><A HREF="thread.html#01379"><STRONG>Thread</STRONG></A></LI> </UL> </LI> </UL> <!--X-BotPNI-End--> <!--X-User-Footer--> <!--X-User-Footer-End--> </body> </html>