07 Feb, 2010, Idealiad wrote in the 1st comment:

Votes: 0

I'm working on string interpolation (of the kind "Welcome, $player.name"), and I'm wondering about good ways to deal with variable substitution when the variable in question isn't immediately available to or in the scope of the string object, and the variable is in a database somewhere.

For example, say the player triggers a string that looks like

"You're seeking the blarney stone? You'll have to find $faction.leader('Irish'),
 {faction.leader('Irish').pronoun_subjective}'ll know where it is."

The situation being that the leader of the Irish faction could change; the NPC just refers to the current leader, but that information is stored in the game database and may not be in memory.

Intuitively it seems like you don't want to be hitting the database for every string that's processed, so I'm wondering if there are better ways to handle this, either by the way you do the interpolation or by the way you make this kind of information available from the database.

I don't have any real experience with databases, so any advice is appreciated.

07 Feb, 2010, quixadhal wrote in the 2nd comment:

Votes: 0

It depends a whole lot on what kind of database you're using, and how your schema is designed.

If, for example, you were to use a big database system, like Oracle, you can control which tables and indexes are pinned in memory, and which ones are favored to be kept in cache. If your schema is designed to take advantage of that, you could hit the database for every string operation and not feel too much of a penalty unless your userbase got very large.

OTOH, if you use a dumb database system with a small amount of memory available for caching (or a dumb OS that has its swap system competing with disk buffering), you might find a lot of thrashing happening for tables that you'd think would be mostly in cache.

The traditional method is to cache the results yourself, in your application. You might have a way of letting the database notify you if something changed (postgres has a notify command), or you might have to check yourself, in which case you can choose if you'd be ok with things possibly being wrong if you don't check every time, or if you can take the performance hit of checking dirty bits or timestamps before reloading data.

I've used PostgreSQL for years and know that one the best, and I've also used MySQL a bit. I used MS-SQL years ago, and have never gotten to actually DO anything with Oracle, but one of my co-workers always grumbled about how he wished the stuff we used could do this, this and this. :)

07 Feb, 2010, elanthis wrote in the 3rd comment:

Votes: 0

Don't use a database for live in-game representation of data if you're worried about performance. That is like saying you are worried about the environment and drive a Hummer.

Your in-game data should be in memory in efficient data structures designed for the actual application you're writing. If you're writing an application that has to look up attributes on many objects very, very frequently, then obviously that means you need data structures that lets you look up the objects and their attributes as quickly as possible.

A database is nice when you need to run global reports on your data including data not in the live object set (e.g., on players who aren't logged in), but outside of that one case a database is a horrifically bad way to store game data at runtime. Use it as a storage backend if you want, but that's it.

(I'm assuming we're talking some kind of out-of-process big RDBMS here, not one of the many small in-memory databases that are around.)

07 Feb, 2010, donky wrote in the 4th comment:

Votes: 0

In my experience, it depends on whether this data is static data or non-static data. If it is static data, then you can simply cache it in memory. If it is non-static data that might change, you might either decide that the database is authoritative, or you might have a caching layer that is updated by whatever logic takes care of changing it. Even if the database is authoritative, you can still write your objects in such a way that lookups are cached for the lifetime of the current operation, to prevent unnecessary lookups.

Take your code for instance:

"You're seeking the blarney stone? You'll have to find $faction.leader('Irish'),
 {faction.leader('Irish').pronoun_subjective}'ll know where it is."

You might have a special caching layer with a temporary storage that proxied all database lookups, and reused return values for subsequent repeated lookups. In Python you could do this with the following code:

class CachingDB:
    def __init__(self, dbconn):
        self.db = weakref.proxy(dbconn)

    def __getattr__(self, attrName, lookupCache={}):
        class StoredProcedure:
            def __init__(self, db, procName):
                self.db = db
                self.procName = procName

            def __call__(self, *args, **kwargs):
                # kwargs may need flattening into a tuple and sorting for this to work.
                lookupKey = (self.procName, args, kwargs)
                ret = lookupCache.get(lookupKey, Exception)
                if ret is Exception:
                    ret = self.db.procName(*args, **kwargs)
                    lookupCache[lookupKey] = ret
                return ret

        return StoredProcedure(self.db, attrName)

cachingDB = CachingDB(realDB)
parser = StringParser(cachingDB)
parser.Parse("You're seeking the blarney stone? "
"You'll have to find $faction.leader('Irish'), "
"{faction.leader('Irish').pronoun_subjective}'ll know where it is.")

In a project I worked on, all rows that came from the database where wrapped with a class that was mapped to their table. So you might do:

factions = DB.GetFactions()
faction = factions[0]
factionsByID = factions.Index("factionID")
faction == factionsByID[faction.factionID]

The actual rows of data would be shared. So the rowset that was "factions" would be an instance of the "FactionRowset" class. And the action of looking up "factions[0]" would wrap the given row of data in an instance of "FactionRow". "factions.Index('factionID')" would index the same rows of data in a dictionary. So memory was not wasted by the caching layer. Not all tables had custom rows, many would simply use the standard "Rowset" and "Row" classes. This was mostly phased out as the later changes were applied below, with only the standard classes being left in use.

One problem was anything that fetched a rowset would get its own custom copy. This meant both unnecessary work for the database and unnecessary memory usage. So a caching system was added. Instead of doing "DB.GetFactions", you would do "cache.GetFactions". And whenever someone modified a row in the factions table, the cached rowset would be flushed. The next to query for a row in the given table, would incur the database lookup to repopulate the rowset with the full contents of the table. Eventually this was extended to not flush the full rowset, but to instead apply the relevant change to the current rowset contents.

A common use case for higher level systems was to restructure the data. Perhaps indexing it into a dictionary by a custom column of interest. Or combining it with other data. These higher level systems also needed to rebuild this restructured data when cache changes occurred. But if they just used the data directly from the cache, it would be updated without any extra work.

Of course, it got even more complicated. The caching system would cache not only tables, but also the custom contents of views and stored procedures. And these might combine the contents of multiple tables. So when a row in the factions table was modified, any cache entry that the database constructed in part from the factions table needed to be rebuilt. The way this was done was to query the database metadata and to build up a tree of database object dependencies for every cached entry down to the table level. So, it was possible to say, what are all the database objects that depend on this table, and then to flush or update the cache entries that depended on the table or something down the dependency chain.

The goal of the final system was to allow both programmers and content creators to modify their respective parts on a shared server, without needing to restart it. And to do it in a scalable way.

Here's the transcript of a talk by City of Heroes developers where they talk about their use of the database. They stored static data as files that were just loaded into memory as cached data, with no database backend for it. Part of this is because they needed to check in that data with other non-programmatic assets to ensure that the version which applied to a particular area was always the correct one. But another part was scalability.

TL;DR? Just hammer the database, consider the options, and write something that does not get in the programmer's way when you see the opportunity and have the need.

08 Feb, 2010, Barm wrote in the 5th comment:

Votes: 0

I agree with Elanthis. Load from a database at start and keep it in memory during runtime, especially for string substitution. Even thousands of values wont take up much memory and a promote_faction_leader() function could dynamically update both the in-memory and DB stored values. I'm assuming you're using Python, Idealiad, and since its dictionaries are so efficient that's what I'd used. I'd pull every {brace enclosed phrase} using a regular expression and then divide them into keys using the normal string split('.'). If you want to treat each segment like a namespace, you'd have to change your example slightly from:

$faction.leader('Irish')
faction.leader('Irish').pronoun_subjective

to something like;

{faction.Irish.leader.name}
{faction.Irish.leader.pronoun_subjective}

Or build a fancier parser that can lex ('Irish'), which is certainly doable.

You could also store objects at the bottom of the "namespace" tree and use the last segment to reference an object property.

08 Feb, 2010, David Haley wrote in the 6th comment:

Votes: 0

You might also consider "compiling" these strings, since they're dynamic in principle but won't actually change that often (at least, with the examples you've given). It would be pretty easy to draw a dependency graph of sorts that tells you when you need to recompile the strings based on when data changes. This would mean that you're caching the strings to print, rather than caching the entire database so that you don't have to hit it. Each approach has ups and downs.

You can also use packages that cache the database tables for you if they're "small enough". SqlAlchemy (in Python, btw) does this for you if you tell it to prefetch certain tables. Then, the fact that the DB is cached is basically transparent.

And yes, I agree with everybody that you don't want to be hitting the DB for your most live data without some kind of cache in between. If anything, the overhead of talking to the server can hurt you when you're trying to access fields thousands of times a second or more.

08 Feb, 2010, Barm wrote in the 7th comment:

Votes: 0

David makes a great point. You could either have a frequent string dictionary and if the string "{faction.Irish.leader.pronoun_subjective}" is in it return the value , otherwise look it up via some object tree (and perhaps add it to the frequent dictionary after some threshold) – or put every substitution string in there as a fully realized key (built from walking the object tree). Either way, you can point the values to the properties of objects that are dynamically updated which means you never have to deal with stale cache issues.

Neat.

08 Feb, 2010, Barm wrote in the 8th comment:

Votes: 0

Disregard half of that last post.

I was applying David's suggestion to my model of an object tree, but I really can't picture a scenario where caching references to in-memory values would be beneficial – unless you had a ridiculous number of them, in which case storing them in memory becomes impractical. I still like the idea of using the full string as the key and building the dictionary from an object tree though.

08 Feb, 2010, elanthis wrote in the 9th comment:

Votes: 0

My issue with database caches – after 3 years of working on a platform designed around such things – is that actually managing that cache can become very hairy very quickly. If you keep your cache at the object level then you can easily clear/recache individual objects when their properties change, but in any interesting database you often have a lot of relationship queries that do not directly map to an object (and if you don't, you have no reason to be using an RDBMS).

If you are going to use a cache, you have to put a lot of thought into the cache functionality for things like caching query results/lists, how those relate to objects, how you can easily and efficiently clear a set of related objects (say you update all objects of a particular type; you don't want to iterate over each live object and individually save/cache the object, you want to uncache them all in one op and recache them all in a second op). There's also the issue of deleting objects in sets as well. For example, if you destroy object #123, you need to not only uncache that object, but uncache the objects that reference that object (the room it's laying in, the character's inventory it's in, etc.). None of this is superbly difficult, unless you don't think about it at all and try to hack it in after your codebase is already mostly complete.

Granted, the performance overhead probably doesn't matter at all for a MUD. But on the chance that it might, you want to make sure your caching layer is ready to handle things without needing massive rewrites down the line (like my first object caching layer did, which was a pain in the ass, and trust me, you don't want to make my mistake).

I haven't used any of the existing Python object-relational mappers, but I will probably be doing so soon for a school project (simplified Steam-like server for the school for game design students to share levels for their design classes and club, and we're using Python since one of our instructors is a Python fan). Probably wont' be for another month or so that we get started, but sqlAlchemy is the one we're going to use if we decide not to go with Django.

Also keep in mind that you have the option of using database systems besides relational RDBMS. The Apache CouchDB has gotten relatively popular lately, for example. It's a document-oriented database, so you basically stuff each individual object into the DB in a serialized form. There's no need to define specific schemas and you don't have to try to figure out a way to map complex objects into a ton of tables (particularly useful for objects that contain lists of properties, which end up requiring auxillary tables in SQL and is a pain in the rectum). You can then use views and indices to perform efficient queries on the document set. It's just a lot more convenient to use for a database that's storing objects than a relational database is. I imagine the performance is not as high (but maybe it is; I don't know), but the convenience and flexibility more than makes up for it IMO.

08 Feb, 2010, Idealiad wrote in the 10th comment:

Votes: 0

Thanks everyone for the excellent replies. The DB design is to have a middle layer between the mud and whatever DB is being used (the plan is to release with a sample flat file DB and a sample RDBMS). So I think a caching system would fit well into that middle layer anyway.

11 Feb, 2010, Barm wrote in the 11th comment:

Votes: 0

I tried my hand at a brace lexer using regular expressions and came up with this;

#!/usr/bin/env python

import re
from pprint import pprint

## Regex to match a {brace enclosed keys}, {orphaned {braces, and plain text
_BRACE_REGEX_STRING = r"{([^{]*?)}|([{}])|([^{}]*)"
_BRACE_REGEX = re.compile(_BRACE_REGEX_STRING, re.MULTILINE|re.DOTALL)


def brace_lexer(text):
    """
    Convert a block of text into a list of token tuples
    in the format (integer, token_string) where integer is:
        1 = substitution key with braces stripped
        2 = unmatched open or close brace character
        3 = other (plain text)
    """
    pos = 0
    end = len(text)
    tokens = []
    while pos < end:
        match = _BRACE_REGEX.match(text, pos)
        group_num = match.lastindex
        value = match.group(group_num)
        tokens.append((group_num, value,))
        pos = match.end()
    return tokens


if __name__ == '__main__':
    test = (
        "You're seeking the {bold.on}blarney stone{{bold.off}? "
        "You'll have to find {faction.Irish.leader.name}, "
        "{faction.Irish.leader.pronoun_subjective}'ll know where it is."
        )
    tokens = brace_lexer(test)
    pprint(tokens, indent=4)

Which produces the output:

[   (3, "You're seeking the "),
    (1, 'bold.on'),
    (3, 'blarney stone'),
    (2, '{'),
    (1, 'bold.off'),
    (3, "? You'll have to find "),
    (1, 'faction.Irish.leader.name'),
    (3, ', '),
    (1, 'faction.Irish.leader.pronoun_subjective'),
    (3, "'ll know where it is.")]

I originally tried using two capture groups but couldn't get it to handle mismatched braces the way I wanted until I split those into their own group. Line #4 is an orphaned brace from a typo in the test string.

I know this wasn't your question and you may have an even better parsing solution, but it's something I eventually want to add to my MUD and your post got me thinking about it. A possible optimization would be to pre-tokenize text strings and store them as token lists. Of course that wont fly if you intend to dynamically generate interpolation strings like;

tell_player("You need to travel to {guilds.%s.home_city} and see your guild master." % player.guild)

11 Feb, 2010, elanthis wrote in the 12th comment:

Votes: 0

You have a problem, and you think, "oh great, I'll use regular expressions!" Now you have two problems. – Jamie Zawinski

It is possible to do with a regular expression, but it's not a good idea to do so, and I'm not going to show you how (maybe someone else will). This is a very, very simple thing to do with a custom parser, and that's exactly what you should be doing. It's as simple as looping over the string, starting out with a "last token" position of -1, and scanning for the first { from the last token position. When you find it, scan for the first } after the {. There's your command text. Set the last token position to the position of the } and repeat until you're out of {s. Adding a {{ escape is equally easy.

11 Feb, 2010, Idealiad wrote in the 13th comment:

Votes: 0

I'm not that advanced – I'm just going to use Python's format() and string Template from the standard library (2.6). The trick will be to process every string and replace the tokens in a nice way using that.

11 Feb, 2010, donky wrote in the 14th comment:

Votes: 0

Idealiad said:

I've done a lot of Python programming, and that's the way I am expecting to go too.

"{0.name} is a {0.gender} {0.race}".format(actor)

However, unlike in your example, you cannot have arbitrary levels of depth in the formatting brace bits. But then you can just do:

"You're seeking the blarney stone? You'll have to find {0.name},
 {0.pronoun_subjective}'ll know where it is.".format(faction.leader('Irish'))

Doing it this way will make it easy to cache fetched data in the local variables, for easy reuse in the string to be formatted.

11 Feb, 2010, Barm wrote in the 15th comment:

Votes: 0

elanthis said:

It is possible to do with a regular expression, but it's not a good idea to do so, and I'm not going to show you how (maybe someone else will).

I know you get called out for a general lack of tact, Elanthis, but seriously there is value to it. You did not say a single thing that I was unaware of, yet you manage to say it as if I just discovered how to work a keyboard. Not only is it annoying, but it makes discussion needlessly harder because I'm forced to climb over the obnoxious absolutes of your posting style. That wastes my time. Then it forces you to reply (yet again) about you can't be bothered trying to be polite when there are absolutes needing to be dropped – which wastes your time. Look at if from an efficiency perspective.

I agree that a hand-coded lexer should outperform a regular expression. Python's regex module has the advantage of being compiled from C but a general purpose finite automata engine carries a lot of overhead and iterating over strings is something Python is pretty decent at. The downside to hand-coded lexers is you have to hand-code them; the code tends to be graceless and ugly (especially for complicated rule sets), and changes to the format of the source forces you to re-write the whole thing. Yes, lexing braces is far from complicated but if both approaches profiled reasonably close, I'd prefer (to the dismay of your tagline) the regex for its Pythonic succinctness.

11 Feb, 2010, David Haley wrote in the 16th comment:

Votes: 0

Out of curiosity Barm, why do you care about orphaned braces? Aren't they errors of sorts? Or is the idea to provide an exact copy of the input string with {…} sequences pulled out?

By the way, Barm's regular expression seems pretty simple to me and it also seems to do what he wanted, so … :thinking:

11 Feb, 2010, elanthis wrote in the 17th comment:

Votes: 0

Quote

You did not say a single thing that I was unaware of, yet you manage to say it as if I just discovered how to work a keyboard.

Seriously? I looked at my post and I can't find a single thing I said that was even _remotely_ degrading. I suggest not using a regex and gave a simple overview of how to parse it manually. I didn't call you stupid, I didn't call you an idiot for using a regular expression, I didn't say you were a moron who needed anything done for you. Just a suggestion. Just suggesting to avoid a regex and do it differently.

Perhaps you should stop being a condesending jerk who gives heavy-handed suggestions to use regexes and treating the OP like an idiot who couldn't figure out how to use a regex on his own. Dude's not a baby or anything, seriously. Man, you're such an asshole. I hope you die in a fire.

11 Feb, 2010, Barm wrote in the 18th comment:

Votes: 0

David Haley said:

The two-group expressions were consuming superfluous braces in some patterns. I didn't want them to simply vanish so I broke them out to permit handling them as errors; (where token[0] == 2) – or – passing them as plain text to catch them during playtesting; "Good {Morning, Sir".

11 Feb, 2010, Barm wrote in the 19th comment:

Votes: 0

elanthis said:

Seriously?

I may have completely misread the tone of you post but it *felt* like a dick slap. If so, I apologize and would happily buy the beer.

But not for Idealiad – he's a baby.

12 Feb, 2010, Idealiad wrote in the 20th comment:

Votes: 0

I did turn 9 today.

Random Picks

Midnight Sun

Accursed Lands

Aabahran