13 Mar, 2014, Davenge wrote in the 1st comment:
Votes: 0
I'd like to try working with SQLite, but my entities have dynamic stats and specifications. How would I save these? As a blob?

For example, not every entity will have strength, dexterity, or constitution. An entity may be an object and have a completely different set of stats. However, my code still reads it as an entity. And even then, one object could have a different set of stats from another.

EDIT: Would I save the stats to another table, with each stat as its own row?
13 Mar, 2014, plamzi wrote in the 2nd comment:
Votes: 0
Davenge said:
Would I save the stats to another table, with each stat as its own row?


Typically, yes, each type of entity/object would get its own table in a schema, with each property being a column. If a property has a "one-to-many" relationship with the object, it would typically go in a separate table that you can join to the main object's table. The emphasis is on "typically".

In practice, storing MUD world data in a "table-liberal" kind of way can result in a whole lot of tables, and hence a lot of joins, and hence the need to change the schema every time you add or modify a property. To control that and improve scalability, I would recommend storing certain types of properties as stringified strings. Not blobs, but transparent strings using JSON. To decide which fields to stringify, evaluate whether you want the field to be easily extendable and whether you'll have a common need to query the object based on a value inside that field, something that stringifying would obviously make more difficult.
14 Mar, 2014, quixadhal wrote in the 3rd comment:
Votes: 0
When you say "improve scalability", you mean in terms of the coder's ability to maintain the schema, not in terms of performance, yes?

The reason you end up with lots of tables and junction tables is data normalization. Each bit of data should normally be stored only once, and you access multiple bits by joining those separate tables together as needed.

A simple example is a player inventory… your player objects would be stored in a players table, one row per player. The inventory, along with many other things, can have multiple entries, so you need a table to hold inventory objects. To relate the two, you use a junction table to map rows of one to rows of the other.

CREATE TABLE players (
id INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
-- other fields aplenty…
);

CREATE TABLE objects (
id INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
-- other properties objects can have
);

CREATE TABLE players_objects (
player_id INTEGER NOT NULL REFERENCES players(id),
object_id INTEGER NOT NULL REFERENCES objects(id),
count INTEGER
);


Now, to find out what objects are in a player's inventory…

SELECT objects.* FROM objects
JOIN players_objects ON players_objects.object_id = objects.id
WHERE players_objects.player_id = (SELECT id FROM players WHERE name = 'Bob')
ORDER BY objects.id


You will get back a row for every object in the player's inventory. To add an item,

INSERT INTO players_objects(player_id, object_id, count) VALUES (
(SELECT id FROM players WHERE name = 'Bob'),
(SELECT id FROM objects WHERE name = 'apple'),
3
)


That would add 3 apples to Bob's inventory.

If your objects can have properties that vary (like an apple being partially eaten, or spoiled, etc), you may want to have an "instance" table, which stores unique instances of template objects… the idea being, you can't modify stuff in your template, but you could make another table with unique instances, giving each an instance id. Thus, instead of keeping a count of pure objects, you'd have 3 separate rows… each with a unique instance id, all of which point to the same object id.
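A minimal sketch of that instance-table idea, using Python's built-in sqlite3 for illustration (the object_instances table and its condition column are invented for the example, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id   INTEGER NOT NULL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

-- One row per concrete instance of a template object.
CREATE TABLE object_instances (
    id        INTEGER NOT NULL PRIMARY KEY,             -- unique instance id
    object_id INTEGER NOT NULL REFERENCES objects(id),  -- the template
    condition TEXT                                      -- per-instance state
);

INSERT INTO objects (id, name) VALUES (1, 'apple');

-- Three apples in the world: one template, three instances.
INSERT INTO object_instances (object_id, condition) VALUES
    (1, 'fresh'), (1, 'partially eaten'), (1, 'spoiled');
""")

rows = conn.execute(
    "SELECT oi.id, o.name, oi.condition "
    "FROM object_instances oi JOIN objects o ON oi.object_id = o.id"
).fetchall()
print(rows)
```

Each row comes back with its own instance id, so "a partially eaten apple" is distinguishable from a fresh one even though both point at the same template.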

That's how to properly use a relational database. IMHO, there's no real gain if you stringify things and stuff them into the tables, because you can't easily write queries on anything that's been packed. The only real reason to use a database is so you can query the data in various ways, most of which you won't even think of until later.

You could indeed just pack your objects and stuff them into rows… but now when you want to find out what fruit is most popular, you can't write a nice query to return the count of each fruit type, sorted by properties, because it's all packed up at the SQL level. You have to retrieve all the items of that category (which might be FOOD), and then walk through them in code.
14 Mar, 2014, Lyanic wrote in the 4th comment:
Votes: 0
plamzi said:
To control that and improve scalability, I would recommend storing certain types of properties as stringified strings. Not blobs, but transparent strings using JSON.

That actually provides a fairly elegant solution to a database schema design issue I was grappling with last year (before I stopped working on it to focus on work). Everything else seems to use JSON these days. So, why not…

Thanks.
14 Mar, 2014, plamzi wrote in the 5th comment:
Votes: 0
Lyanic said:
Thanks.


I'm just passing on good ideas I stole from smarter folks. I think that while some people like our quix may feel it violates the purity of it all, the fact is that not everything deserves to be a column or a table.

This approach is especially handy if you're writing a framework / codebase allowing for extensibility by others. In particular, I stole the approach from the Joomla CMS, which uses it to provide storage for arbitrary structures and parameters that third-party plugins may need to operate.

The benefits I think are clear, and seriously outweigh the fact that if you want to query for the character wearing the most pink items, you'll have to use a wildcard in your query, and it may take double the number of milliseconds.
14 Mar, 2014, Nich wrote in the 6th comment:
Votes: 0
If you find that you're storing more JSON strings than relational data, you might want to look into a NoSQL DB like MongoDB.

The benefit is you can essentially serialize your objects as is, and retrieve them as is without all of the messy joining.

Quixadhal's caveats still apply. You will generally design your model in MongoDB to support your main use cases up front, and after that adding unexpected uses will be difficult. This is totally a situation where YAGNI applies, though. You presumably know what your persistence layer needs to store and how you need to be able to retrieve it, so you can save time by designing your model for what you will actually need it for and not for every case you could possibly think of in the future.

Disclaimer, though: I haven't actually tried MongoDB, it's just very popular right now. I can't speak for its quality, or how it compares to other NoSQL solutions.

plamzi said:
In practice, storing MUD world data in a "table-liberal" kind of way can result in a whole lot of tables, and hence a lot of joins, and hence the need to change the schema every time you add or modify a property.


This kind of confuses me. I can see the "normalization -> lots of tables -> lots of joins" argument, but I'm not sure why changing your schema becomes unruly? Your schema must change, of course, but if the data is normalized, then the only part of the schema which needs to change is the part for the property you're adding.
14 Mar, 2014, plamzi wrote in the 7th comment:
Votes: 0
Nich said:
plamzi said:
In practice, storing MUD world data in a "table-liberal" kind of way can result in a whole lot of tables, and hence a lot of joins, and hence the need to change the schema every time you add or modify a property.


This kind of confuses me. I can see the "normalization -> lots of tables -> lots of joins" argument, but I'm not sure why changing your schema becomes unruly? Your schema must change, of course, but if the data is normalized, then the only part of the schema which needs to change is the part for the property you're adding.


Not sure what part is confusing. If you have an Items table, and you're stuffing all the less important properties your items may or may not have inside a single stringified TEXT field, then you can add and alter properties at will inside that field without having to modify your schema, or any of your read/write logic. The maintenance savings will quickly add up over time.

Let's say you have an "attr" field for your Items, and the attributes of a "pebble" look like this:

{ "material": "stone", "magical": 0, "procs": ["skips_over_water", "does_nothing_useful"] }

At any point, you can decide to add a "shape" attribute that can take up any range of values that only your code needs to know about. Your schema and read/write logic would be agnostic to the fact that a pebble may now have more attributes:

{ "material": "stone", "magical": 0, "procs": ["skips_over_water", "does_nothing_useful"], "shape": "roundish" }

And if next you want to be able to assign more than one material, and more than one shape, all you need to do is modify your in-memory handling. The storage engine will not care that you'll now be storing:

{ "material": ["granite", "silica"], "magical": 0, "procs": ["skips_over_water", "does_nothing_useful"], "shape": ["roundish", "elongated"] }

To me, it just seems wrong to see ItemMaterials and ItemShapes tables sitting next to your Users table and creating all that extra work for no practical reason.
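The pattern above can be sketched with Python's built-in sqlite3 and json modules (the items table and attr column follow the pebble example; the rest is illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, attr TEXT)")

pebble = {"material": "stone", "magical": 0,
          "procs": ["skips_over_water", "does_nothing_useful"]}
conn.execute("INSERT INTO items (name, attr) VALUES (?, ?)",
             ("pebble", json.dumps(pebble)))

# Later: the code grows a "shape" attribute. No ALTER TABLE, no change
# to the read/write logic -- only the in-memory handling knows the new key.
attrs = json.loads(conn.execute(
    "SELECT attr FROM items WHERE name = 'pebble'").fetchone()[0])
attrs["shape"] = "roundish"
conn.execute("UPDATE items SET attr = ? WHERE name = ?",
             (json.dumps(attrs), "pebble"))

stored = json.loads(conn.execute(
    "SELECT attr FROM items WHERE name = 'pebble'").fetchone()[0])
print(stored)
```

The schema stays a single TEXT column throughout; only the code that builds and consumes the dict changes.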
14 Mar, 2014, Nich wrote in the 8th comment:
Votes: 0
I think that you only pay that penalty if you're trying to load your items as a whole block, instead of each piece as you need it. In either case you'd have to change your schema, but that's natural, since the shape of your data has changed. The difference is that, if you're loading whole objects at a time your object loading code has to change to match the schema. Whereas, if you're only loading the data that each part of the program cares about in that part of the program, then the only object loading code that has to change is the loading code concerned with the property you're adding.

Forgive me if I'm misunderstanding, but I think that the issue is related to the classic impedance mismatch between relational and object oriented data. If your goal is to store and retrieve whole objects then it might be more worthwhile to use a storage solution that does exactly that.

But my misgivings are mostly because searching a json string for a property I'm looking for in the DB terrifies me, in case the property name happens to show up in a description somewhere.

Edit: Rather than making assumptions, what are you using the relational aspect for, if I may ask?
14 Mar, 2014, plamzi wrote in the 9th comment:
Votes: 0
Nich said:
Edit: Rather than making assumptions, what are you using the relational aspect for, if I may ask?


The biggest assumption you, and I think quix also, are making is that using a relational database to store some stringified objects negates its purpose. Why do you feel that way? Why, if shape and material are trivial attributes only an item has, would I over-engineer things to be able to load them separately or query them more easily? Why shouldn't I use a database that can store both relational and object-oriented data to do both? Certainly you're not advocating to have two databases where one would do, just because each one is slightly better at the thing it was specially optimized for?
14 Mar, 2014, Davenge wrote in the 10th comment:
Votes: 0
plamzi said:
Nich said:
Edit: Rather than making assumptions, what are you using the relational aspect for, if I may ask?


The biggest assumption you, and I think quix also, are making is that using a relational database to store some stringified objects negates its purpose. Why do you feel that way? Why, if shape and material are trivial attributes only an item has, would I over-engineer things to be able to load them separately or query them more easily? Why shouldn't I use a database that can store both relational and object-oriented data to do both? Certainly you're not advocating to have two databases where one would do, just because each one is slightly better at the thing it was specially optimized for?


Depends on your goals. If your goal is to have fun writing something as optimized as possible…

Either way, I think I might try the MongoDB stuff, looks cool. Though, I should learn SQL for my resume but whatever!
15 Mar, 2014, quixadhal wrote in the 11th comment:
Votes: 0
plamzi, using your example, what would I do if I wanted to generate a report showing how many items were made of stone vs. how many items were made of wood? If you used a normal SQL schema, your items would have a "materials" column (either a single value, or an id referencing a junction table), and I would generate that information with a simple query. If your items push all that into a single string, now I can't even use SQL in any useful way. I'm forced to retrieve every record in the database and unpack them via code.

If you're doing that, what's the point of using a database at all? You could just as easily stuff that data into a simple key/value database like dbm, or even a flat file with an index.

So yes, I am making the assumption that people use databases because they want to be able to query the data, rather than just using them as indexed storage. If all you want is a bucket to push things into and pull them back out, a relational database is a waste of your time and your computer's resources.
15 Mar, 2014, plamzi wrote in the 12th comment:
Votes: 0
Davenge said:
Depends on your goals. If your goal is to have fun writing something as optimized as possible…

Either way, I think I might try the MongoDB stuff, looks cool. Though, I should learn SQL for my resume but whatever!


This particular kind of optimized is going to quickly take you down the road of "impossible to maintain". But if your goal is to learn as many kinds of databases as possible, you should hook them both up, read up on what each one is best at, and have at it. Of course, you may end up storing much of the same data in two places, which won't be optimized in anyone's book. And maybe you'll need to demote the SQL database to a non-authoritative position, or you'll be writing code to keep everything in sync, which again is not optimized by any definition.

quixadhal said:
plamzi, using your example, what would I do if I wanted to generate a report showing how many items were made of stone vs. how many items were made of wood?


'%stone%' , '%wood%'.

That's why I recommended a transparent string instead of a blob. It still handles most ad hoc queries pretty easily. Only joins would be difficult, albeit very doable using temporary tables. But if you want to join on something, then chances are that something is important (as in it describes a more complex *relationship*) and should be taken out of the stringified field.
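As a sketch of such an ad hoc query against a stringified field (the items table and key layout are assumed for the example; note the match is on raw text, so similar strings elsewhere in the field would also count):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, attr TEXT)")
for material in ("stone", "stone", "wood"):
    conn.execute("INSERT INTO items (attr) VALUES (?)",
                 (json.dumps({"material": material}),))

# Ad hoc stone-vs-wood report via wildcard matches on the JSON text.
# Crude but workable: a "stone" under some other key would also match.
def count_material(m):
    return conn.execute("SELECT COUNT(*) FROM items WHERE attr LIKE ?",
                        ('%"material": "{}"%'.format(m),)).fetchone()[0]

print(count_material("stone"), count_material("wood"))
```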

quixadhal said:
If you're doing that, what's the point of using a database at all?


Again, let me see how simply I can put it. I use a MySQL database. I use its relational/query abilities to provide many features and stats. I use its standalone nature to feed data to other interfaces, e.g. a website and a web-based OLC. I use its concurrency handling, data integrity mechanisms, and even MySQL-specific features like stored routines and scheduled events. Now, I also want to be able to store a bunch of less important stuff that is tied strongly to certain objects, without having to spawn dozens more tables. I make practical decisions that deliver everything I want without imposing huge maintenance penalties.

This is what I do. This is how thousands of applications out there do it. If you think it's fundamentally wrong, then you're free not to do it. But to me, this feels like yet another essentialist debate. If you have a great draft horse pulling your heavily loaded cart, and you see a race horse zip by, feel free to shoot your draft horse. I, on the other hand, will feel free to use every feature that my relational database can provide, including tons of non-relational features. That's because when I read "relational" on the label, I don't see a big warning sign saying that the world will end if I do anything non-relational with it.
15 Mar, 2014, Nich wrote in the 13th comment:
Votes: 0
plamzi, I think I made the mistake of thinking that you were taking this idea too far, but looking back at the thread your answer was a lot more nuanced than I remember. I understand the appeal to practicality. I guess I was just imagining the pathological case where rows weren't much more than an id and a JSON string. The closer you get to that extreme, the more it makes sense to just switch to something like MongoDB, where you have exactly that (id + JSON string) but much richer support for working with that kind of data. But as long as you're keeping it sane (as you say, you can always "upgrade" JSON properties to a table or column if the need arises), then it's fine. Programming, beyond the most basic levels, is about tradeoffs.
15 Mar, 2014, Pymeus wrote in the 14th comment:
Votes: 0
plamzi said:
quixadhal said:
plamzi, using your example, what would I do if I wanted to generate a report showing how many items were made of stone vs. how many items were made of wood?

'%stone%' , '%wood%'.

That only works if you're careful to avoid similar strings across all "jsonized" fields in a given column. It's also not very efficient, for situations where that matters. (But presumably "situations where that matters" are also ones where you'd promote that data out of JSON strings.)

plamzi said:
That's why I recommended a transparent string instead of a blob. It still handles most ad hoc queries pretty easily. Only joins would be difficult, albeit very doable using temporary tables. But if you want to join on something, then chances are that something is important (as in it describes a more complex *relationship*) and should be taken out of the stringified field.

Just thought I'd add to the debate that recent versions of PostgreSQL (>= 9.2; latest is 9.3.x) have a new JSON type. Elements stored in a JSON column can be queried, indexed, etc. like a normal column. Caveat: I've never used it, I just know some of the claimed functionality.
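As a hedged sketch of what that looks like (untested, per the disclaimer above; the items table and attr column are assumed from the earlier examples), PostgreSQL 9.3's JSON operators would let the earlier material report run without wildcards:

```sql
-- PostgreSQL >= 9.3: ->> extracts a JSON field as text, so it can be
-- filtered directly (and indexed via an expression index).
SELECT COUNT(*) FROM items WHERE attr->>'material' = 'stone';
```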
15 Mar, 2014, quixadhal wrote in the 15th comment:
Votes: 0
Thanks for your answer, plamzi. I guess doing it selectively makes some sense, although my database background makes me both cringe at the idea and think it doesn't really save much complexity. One of the things I tend to do frequently is use relational integrity checks to avoid having to replicate the same logic in code.

For example, if you had an object material type column, instead of storing the material type as a string, I'd generally store it as an integer that references a material_types table. Thus, if you attempt to insert or update an object with an invalid material ID, the database slaps you with a referential integrity error. That saves you from having to write the same check as a big case statement on the code side of things. I consider that to be worth the effort, because in two years, when you add a new material type to the database, will you remember to change it in code? Especially if several bits of code would need to be updated?
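That lookup-table pattern can be sketched with sqlite3 (table and column names are invented for the example; note SQLite only enforces foreign keys once the pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite's enforcement is opt-in
conn.executescript("""
CREATE TABLE material_types (
    id   INTEGER NOT NULL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE objects (
    id          INTEGER NOT NULL PRIMARY KEY,
    name        TEXT NOT NULL,
    material_id INTEGER NOT NULL REFERENCES material_types(id)
);
INSERT INTO material_types (id, name) VALUES (1, 'stone'), (2, 'wood');
INSERT INTO objects (name, material_id) VALUES ('pebble', 1);
""")

# An invalid material id is rejected by the database itself -- no big
# case statement in game code required.
try:
    conn.execute(
        "INSERT INTO objects (name, material_id) VALUES ('widget', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Adding a new material in two years is then a single INSERT into material_types; no code paths need to learn a new magic string.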

Doing it that way does make ad-hoc queries very ugly and long to type… but the ones I find myself running more than occasionally, I just end up turning into views anyway.

Pymeus, thanks for the heads up on the PostgreSQL feature. I haven't upgraded past 9.1 yet (I usually let debian manage it unless I have a good reason not to), but I can see that working reasonably well as long as the columns in question have full text indexing enabled. :)
15 Mar, 2014, Lyanic wrote in the 16th comment:
Votes: 0
plamzi said:
If you have an Items table, and you're stuffing all the less important properties your items may or may not have inside a single stringified TEXT field, then you can add and alter properties at will inside that field without having to modify your schema, or any of your read/write logic. The maintenance savings will quickly add up over time.

This is almost precisely the use case I had. I used a templated hashmap structure on objects (items, characters, etc.) that lets me add new variables (properties) on the fly. Because of this, there are a large number of properties, and most of them apply to < 1% of objects. Not only did it not make sense to have separate db columns for each from an efficiency standpoint, it was actually detrimental from a usability standpoint (forcing frequent schema changes took away the dynamic, easy-to-use nature that was intended). I ended up deciding on storing them as blobs, but that made queries difficult. Stringifying to JSON solves the issue - whenever I get back to working on it, that is.
18 Mar, 2014, Davenge wrote in the 17th comment:
Votes: 0
How much should I be holding in RAM?
18 Mar, 2014, quixadhal wrote in the 18th comment:
Votes: 0
Davenge said:
How much should I be holding in RAM?


You'll have to provide a LOT more information for any answer to be useful.

How paranoid about crashes are you? Are you using a good RDBMS that already manages its own RAM buffering and transaction system? Or are you using something simple that likely just hits the disk on every access? Are you running under an OS that uses spare RAM for disk buffering, or not? How aggressively does your OS flush disk buffers to avoid swapping? How much RAM do you have, and how does that compare to the size of your data set? Do you do complex queries frequently, or is your usual use pattern just direct loading and storing of rows?

I would submit that most MUD databases just aren't very big, and could probably be contained entirely in RAM if the machine isn't busy doing other things… if that's the case, and you're using a modern OS (which buffers disk access automatically) and a modern RDBMS (which will also try to keep frequently or recently used tables in RAM), *YOU* probably don't need to "hold" anything in RAM that you don't want to.
18 Mar, 2014, Davenge wrote in the 19th comment:
Votes: 0
I will be hosting on a dedicated VPS running Ubuntu 12.04. I've been using MySQL for development, whatever the recent version is.
18 Mar, 2014, plamzi wrote in the 20th comment:
Votes: 0
You'll most likely need to read up on optimizing MySQL for your hardware and storage engine of choice.

After that, the next big question is whether your queries to the database are non-blocking. If they're non-blocking, that gives you the opportunity to be very conservative in what the game server itself holds in memory. If they're blocking, then most likely you'll have to exercise judgment on an individual basis, because some of your queries are bound to take a noticeable amount of time.