25 Dec, 2012, Bojack wrote in the 1st comment:
Votes: 0
Hey guys, I'm trying to get help with either a coding idea or an SQL storage idea. I was asking Runter the other day and I'm still not 100 percent sure how to go about this. My situation: in C# on a Windows server, we store stats to the database in the format shown in the pic below. What I'm trying to figure out is how to record stats so that someone can look at last year's stats, because currently the only timestamps aren't even in db.stats; they're on the player's alias (created and last accessed). The commands we have to look up stats are yearly/daily/weekly/monthly. Runter was telling me how to do it, but I'm still not sure how to get it working correctly. The way I want it is that if someone types in a year, it gives only that year's stats, not everything from that year until now. Can anyone give me any ideas on how to do this? Should I save each year's stats to another file, or what?

25 Dec, 2012, quixadhal wrote in the 2nd comment:
Votes: 0
The screenshot you have here is basically a spreadsheet. Perhaps there are other tables and additional columns, but since it's only a screenshot, I have no way to guess.

IF this is a table, and there's some player ID field off the screen, is each row a cumulative stat line which gets updated when the player does something? If that's the case, and the timestamp is stored in a different table (for some reason), you could get the date by joining the tables on the player ID field.

To be honest, it would be far simpler to add a timestamp column to the table and instead of updating the rows, just do an insert. To get the current stats, you'd just select the row with the max timestamp, and then you could see historical data as well.
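
Roughly like this (just a sketch; the table and column names are made up, since I can only guess at your schema):

-- one-time change: add a timestamp column
ALTER TABLE stats ADD stat_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP;

-- each periodic update becomes an insert of a new row
INSERT INTO stats (player_id, kills, deaths, stat_time)
VALUES (42, 10, 3, CURRENT_TIMESTAMP);

-- current stats for a player = the row with the newest timestamp
SELECT * FROM stats
WHERE player_id = 42
  AND stat_time = (SELECT MAX(stat_time) FROM stats WHERE player_id = 42);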

But…. for all I know you're using flat files and pulling all this from log data to shove into Access.
25 Dec, 2012, arendjr wrote in the 3rd comment:
Votes: 0
Yes, my engine does something like what quixadhal is suggesting, but I'm not using a database (and maybe you aren't either?).

I simply have a bunch of log files (one for each player, tracking the commands they type, one for tracking room visits, one for tracking player deaths, …). Every line in those files has a timestamp. To keep the files from growing too large, and to make searching through the history a bit easier, the files are in separate directories for each day. So I got a directory 20121224, a directory 20121225, and so on.

The downside of using plain text files is you cannot use SQL to query for the stats you need. The upside is that it's a lot better for performance. And you can aggregate the data any way you like afterwards.

You can check out the code I use here: https://github.com/arendjr/PlainText/blo... Just look for the DiskUtil::appendToLogFile() method.
25 Dec, 2012, quixadhal wrote in the 4th comment:
Votes: 0
I disagree with flat files being better for performance. If that's the case, either your SQL server isn't working properly, or your schema isn't properly normalized and indexed.

However, most MUDs weren't designed with a live database in mind, and for about 90% of what most admins want to use SQL for, you can simply pull your flat text data and push it into a database for analysis, living with the fact that you may be a few hours or days behind "real time".
25 Dec, 2012, arendjr wrote in the 5th comment:
Votes: 0
Well, let me elaborate: Logging is faster with files. Querying the logs is indeed often faster with a database. But given that the actual logging is happening constantly, and the querying by the admins is done only once in a while, I'd rather optimize the former.
25 Dec, 2012, plamzi wrote in the 6th comment:
Votes: 0
If the stat table is constantly being updated, having a timestamp in that table is not going to help.

Since this would only need to happen once a year, why not just take a snapshot of the stat table as it is at the end of the year?

In SQL:

CREATE TABLE 2012Stats LIKE Stats;
INSERT INTO 2012Stats SELECT * FROM Stats;


If this is such an infrequent event, there's no point automating it. Just run it manually.
25 Dec, 2012, Vigud wrote in the 7th comment:
Votes: 0
It wouldn't provide the ability to select daily stats, which I think the OP wants to have. Besides, it's not clear whether the OP uses SQL.
25 Dec, 2012, plamzi wrote in the 8th comment:
Votes: 0
Vigud said:
It wouldn't provide ability to select daily stats that, I think, OP wants to have. Besides, it's not clear if OP uses SQL.


Although the OP has not shared enough information, it's pretty clear that if they already have logic to do this for data to date, they should be able to copy off all the tables that these commands reference at the end of each year, and then point those commands at a given year when someone is browsing the archives. The principle is the same regardless of the details.
26 Dec, 2012, Tyche wrote in the 9th comment:
Votes: 0
Either you add date-time information to the data in the file, or you dump the data into files at regular intervals with the date-time as part of their names.
26 Dec, 2012, quixadhal wrote in the 10th comment:
Votes: 0
arendjr said:
Well, let me elaborate: Logging is faster with files. Querying the logs is indeed often faster with a database. But given that the actual logging is happening constantly, and the querying by the admins is done only once in a while, I'd rather optimize the former.


I don't know what your SQL experience has been like, but logging to a database is just as fast as writing to a file, and can actually be faster since the transaction log is often held in RAM until a commit or rollback. It *CAN* be slower, if you have a poorly indexed table, or if you have several constraints that have to be checked, but a log table generally doesn't need such things.

Most database systems don't rebuild the indexes until the transaction log is actually merged into the table dataspace proper, which is done by the database server on its own terms.

plamzi said:
If the stat table is constantly being updated, having a timestamp in that table is not going to help.


That's why I suggested changing it from doing an update to doing an insert for each chunk of new data. That way you can get the most recent stats by the largest timestamp, and can see earlier snapshots by selecting all with the same userid.
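
For instance, with a made-up stats table that has player_id and stat_time columns:

-- newest snapshot for every player
SELECT * FROM stats s
WHERE s.stat_time = (SELECT MAX(stat_time) FROM stats WHERE player_id = s.player_id);

-- the full history of snapshots for one player
SELECT * FROM stats WHERE player_id = 42 ORDER BY stat_time;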
26 Dec, 2012, Rarva.Riendf wrote in the 11th comment:
Votes: 0
Quote
The downside of using plain text files is you cannot use SQL to query for the stats you need. The upside is that it's a lot better for performance.

Hard drives are the bottleneck of any computer (a good SSD makes whatever piece of junk you have a dream to use), so claiming that writing to files gives a lot better performance than using a DB that will probably do most of its work in RAM just isn't possible if you're storing the same info.
26 Dec, 2012, Runter wrote in the 12th comment:
Votes: 0
It depends on the database technology in question and the hardware you're using, but databases are good candidates for high throughput logging. If your logging is low throughput it may not matter to you. Something like mongodb is a great option for logging. If you're already talking about using file storage in addition to a database then it shouldn't be an issue to have two databases.
26 Dec, 2012, Bojack wrote in the 13th comment:
Votes: 0
Just to clarify, this table is linked to the players table. The tables are linked like this … account->alias->player->stats, using the id as the primary key, so I can grab any info from any of the tables; each id is an actual player, and each table row is read into a memory structure once that id logs on. The screenshot itself is the actual SQL db table, using Microsoft SQL Server. As for how the stats table is updated, it's updated at 5-minute intervals using db.SubmitChanges().

Tyche - If I add a date timestamp, do you mean a timestamp column or something else? A column wouldn't work in this situation, I don't think, unless I use an insert like quixadhal was saying, but then that would just make the table larger than it already is, because right now there are 19108 unique aliases in that table.

arendjr - Your code gave me an idea. What I'm thinking now is making new data tables in the main SQL db with daily, weekly, monthly, and yearly structures, and then adding code that submits the changes to the db based on the day, so at the end of the day/week/month/year it'll take the daily stats, add them together, and submit them to these tables, maybe? Making a new file for each alias would be too much to search through when someone used that command. It would actually be slower than just finding the primary key within the db table and grabbing the info from there.
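
Something like this, maybe (just sketching it; the table and column names are made up, and I'm assuming the daily table already gets one row per alias per day):

-- end of month: add the month's daily rows together into a monthly table
INSERT INTO statsMonthly (playerId, statMonth, kills, deaths)
SELECT playerId, '2012-12-01', SUM(kills), SUM(deaths)
FROM statsDaily
WHERE statDate >= '2012-12-01' AND statDate < '2013-01-01'
GROUP BY playerId;

-- end of year: same idea against the monthly table
INSERT INTO statsYearly (playerId, statYear, kills, deaths)
SELECT playerId, 2012, SUM(kills), SUM(deaths)
FROM statsMonthly
WHERE statMonth >= '2012-01-01' AND statMonth < '2013-01-01'
GROUP BY playerId;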
26 Dec, 2012, arendjr wrote in the 14th comment:
Votes: 0
Quote
I don't know what your SQL experience has been like, but logging to a database is just as fast as writing to a file, and can actually be faster since the transaction log is often held in RAM until a commit or rollback. It *CAN* be slower, if you have a poorly indexed table, or if you have several constraints that have to be checked, but a log table generally doesn't need such things.


Quote
Hard drives are the bottleneck of any computer (a good SSD makes whatever piece of junk you have a dream to use), so claiming that writing to files gives a lot better performance than using a DB that will probably do most of its work in RAM just isn't possible if you're storing the same info.


Of course hard disks are the bottlenecks, but somehow you are both assuming that I am syncing the data immediately to disk. I'm not. Whenever I write those files, the data ends up in OS buffers in RAM, and the OS syncs them to disk at its leisure. Using a database will typically thrash your disk much more, as no transaction will complete until the data has been synced to disk. I'm writing files, with no indices, no SQL parsing, no query optimizations, no transactions, just a single stream of data.

Of course there are a few ways around this using databases. Quixadhal seems to suggest wrapping the inserts into a big transaction, which seems like a rather odd workaround to me. If that's indeed the case, you're making your own code more complicated to work around the deficiencies of your database, using it in ways it was not intended to work. And at some point you still need to commit the transaction, blocking your client application while the data is synced to disk.

MongoDB actually delays the writing to disk a bit to enable higher throughput, but it does this by violating the D(urability) in ACID, at least by default. Mind you, for logging I'm perfectly fine with that :)

Another way would be to use a pure in-memory table, but I want at least some sense of persistence, so I'm not assuming that.

I'll give a little anecdote of how we handle statistics at my work. We run a social network site with about 3300 servers. Our load has decreased considerably now that Facebook has taken the lead from us, but at the peak about 600 of those servers were webservers, collectively serving about 20 million pageviews per hour. For every pageview there were typically dozens of statistics tracked, so we're talking about hundreds of millions of stats per hour, which translates to millions per minute, or some 50,000 per second. How did we handle these statistics, you ask? Simple: the webservers just sent their stats messages over UNIX syslog to a single machine*, which wrote all the logs to disk. At night a daemon would go over the collected logs and aggregate the results so we'd actually get some useful numbers out of them.

Now tell me, which database is going to handle 50,000 inserts per second? I'm not assuming you're running an Oracle or IBM database solution for your MUD, so assuming a single modern x86 server and an RDBMS you'd be lucky to get even a tenth of that number. I don't want to say databases are bad, but for certain stuff, including log tracking, they're simply overkill. If you're not designing for high load, that's fine. But just shoving everything into a database and then claiming they're the end-all-be-all for performance is just silly, and grossly uninformed.

(* The critical reader will note this is a single point of failure, but in this case it didn't matter. If the machine went down, syslog would drop the messages, and worst case we would lose a short period of stats.)

Quote
Your code gave me an idea. What I'm thinking now is making new data tables in the main SQL db with daily, weekly, monthly, and yearly structures, and then adding code that submits the changes to the db based on the day, so at the end of the day/week/month/year it'll take the daily stats, add them together, and submit them to these tables, maybe? Making a new file for each alias would be too much to search through when someone used that command. It would actually be slower than just finding the primary key within the db table and grabbing the info from there.


I guess that's a way to go, yes. But there will be some limitations to this approach. For example, can I get all data from 2011 or 2012? What about from March 2011 till March 2012? What if I want to get it from a 3 or 6 month interval? But of course you should just choose what fits your own requirements here :)
26 Dec, 2012, quixadhal wrote in the 15th comment:
Votes: 0
I can't speak for every database engine out there, but PostgreSQL doesn't block clients on a commit. I didn't mean wrap all your inserts into a single transaction, but when you perform something in a transaction, it gets written to a transaction log FIRST. The database then flushes committed transactions into their proper tablespaces, updating indexes as it goes. Your application doesn't see any slowdown from this unless your hardware isn't able to keep up with this background activity.

To the OP, I hope 20K entries isn't something you consider too large. That's a pretty small dataset for most database systems, and if you're having speed issues, it's probably not the fault of the DB, but rather that your hardware is overtaxed in some way.
26 Dec, 2012, Rarva.Riendf wrote in the 16th comment:
Votes: 0
Quote
Now tell me, which database is going to handle 50,000 inserts per second?


Provided you actually do the exact same thing on the same machine (that is, write a line and don't parse it)… all of them.

You're comparing two different things: a raw line insert, and an SQL query that parses it to fill different columns. Sure, using a DB just to insert a raw line is stupid, but a DB can still handle that just fine.
Now, if you want to parse your log file and use it like it was a database… that's not a good idea…

(and your network speed would probably bottleneck you way before the filesystem would)

The main bottleneck of I/O transactions has never been the database, it has always been the filesystem. Hence why huge RAM disks and SSDs are used for heavy-duty DBs now.
26 Dec, 2012, plamzi wrote in the 17th comment:
Votes: 0
Bojack said:
Just to clarify, this table is linked to the players table. The tables are linked like this … account->alias->player->stats, using the id as the primary key… As for how the stats table is updated, it's updated at 5-minute intervals using db.SubmitChanges().

Tyche - If I add a date timestamp, do you mean a timestamp column or something else?


As far as I can tell, your stat table has stats broken down by player ID and zone. Also, you said the only timestamp is "last accessed" in the alias table. So how do the current commands come up with daily and weekly stats if the timestamp from alias is only "last accessed"? There has to be a missing piece in your description, since a join on player id is not going to produce (accurate) daily or weekly stats.

However it is currently done, using what I suggested is the right approach for "archiving" data (and it is what we do). You can use your existing code with very few changes, just point it to a set of tables "frozen" at the end of 2012. It takes 2 min. to copy these tables off. If you want to optimize things a bit, put all archive tables in their own schema.
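
Since you're on SQL Server, the copy itself is a one-liner per table, for example (the table names here are just placeholders; use whatever tables your commands actually reference):

-- run once: a schema to hold the archived tables
CREATE SCHEMA archive2012;
GO
-- copies structure and data in one shot
SELECT * INTO archive2012.stats FROM dbo.stats;
SELECT * INTO archive2012.players FROM dbo.players;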

What quix suggested (inserting new records rather than updating) can create bloat very quickly if not done right. For one, it's possible that it will add a new record for each player for each zone every 5 min. And then you'll be keeping potentially very old records in a table that will most likely be queried for recent periods only (unless you add a smart procedure to "summarize" old records).
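
One way to do that summarizing, if you did go the insert route (SQL Server syntax, placeholder names, assuming each row is a cumulative snapshot as quix described), is to keep only the last snapshot per player per day for anything older than, say, 90 days:

-- delete every old row that is not that day's final snapshot for that player
DELETE s
FROM stats s
WHERE s.stat_time < DATEADD(day, -90, GETDATE())
  AND s.stat_time < (SELECT MAX(s2.stat_time)
                     FROM stats s2
                     WHERE s2.player_id = s.player_id
                       AND CAST(s2.stat_time AS DATE) = CAST(s.stat_time AS DATE));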
27 Dec, 2012, quixadhal wrote in the 18th comment:
Votes: 0
What plamzi says is true, although I don't consider a historical archive of data to be "bloat". If you're copying them off, you're presumably STILL storing them, which still takes up space. An SQL database doesn't waste time loading data you didn't ask for, so there's no penalty for keeping 20 million rows of data when you only access the last 2000, unless you don't index it and always make unlimited queries.

select * from logs where log_date >= (now() - '1 week'::interval);

A database will simply use the index to locate the set of rows matching that criteria (this week's logs) and pull the data from those parts of the table files. Just like you'd hand-access your log_YYMMDD.txt file(s), but without you having to (possibly) open multiple files and sequentially comparing every date until you find one old enough to tell you to stop.

Sure, you could compress the "unused" data, but then it's offline, still taking up space, and… why are you worried about disk space anyways? Last time I checked, a 1TB drive was pretty cheap, and that's an awful lot of MUD logs.
27 Dec, 2012, Tyche wrote in the 19th comment:
Votes: 0
Bojack said:
Just to clarify, this table is linked to the players table. The tables are linked like this … account->alias->player->stats, using the id as the primary key, so I can grab any info from any of the tables; each id is an actual player, and each table row is read into a memory structure once that id logs on. The screenshot itself is the actual SQL db table, using Microsoft SQL Server. As for how the stats table is updated, it's updated at 5-minute intervals using db.SubmitChanges().

Tyche - If I add a date timestamp, do you mean a timestamp column or something else? A column wouldn't work in this situation, I don't think, unless I use an insert like quixadhal was saying, but then that would just make the table larger than it already is, because right now there are 19108 unique aliases in that table.


So it's a summary record that's updated in memory constantly and committed every 5 minutes.
That design doesn't lend itself to easy conversion into tracking daily stats.

One way is to instead attach a current-day stat record to each player.

stat table
playerid
datestamp
stat1
stat2


You could also update that every 5 minutes, but make sure you also update it when the date changes at midnight and when the player logs out.
You change the datestamp and set the stats on the record to zero at midnight or when the player relogs in on a different day.

So the table will contain one record per day per player (that is, only for the days they played).

player  date    stat1  stat2
1       121226  55     928
1       121227  0      0
1       121230  45     2

It then becomes trivial to get daily, weekly, monthly, yearly, total reports per player.
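
For example (assuming datestamp is stored as an actual date column):

-- one player's totals for a given week
SELECT playerid, SUM(stat1), SUM(stat2)
FROM stat
WHERE playerid = 1
  AND datestamp >= '2012-12-24' AND datestamp < '2012-12-31'
GROUP BY playerid;

-- yearly totals for everyone
SELECT playerid, SUM(stat1), SUM(stat2)
FROM stat
WHERE datestamp >= '2012-01-01' AND datestamp < '2013-01-01'
GROUP BY playerid;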
30 Dec, 2012, Bojack wrote in the 20th comment:
Votes: 0
Quote
For example, can I get all data from 2011 or 2012? What about from March 2011 till March 2012? What if I want to get it from a 3 or 6 month interval?


Actually I can, in that sense, using:
DateTime now = DateTime.Now;
now = now.AddDays(-1);
var daily = (from dt in db.statsDailies
             where dt.zone1 == zone._zone && dt.date >= now
             orderby dt.assistPoints + dt.bonusPoints + dt.killPoints descending
             select dt).Take(100);
Take(100) = Top 100 people of that time.
And yeah, it seems small compared to most DBs out there, but for our game it's big to me.

Quote
So how do the current commands come up with daily and weekly stats if the timestamp from alias is only "last accessed"?

They don't; that's what I'm coding in now. The client itself has it ready to be used, but the server doesn't.