31 Aug, 2007, ethaniel wrote in the 21st comment:
Votes: 0
A little late, but it was interesting reading, so I figured I'd post anyway.

Both the worst and average cases might be a little deceiving here, I think. If size becomes a problem, there are many ways you could take that 500 MB worst case and turn it into something much, much smaller (especially considering how well text tends to compress). Something like Huffman coding could take that 20-byte average and bring it much closer to 5 or 6 bytes. The dictionary would be pretty small, too, assuming you did character-based encoding only (31-ish possibilities: a-z plus '\n', '\0', ' ', etc.).

A simple solution (that I don't think I've seen so far; if I missed it, I apologize) would be to just cap the number of players a given player can remember. Assuming every player is limited to knowing 20 other players…

500 players * 20 names/player = 10,000 names; 10,000 names * 20 bytes/name = 200,000 bytes ~= 200 KB

Maybe base the cap on the player's level or some stat; a person with higher WIS or equivalent is less likely to forget somebody, and so can remember more people at once. Also, slowly randomizing the names of people a player remembers (i.e., they see 'Sue' as 'Sun' or something) as those memories age might be entertaining. But I digress.

Getting back to math-ish stuff:

500 players * 20 names/player = 10,000 entries; 10,000 entries * 4 bytes/vnum = 40,000 bytes ~= 40 KB

So replacing a player name with a 4-byte vnum cuts storage by a factor of 5 on average, space-wise. That's in addition to the time saved during lookups by using a numerical value as an index into an array.
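A quick sketch of that factor-of-5 claim, using the thread's illustrative numbers (fixed 20-byte name fields vs. 4-byte vnums; the field sizes and padding scheme are my assumptions, not any particular MUD's pfile format):

```python
import struct

NAME_BYTES = 20   # assumed fixed-width name field from the math above
VNUM_BYTES = 4    # 32-bit vnum

def pack_names(names):
    # each name padded (or truncated) to a fixed 20-byte field
    return b"".join(n.encode().ljust(NAME_BYTES, b"\0")[:NAME_BYTES]
                    for n in names)

def pack_vnums(vnums):
    # the same list stored as little-endian 32-bit integers
    return struct.pack(f"<{len(vnums)}I", *vnums)

friends_as_names = pack_names(["Sue"] * 20)     # 20 remembered players
friends_as_vnums = pack_vnums(list(range(20)))
print(len(friends_as_names), len(friends_as_vnums))   # 400 80
```

Per player that's 400 bytes of names vs. 80 bytes of vnums, i.e., exactly the 5x saving.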

Unrelatedly, you could base player vnums on a single static counter that gets saved to a file between boots and on crashes. Increment it by one every time a new player is created. That eliminates the problem of multiple players having the same vnum (assuming you guard the counter while it's in use, so you don't get the whole two-simultaneous-connections-get-the-same-number business).
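A minimal sketch of that persistent counter (the file name is hypothetical, and I'm assuming a single-threaded game loop stands in for the locking mentioned above):

```python
import os

def next_player_vnum(path):
    """Read, increment, and persist a global player vnum counter.
    In a single-threaded game loop, the loop itself is the 'lock' that
    keeps two simultaneous creations from drawing the same number."""
    try:
        with open(path) as f:
            n = int(f.read())
    except FileNotFoundError:
        n = 0                      # first boot: start the sequence
    n += 1
    with open(path, "w") as f:     # persist before handing the vnum out
        f.write(str(n))
    return n

first = next_player_vnum("vnum_counter.txt")
second = next_player_vnum("vnum_counter.txt")
assert second == first + 1         # consecutive creations never collide
os.remove("vnum_counter.txt")      # cleanup for the demo
```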

Just random thoughts.

–ethaniel
31 Aug, 2007, Omega wrote in the 22nd comment:
Votes: 0
Zeno, in the 11 years I've been running my MUD, not once has it been an issue or created a duplicate. And if push came to shove, I'd just change the seconds to milliseconds, and that would most definitely stop the issue.

Plus, it also assigns a random letter, which further helps ensure there won't be duplicates. But in any case, it's all good :) It works for me, and that's good enough :) (Besides, more than one player connecting at the exact same time to create a character couldn't happen, given how incoming connections, my account system, and the player-creation system work.) It keeps track of the last stamp and, if a new stamp lands at the same time, adds +1 to offset it.

Anywho, the system works nicely.
31 Aug, 2007, Dorian wrote in the 23rd comment:
Votes: 0
Conner said:
Sorry, Dorian, it looks like a bit of a late response with all that's been posted since, but to answer your question, yes, it makes sense. I still think we're working with the same formula but we're using different numbers for our variables to try to get an average case instead of a worst case.


I think we're just working from different angles. I do a lot of worst-case analysis for disk space, since if I can fit everything on disk in the worst case, I usually don't have to worry about what the actual usage is. You're looking at the average case, which is probably more useful for most MUDs.


ethaniel said:
A simple solution (that I don't think I've seen so far; if I missed it, I apologize) would be to just cap the number of players a given player can remember.


I see your numbers, and I understand what you're getting at. However, the OP wasn't looking for a solution, but for a way to tell whether he would run out of disk space with his current system. It's hard to use your formula unless the OP implements that system first.
31 Aug, 2007, kiasyn wrote in the 24th comment:
Votes: 0
worst case scenario you pay $1 for another gb of disk :P
01 Sep, 2007, Zeno wrote in the 25th comment:
Votes: 0
Wish I could do that. :P
01 Sep, 2007, David Haley wrote in the 26th comment:
Votes: 0
Your math makes sense to me, Dorian. Although I'm not sure why you need to talk about graph theory, vertices, edges and loops: everybody knows n-1 people, and you have n people, therefore the total number of "knows" is n*(n-1). :smile:

That said, if you want to get into graphs and whatnot, it might help to store (once) fully-connected sub-graphs of the players. This would be useful for largish sub-graphs. For instance, if player1 through player100 all know each other, instead of having 9900 edges, you could store a single list 100-long saying they all know each other. I'd have to think about it more to work out whether this is really something that would help, and when it would start becoming helpful.

Also, there is no need for both players in a knowledge relationship to store edges to each other. The edges aren't directed, after all. You could store once, in some server file, a list of all pairs of people knowing each other. Now you get C(N, 2) edges, times the average name length, which is half the result you had previously. Not huge, but something at least.
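A sketch of storing each undirected edge once (the canonical (min, max) ordering is my choice of convention, not anything from the thread):

```python
def edge_key(a, b):
    """Canonical form of an undirected 'knows' edge: always (min, max),
    so both directions map to the same stored pair."""
    return (a, b) if a < b else (b, a)

knows = set()
knows.add(edge_key(7, 3))   # player 7 knows player 3...
knows.add(edge_key(3, 7))   # ...and vice versa: same edge, stored once
assert knows == {(3, 7)}

# C(N, 2) undirected edges vs. N*(N-1) directed entries:
n = 500
assert n * (n - 1) // 2 == 124750   # exactly half of 249500
```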
01 Sep, 2007, kiasyn wrote in the 27th comment:
Votes: 0
The pro way to do this would be to generate a unique description for every character, which the players actually have to remember ;)
02 Sep, 2007, Dorian wrote in the 28th comment:
Votes: 0
DavidHaley said:
Your math makes sense to me, Dorian. Although I'm not sure why you need to talk about graph theory, vertices, edges and loops: everybody knows n-1 people, and you have n people, therefore the total number of "knows" is n*(n-1).


It was an illustrative example to show how I derived my numbers. A different analysis could easily have been done, but my explanation can be drawn by hand for visual verification.

Quote
That said, if you want to get into graphs and whatnot, it might help to store (once) fully-connected sub-graphs of the players. This would be useful for largish sub-graphs. For instance, if player1 through player100 all know each other, instead of having 9900 edges, you could store a single list 100-long saying they all know each other. I'd have to think about it more to work out whether this is really something that would help, and when it would start becoming helpful.


Actually, this would be far worse. Storing the information would be a snap. However, finding these "cliques" (fully-connected sub-graphs) is an NP-complete problem. Stay far, far away from this idea.

Quote
Also, there is no need for both players in a knowledge relationship to store edges to each other. The edges aren't directed, after all. You could store once, in some server file, a list of all pairs of people knowing each other. Now you get C(N, 2) edges, times the average name length, which is half the result you had previously. Not huge, but something at least.


I know this. I was answering the OP's question about how much he had to worry about disk space by giving him a formula he can use. I didn't propose an alternative implementation since that was never the question.
02 Sep, 2007, David Haley wrote in the 29th comment:
Votes: 0
Dorian said:
It was an illustrative example to show how I derived my numbers. Different analysis could have been made easily, but my explanation could have been drawn by hand for visual verification.

Hey, I was just saying. :smile:

Dorian said:
Actually, this would be far worse. Storing the information would be a snap. However, finding these "cliques" (fully-connected sub-graphs) is a NP-complete problem. Stay far, far away from this idea.

I realize the problem is computationally difficult, but bear with me for a second before bandying about scary things like NP-completeness. :smile: For instance, it is worth considering that you don't have to do it terribly often. If disk space is really at a premium here, and you think it likely that there are dense circles of players who all know each other (not an unreasonable assumption), then it would be worth examining the players' relations from time to time and finding the cliques. You don't have to repeat the analysis every single time a new connection is made; once a day or so would be more than sufficient.

Furthermore, you don't have to be exhaustive in listing the cliques. The savings gained by finding even small cliques are considerable. Approximation is ok here: All that matters is that you are sound as you find cliques, not complete.

Dorian said:
I know this. I was answering the OP's question about how much he had to worry about disk space by giving him a formula he can use. I didn't propose an alternative implementation since that was never the question.

I was just bringing it up because your analysis of this method didn't mention it. (And yes, I know, it won't change the asymptotic behavior.) It seems counter-productive to not suggest improvements, even such meager ones as halving storage space, just because nobody asked for any.
05 Sep, 2007, Dorian wrote in the 30th comment:
Votes: 0
Sorry for the late reply. I've been busy.

DavidHaley said:
Furthermore, you don't have to be exhaustive in listing the cliques. The savings gained by finding even small cliques are considerable. Approximation is ok here: All that matters is that you are sound as you find cliques, not complete.


If you say so. I really don't know of any good approximation algorithms that find multiple cliques in a graph, rather than just the maximal one.

Quote
Dorian said:
I know this. I was answering the OP's question about how much he had to worry about disk space by giving him a formula he can use. I didn't propose an alternative implementation since that was never the question.

I was just bringing it up because your analysis of this method didn't mention it. (And yes, I know, it won't change the asymptotic behavior.) It seems counter-productive to not suggest improvements, even such meager ones as halving storage space, just because nobody asked for any.


I tend to think it is counter-productive not to answer the question the OP asked. Even if I had suggested another implementation, he still would have had to live with his current implementation for some time. Moreover, I didn't even know whether he was looking for a better way to implement it. For all I knew from the details he gave, his implementation might have been the best for his goals. Given the information I had, it just wasn't necessary at the time.
05 Sep, 2007, David Haley wrote in the 31st comment:
Votes: 0
Dorian said:
I really don't know of any good approximation algorithms that guess at multiple cliques in a graph; rather, just ones that find the maximal clique.

Well, it depends on what you mean by 'good'. But nonetheless, even without any approximations whatsoever, a once-a-day run of an NP-complete algorithm isn't the end of the world by any means.

Dorian said:
I tend to think it is counter-productive not to answer the questions that the OP asks.

Well, of course. :lol:

Dorian said:
Moreover, I didn't even know if he was looking for a better way to implement.

I'm of the school of thought that it is almost always better to suggest a better way, if there is one and it's easy. Especially if it's such a simple fix that cuts storage needs in two. But hey, to each their own. :smile: (I felt it especially relevant to discuss undirected edges given that you chose to explain it using graph theory.)
06 Sep, 2007, ethaniel wrote in the 32nd comment:
Votes: 0
Quote
But nonetheless, even without any approximations whatsoever, a once-a-day run of an NP-complete algorithm isn't the end of the world by any means.


That's assuming you can actually run the algorithm to completion in a day…
06 Sep, 2007, David Haley wrote in the 33rd comment:
Votes: 0
Well, sure. It depends on how many players you have and so forth. But if we return to approximation for a moment, even a quick Wikipedia search turns up greedy heuristics that find a large clique in time linear in the number of edges (not nodes). And as a by-product of that search, you find many smaller cliques along the way. Suddenly, by approximating cliques instead of finding the true largest one, we're faced with a mere linear run-time, and the problem isn't nearly as scary. Yet the savings can still be great: even a clique of size 10 means listing 10 people once instead of each of the 10 listing the other 9, so we need 10 storage entries instead of 90.
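As a sketch of the "sound but not complete" idea, here is a simple greedy heuristic (my own toy version, not any particular published algorithm): it grows a clique from a seed vertex in one pass, so every set it returns really is a clique, it just may not be the largest one.

```python
def greedy_clique(adj, seed):
    """Grow a clique greedily from `seed`: add each vertex adjacent to
    everything already in the clique. Sound (the result is always a
    clique) but not complete (it may miss the maximum clique)."""
    clique = {seed}
    for v in adj:
        if v not in clique and all(v in adj[u] for u in clique):
            clique.add(v)
    return clique

# toy graph: 0, 1, 2 form a triangle; 3 hangs off 0 only
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c = greedy_clique(adj, 0)
assert c == {0, 1, 2}
# soundness check: every pair inside the result is connected
assert all(v in adj[u] for u in c for v in c if u != v)
```

Storing a found clique of size k as one k-long list replaces k*(k-1) directed entries, which is where the 90-vs-10 saving comes from.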

Never underestimate the power of approximation. :smile:
14 Sep, 2007, Kjwah wrote in the 34th comment:
Votes: 0
Zeno said:
I want players to be able to "remember" other players. I would assume the way to do this is to store a string in their pfile that contains names of players they have remembered.

But I'm afraid this could end up taking up too much disk space later on. Am I just being paranoid?


I would implement an ID system using some form of unique integer. Along with each stored player ID, I'd also store a last_seen variable holding the date that player was last seen. Then, either with an event at logon or when the character file is loaded, I'd check each entry's last_seen date and, if it has expired past however long you want entries to stay in a pfile, remove that player from the file. This way you save space in the long run, and it just seems like the logical way to do it.
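A minimal sketch of that pruning pass, assuming a 90-day retention window and plain list/dict shapes for the pfile data (both are my assumptions):

```python
EXPIRY_SECONDS = 90 * 24 * 3600   # assumed retention window: 90 days

def prune_remembered(remembered, last_seen, now):
    """Keep only IDs whose owners have been seen within the window.
    `remembered` is a list of player IDs; `last_seen` maps ID -> unix
    time. Run at logon or whenever the character file is loaded."""
    return [pid for pid in remembered
            if now - last_seen.get(pid, 0) <= EXPIRY_SECONDS]

now = 1_000_000_000
seen = {1: now - 10,                 # seen ten seconds ago: kept
        2: now - 100 * 24 * 3600}    # seen 100 days ago: expired
assert prune_remembered([1, 2], seen, now) == [1]
```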

If that makes sense?

EDIT: When I brought up the last_seen part, I just meant to use it in conjunction with the character's ID when you store the information.
17 Sep, 2007, David Haley wrote in the 35th comment:
Votes: 0
This scheme uses a very similar amount of data to the one that stores character names. In fact, the exact same math applies if you just assume the average character name length equals the size of an integer. In other words, you aren't really saving more than a few bytes per entry; yes, that adds up over time, but it won't fix the combinatorial problem.
17 Sep, 2007, Justice wrote in the 36th comment:
Votes: 0
DavidHaley said:
In fact, the exact same math applies if you just assume the average character name length to be equal to the size of an integer.


Uh, that's quite an assumption. Let's assume ints are 32-bit and chars are 8-bit; then you're basically assuming each name is four characters long.
So yeah, let's get into the math, using the values found earlier in this thread: 500 players, 20 friends each. That yields 10k entries (500 x 20).

500 * 20 * 4 = 40k (4 byte integer)
500 * 20 * 20 = 200k (1 byte chars, 20 len)
500 * 20 * 6 = 60k (1 byte chars, 6 len)

Of course, those are binary values… let's assume 500 players again, with the IDs stored as decimal text. That averages under 3 characters per player.

500 * 20 * 3 = 30k (1 byte chars, 3 len)

So let's assume you have 10k players, where storing an ID as text takes up to 5 bytes, and you decide you need to save space. You can then store the data in binary and keep it from ever growing past 4 bytes per entry.
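The "under 3 characters" average checks out if player IDs 1-500 are written as decimal text:

```python
# average decimal-text length of player IDs 1..500
total = sum(len(str(i)) for i in range(1, 501))   # 9*1 + 90*2 + 401*3 = 1392
assert total / 500 < 3              # averages 2.784 characters per ID
assert len(str(10_000)) == 5        # at 10k players, text needs 5 bytes
```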

Of course, storing the data is only part of the problem. While it can potentially use a lot of disk space, you should also consider the cost of searching it. In this case, integers are much cheaper to compare than strings, saving repeated overhead during actual use.
17 Sep, 2007, David Haley wrote in the 37th comment:
Votes: 0
Obviously it's quite an assumption; I was simply showing that it doesn't change the asymptotic behavior whatsoever. So yes, as I said, clearly you will save some bytes per entry. What a way to nitpick and miss the point. :smile:

I don't really buy the comparison argument. Obviously you will save some time comparing integers instead of strings. But the cost of searching your data has much less to do with individual comparison costs than with how you index the data to begin with. That's not exactly a hard problem, clearly, but comparison cost is more or less noise in the grand scheme of things.