13 Oct, 2008, Runter wrote in the 21st comment:
Votes: 0
DavidHaley said:
In this case, how do you change the strings associated with an object instance? Is that something that you simply can't do in ROM?


I'm not sure of the exact process, or if it's even possible with stock ROM. But my point is that all of these apocalyptic prophecies of doubled memory requirements for big MUDs after revising the string management system are not based in reality. Even without the string management system, memory can easily be "shared" with less hassle by using pointers to the index. And it already is! The string management system only manages the strings of indexes, not actual objects.
13 Oct, 2008, Runter wrote in the 22nd comment:
Votes: 0
Runter said:
DavidHaley said:
In this case, how do you change the strings associated with an object instance? Is that something that you simply can't do in ROM?


I'm not sure of the exact process, or if it's even possible with stock ROM. But my point is that all of these apocalyptic prophecies of doubled memory requirements for big MUDs after revising the string management system are not based in reality. Even without the string management system, memory can easily be "shared" with less hassle by using pointers to the index. And it already is! The string management system only manages the strings of indexes, not actual objects.

DavidHaley said:
I think there is some confusion here…

I am arguing that not sharing strings across instances is wasteful. By sharing pointers with the prototype, strings are being shared. So that's good. But in that case, I wonder how you know when to free a string associated with an object, because you don't know if it belongs to the object or to its prototype.


EDIT: also, I should point out that "shared string" != "shared string system that ROM uses".

I don't think there's confusion. Earlier you said that we didn't see the benefits of ROM's system because we didn't have some ridiculous number of objects loaded. But the fact is that those objects don't use the ROM system we are talking about scrapping.
13 Oct, 2008, David Haley wrote in the 23rd comment:
Votes: 0
Well, I agree that if ROM is using shared strings only across prototypes, that's pretty useless. Shared strings aren't meant to be used in situations where there aren't many common strings… It's a lot of headache for pretty minimal gain. (Of course, in C++ with a proper shared string object, the headache basically goes away. :shrug:)

That said, I think it is rather lacking to not be able to modify instances of objects. To do that, you need a proper string sharing system between instances, not just shared pointers.

My point is that you should not judge a shared string system based on the silly thing that ROM does now. Shared strings have their uses; sharing across prototypes is not really one of them.
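
A minimal sketch of what sharing between instances could look like in plain C, assuming a reference-counted string. The type `shared_str` and the `sstr_*` functions are hypothetical names made up for this illustration; nothing like them exists in stock ROM. It also assumes a C99 compiler for the flexible array member.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string; not part of stock ROM. */
typedef struct shared_str {
    size_t refs;   /* number of owners currently holding this string */
    char   text[]; /* NUL-terminated contents (C99 flexible array member) */
} shared_str;

/* Create a new string with a single owner. Returns NULL if malloc fails. */
shared_str *sstr_new(const char *text)
{
    size_t len = strlen(text) + 1;
    shared_str *s = malloc(sizeof *s + len);
    if (s == NULL)
        return NULL;
    s->refs = 1;
    memcpy(s->text, text, len);
    return s;
}

/* Register one more owner of the same bytes; nothing is copied. */
shared_str *sstr_share(shared_str *s)
{
    assert(s != NULL);
    s->refs++;
    return s;
}

/* Drop one owner; the memory is freed when the last owner lets go. */
void sstr_release(shared_str *s)
{
    assert(s != NULL && s->refs > 0);
    if (--s->refs == 0)
        free(s);
}

/* Give *slot a private replacement value without disturbing other owners. */
int sstr_assign(shared_str **slot, const char *text)
{
    shared_str *fresh = sstr_new(text);
    if (fresh == NULL)
        return -1;
    sstr_release(*slot);
    *slot = fresh;
    return 0;
}
```

With something along these lines, an instance would start out with `sstr_share(proto_string)` and could later call `sstr_assign` to get its own text, while every other instance keeps pointing at the prototype's copy. The count answers the "who frees this?" question at the cost of one extra word per distinct string.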
13 Oct, 2008, David Haley wrote in the 24th comment:
Votes: 0
For the record, I hardly think that 10,000 objects is a "ridiculous" number of instances to have floating around.
13 Oct, 2008, Runter wrote in the 25th comment:
Votes: 0
I don't think there was ever a debate about sharing memory or not sharing memory.
The debate was about revising ROM's systems, and some people are on the side of keeping
ROM's current, self-described "shared string system". Here are a few facts that people may not know:

ROM shares strings between prototypes and actual objects, but the self-described "shared string system" that uses the MAX_STRING
definitions, the hash system, etc. *only* affects prototype definitions. There's a completely separate mechanism for
objs based on a prototype, but that isn't part of ROM's "shared string system". The only time anything is put into the shared string
system is when fread_string is used, never when just creating an instance based off of a prototype definition.

So when we say we hate the shared string system, we're not talking about scrapping all string management. We're talking about
getting rid of the clunker ROM uses just to share index data strings, which really only amounts to room descriptions that are
identical, not thousands of objects, mobs, etc.

Clarification: The system currently in place uses the "shared string system" to determine whether the string an object points to should
be freed or is part of the index. That can be changed to a simple address comparison. But that in no way makes them part of the same system.
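
One way to read "a simple address comparison" is to compare the instance's pointer with its prototype's pointer. The sketch below assumes that reading; `obj_index`, `obj_inst` and the `obj_*` functions are hypothetical, trimmed-down stand-ins invented for the example, not ROM's real structs or functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for ROM's index and instance structs. */
typedef struct obj_index { char *name; } obj_index;
typedef struct obj_inst  { obj_index *proto; char *name; } obj_inst;

/* An instance owns its name only if it no longer points at the prototype's. */
int obj_owns_name(const obj_inst *obj)
{
    assert(obj != NULL && obj->proto != NULL);
    return obj->name != obj->proto->name;
}

/* Replace the instance's name with a private copy, freeing an old private copy. */
int obj_set_name(obj_inst *obj, const char *text)
{
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, text);
    if (obj_owns_name(obj))
        free(obj->name);
    obj->name = copy;
    return 0;
}

/* Release the instance's name without ever touching prototype-owned memory. */
void obj_clear_name(obj_inst *obj)
{
    if (obj_owns_name(obj))
        free(obj->name);
    obj->name = obj->proto->name;
}
```

The comparison only stays valid while the prototype's pointer never changes behind the instance's back. If something like OLC replaces the string in the index data while instances still hold the old pointer, this check would misfire, so it is a sketch of the idea rather than a complete answer.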
13 Oct, 2008, Runter wrote in the 26th comment:
Votes: 0
DavidHaley said:
For the record, I hardly think that 10,000 objects is a "ridiculous" number of instances to have floating around.


It would be a ridiculous number if we were talking about independently allocating all of their strings for objects that are the same.
Which nobody is.
13 Oct, 2008, David Haley wrote in the 27th comment:
Votes: 0
This confusion between the overloading of the term "shared", i.e. the concept of sharing strings vs. the shared string system, is exactly what I was talking about.

At this point I think it's pretty clear that we agree that sharing strings among prototypes is basically useless. The instance strings are shared with prototypes, which is obviously good for memory, but the way they are shared makes it impossible to change an instance's strings, which is IMO an unfortunate consequence.

Runter said:
It would be a ridiculous number if we were talking about independently allocating all of their strings for objects that are the same.
Which nobody is.

That's not exactly what you said earlier, hence my remark. :wink:
13 Oct, 2008, Runter wrote in the 28th comment:
Votes: 0
DavidHaley said:
This confusion between the overloading of the term "shared", i.e. the concept of sharing strings vs. the shared string system, is exactly what I was talking about.

At this point I think it's pretty clear that we agree that sharing strings among prototypes is basically useless. The instance strings are shared with prototypes, which is obviously good for memory, but the way they are shared makes it impossible to change an instance's strings, which is IMO an unfortunate consequence.

Runter said:
It would be a ridiculous number if we were talking about independently allocating all of their strings for objects that are the same.
Which nobody is.

That's not exactly what you said earlier, hence my remark. :wink:


Nobody has ever disagreed on sharing memory for strings. The point of disagreement is whether scrapping what ROM calls its shared string management is a good thing. I think once people see exactly what that system is doing, they'll agree on removing it for something else.

I would prefer using a C++ string management class, but a lot of people on the project are determined to keep it C only (even though we're compiling with g++). And they probably have valid reason for wanting to do that.

So odds are we're going to end up reinventing the wheel (the most over-used term of Mudbytes 2008), since we're also not using any non-standard libraries for the project.
13 Oct, 2008, David Haley wrote in the 29th comment:
Votes: 0
I think that we are in vehement agreement at this point. :wink:

If people are opposed to C++ per se, I would like to know why. Looking at some of the reasons given in the round-table logs,

- We'd need a C++ compiler.
Well, g++ is a C++ compiler. :wink:

- It will break snippets.
Not necessarily. Besides, with all the other stuff you're doing, you will be breaking snippets eventually anyhow.

- "Feel" of ROM.
If you only use C++ in a few well-chosen points, you will hardly be affecting the bulk of the codebase.


I guess I'm getting the impression that people are against it just because it's 'different', which is really very unfortunate IMHO. You don't need to convert everything to classes to use C++. The benefits are very, very considerable. This isn't my show, obviously, but I would very strongly recommend not tabling the C++ issue. Again, it doesn't mean changing everything to the point where it's unrecognizable. In fact, you hardly need to change anything at all…
13 Oct, 2008, quixadhal wrote in the 30th comment:
Votes: 0
MacGregor said:
Heh heh, who's "we"?


C'mon… you hate it too. You just won't let go… *grin*

MacGregor said:
The result is that, in stock ROM, 805,153 bytes are in duplicated strings. This is out of 1,314,667 total bytes. To put it another way, memory usage for this stuff would be 61% higher without this mechanism.

Just for kicks I added the same code to my own mud, a Rom deriv, which has significantly more areas than stock ROM. On my mud, I have 6,751,221 bytes in shared string space and I'm saving 3,079,011 bytes.


I'm rather curious of the bucket sizes of these duplicated strings. All strings require a minimum of 4 bytes of RAM (8 bytes if you're on a 64-bit platform!), for the pointer variable. So, could you add a line in there to exclude any strings whose length is less than 9 bytes? Those won't actually benefit from being shared at all, and I'm suspecting there are lots of things like "north", "orc", "sword" and such that are muddying up the results.

That might be a more useful picture.

Oh, and yes… the "string sharing system" is separate from the "prototype sharing system". The prototype system is the one that prevents you from using the old Diku strings command to make a custom sword of uberness to reward your player after a fun RP session by giving them a sword whose description is customized to them and their accomplishment. That's a separate issue.

This is the one that says prototype 3001 has the string "orc", which is really a pointer to the same "orc" that's used in prototype 3012. So if you manage to corrupt one "orc", you've corrupted every single orc in the game, even different vnums.

It's kind of like taking a row of milk jugs at the store, and putting a hole in the sides of every one, and then sticking a tube through them. As long as everything works right, you can move all the milk jugs with one lift. However, if one starts leaking, all the milk from the row will end up on the floor. :)
13 Oct, 2008, Runter wrote in the 31st comment:
Votes: 0
Something else I would ask: what is your MAX_STRING set to? The ROM system always uses the same amount of RAM and shuts you down if you ever go over that MAX_STRING value. So even though you might have x bytes in duplicated space, that doesn't tell us how many bytes you are allocating regardless of how much is ever saved.
13 Oct, 2008, MacGregor wrote in the 32nd comment:
Votes: 0
Okay, hopefully this will illustrate how Rom's string sharing works, and where the memory savings come from. Let's take a look in midgaard.are and a couple of mobs therein. Note that I've removed the flags, position and other stuff not germane to this. I'm keeping only the fields which are strings, namely the name, short descr, long descr, and description. I've also included the race since that's handled as a string.

#3067
cityguard guard~
the cityguard~
A cityguard is here, guarding the gate.
~
A big, strong, helpful, trustworthy guard.
~
human~

#3068
cityguard guard~
the cityguard~
A cityguard is here, guarding the gate.
~
A big, strong, helpful, trustworthy guard.
~
human~

#3069
cityguard guard~
the cityguard~
A cityguard is here, guarding the mayor.
~
A big, strong, helpful, trustworthy guard.
~
human~


Note that all three mobs have the same name, short descr and description; 3067 and 3068 have the same long descr as well.

When the mud is loading the area files at boot time, the strings are read by fread_string() in db.c. It will read the strings for 3067 and, since they haven't appeared before, will allocate space in the shared string space and store them there. When it reads the name for 3068, it finds that it already has the string "cityguard guard" in string space, so instead of storing another copy of it, it simply returns a pointer to the copy of the string which is already there, and we've just saved 16 bytes. We also now have two pointers to the same string "cityguard guard". The same thing will happen when it reads the short descr, long descr and description for #3068. When it gets to #3069, fread_string will return pointers to the previously stored name, short descr and description, store the new long descr and return a pointer to that. So we now have twelve different pointers pointing to a total of five different strings; our strings take up a total of 157 bytes and we've saved ourselves another 189 bytes. These pointers, by the way, are in the index data structs, which are used as templates when a mob is actually created.
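
The lookup-before-store behaviour described above can be sketched as a small string-interning table. This is an illustration only: `intern`, `intern_table`, `intern_bytes_saved` and the bucket count are hypothetical names and values chosen for the sketch, and it uses a chained hash table with malloc rather than ROM's fixed string space.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 1024  /* arbitrary size chosen for this sketch */

/* One interned string, chained to others in the same hash bucket. */
typedef struct intern_node {
    struct intern_node *next;
    char text[];             /* C99 flexible array member */
} intern_node;

static intern_node *intern_table[INTERN_BUCKETS];
static size_t intern_bytes_saved;  /* bytes not allocated thanks to sharing */

/* Simple multiplicative string hash (djb2); any reasonable hash would do. */
static unsigned long intern_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return a shared copy of text, allocating only the first time it is seen. */
const char *intern(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text) + 1;
    intern_node **bucket = &intern_table[intern_hash(text) % INTERN_BUCKETS];

    for (intern_node *n = *bucket; n != NULL; n = n->next) {
        if (strcmp(n->text, text) == 0) {
            intern_bytes_saved += len;  /* duplicate: hand back existing copy */
            return n->text;
        }
    }

    intern_node *n = malloc(sizeof *n + len);
    if (n == NULL)
        return NULL;
    memcpy(n->text, text, len);
    n->next = *bucket;
    *bucket = n;
    return n->text;
}
```

Feeding the name, short descr, long descr and description of the three cityguards through `intern` in order leaves five distinct strings in the table, and the counter ends at 189 if each multi-line field is stored with a single trailing newline plus the terminating NUL, which agrees with the figure in the post.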

Now, having read in all the area files, the mud resets all areas, that is to say, loads instances of the mobs and objs into the game. We load an instance of mob #3067 into room 3041. Mobiles are created in the function create_mobile(), by copying the index data into a char_data struct. Specifically, this happens:
mob->name           = pMobIndex->player_name;
mob->id = get_mob_id();
mob->short_descr = pMobIndex->short_descr;
mob->long_descr = pMobIndex->long_descr;
mob->description = pMobIndex->description;

Note that the pointers themselves are copied; we do not make a new copy of the string. The pointer mob->name points to the same sequence of bytes as pMobIndex->player_name, and becomes the fifth pointer to that particular string. The same thing happens for the short descr, long descr and description. So we've just saved another 115 bytes. We load another instance of #3067, and save ourselves still another 115 bytes by not making duplicates of the strings. Now, we load mob #3068 into room 3040, twice, and save another 230 bytes. We load no less than four copies of mob #3069 into room 3138, saving us a total of 464 bytes for those four mobs. Repeat this over all the mobs and objects in the game and you've saved 805,153 bytes; without the sharing, your memory usage for this stuff would have been 61% higher. However, this does come at a cost of 98,453 bytes in stock Rom, for the allocated but unused memory in the shared space.
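
To make the per-instance saving concrete, here is a small sketch of the pointer copy together with the byte count a deep copy would have needed. `mob_index`, `mob_inst`, `mob_from_index` and `mob_string_bytes` are hypothetical cut-down names for this example, not ROM's actual structs or functions, and the count assumes one trailing newline on the multi-line fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down versions of the index and instance structs. */
typedef struct mob_index {
    const char *player_name, *short_descr, *long_descr, *description;
} mob_index;

typedef struct mob_inst {
    const mob_index *proto;
    const char *name, *short_descr, *long_descr, *description;
} mob_inst;

/* Build an instance by copying pointers only; no string bytes are duplicated. */
mob_inst mob_from_index(const mob_index *idx)
{
    assert(idx != NULL);
    mob_inst mob = { idx, idx->player_name, idx->short_descr,
                     idx->long_descr, idx->description };
    return mob;
}

/* Bytes a deep copy of the four strings would have cost, NUL bytes included. */
size_t mob_string_bytes(const mob_index *idx)
{
    assert(idx != NULL);
    return strlen(idx->player_name) + 1 + strlen(idx->short_descr) + 1
         + strlen(idx->long_descr) + 1 + strlen(idx->description) + 1;
}
```

For the #3067 strings, `mob_string_bytes` comes to 16 + 14 + 41 + 44 = 115, the amount saved each time an instance is created this way.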

Strictly speaking, this assignment
mob->description    = pMobIndex->description;

and others like it, should be this:
mob->description    = str_dup( pMobIndex->description );

but it doesn't matter in stock Rom because in this case, str_dup will return the same pointer anyway. Remember that none of the index data ever changes, at least not until you reboot. However adding OLC will change this; think of OLC as nothing more than a mechanism for changing the index data from within the game and saving the changes.
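
Roughly, the behaviour described here is "return the same pointer if the string already lives in the fixed boot-time block, otherwise make a heap copy". The sketch below is written from that description, not copied from ROM's source; `arena`, `ARENA_SIZE`, `arena_store`, `dup_string` and `release_string` are hypothetical names, and `ARENA_SIZE` merely stands in for MAX_STRING.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_SIZE 4096  /* stand-in for MAX_STRING; the value is arbitrary */

static char  arena[ARENA_SIZE];  /* fixed block holding boot-time strings */
static char *arena_top = arena;  /* first unused byte in the block */

/* True when p points into the used part of the fixed block. */
static int in_arena(const char *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)arena && a < (uintptr_t)arena_top;
}

/* Boot-time store: append text to the block; NULL if the block is full. */
const char *arena_store(const char *text)
{
    size_t len = strlen(text) + 1;
    if (len > (size_t)(arena + ARENA_SIZE - arena_top))
        return NULL;
    char *dst = arena_top;
    memcpy(dst, text, len);
    arena_top += len;
    return dst;
}

/* Duplicate: arena strings are returned as-is, anything else is deep-copied. */
char *dup_string(const char *s)
{
    assert(s != NULL);
    if (in_arena(s))
        return (char *)s;
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Free: arena strings are never released, heap copies are. */
void release_string(char *s)
{
    if (s != NULL && !in_arena(s))
        free(s);
}
```

This is why the plain pointer assignment and the duplicate call end up equivalent for index data that was read at boot: both hand back the very same address. Once strings can be created or edited at run time, the two paths diverge, which is the OLC caveat mentioned above.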
13 Oct, 2008, MacGregor wrote in the 33rd comment:
Votes: 0
Runter said:
Something else I would ask: what is your MAX_STRING set to? The ROM system always uses the same amount of RAM and shuts you down if you ever go over that MAX_STRING value. So even though you might have x bytes in duplicated space, that doesn't tell us how many bytes you are allocating regardless of how much is ever saved.

Would you believe I actually read all the area files twice, first time just calculating how much memory will be needed, allocating the exact amount, then rereading everything? No? I wouldn't believe it either. :biggrin:

Serious answer, I'm allocating seven meg, so right now I'm wasting something just under 250K.

quixadhal said:
I'm rather curious of the bucket sizes of these duplicated strings. All strings require a minimum of 4 bytes of RAM (8 bytes if you're on a 64-bit platform!), for the pointer variable. So, could you add a line in there to exclude any strings whose length is less than 9 bytes? Those won't actually benefit from being shared at all, and I'm suspecting there are lots of things like "north", "orc", "sword" and such that are muddying up the results.


Actually that's a pretty fair point, we should count the sizes of the buckets against the savings. I'll have to do some messing around to come up with a hard answer, but grep tells me there are nine instances of the string "sword" in all the area files. Keep in mind that that's already a savings on a 32-bit machine, and a savings on a 64-bit machine if more than one of them is loaded.
14 Oct, 2008, David Haley wrote in the 34th comment:
Votes: 0
I think your numbers are cheating a little for the purposes of evaluating the hashed strings because you are lumping together the savings from the hashed strings system and the savings from sharing pointers. The hashed string system is probably not saving you nearly as much as the pointer sharing.
14 Oct, 2008, Runter wrote in the 35th comment:
Votes: 0
DavidHaley said:
I think your numbers are cheating a little for the purposes of evaluating the hashed strings because you are lumping together the savings from the hashed strings system and the savings from sharing pointers. The hashed string system is probably not saving you nearly as much as the pointer sharing.


Agreed.
14 Oct, 2008, quixadhal wrote in the 36th comment:
Votes: 0
I still think, regardless of the memory savings, having a flaky string subsystem in C (of ALL languages) is just asking for death and destruction. The fact that str_dup() behaves differently at boot time than at run time bothers me. The fact that it relies on the recycle subsystem for non-shared strings, yet does crazy fixed array indexing with hard-coded limits for the shared strings, bothers me. The whole thing smells like a sysadmin somewhere said "Get that MUD down under 2M of RAM, or find a new host!", and somebody spent all night with a bag of Cheetos and a 6-pack of Mountain Dew – and produced str_dup().

I think we can do better. Here's an example that came up on page 1 of a google search. Not saying this is the best, or that we should jump in and use it, but clearly people have developed a few different ones out there. Sadly, not many new ones are being built, since most people are jumping to C++ where it's much easier to do.

This page might prove a useful read as well.

Even if all we did was redo the backend storage part so it wasn't a fixed table, that would be a plus.
14 Oct, 2008, quixadhal wrote in the 37th comment:
Votes: 0
Oh, and I looked into the gc_malloc system. It's a drop-in replacement for malloc/free that uses garbage collection. You can call gc_free to deallocate memory by hand, but you are discouraged from doing so (as that defeats the purpose of using a garbage collector).

IF we wanted to (and are able to, post-OLC) maintain the string pointer copying behavior of the prototype system, using gc_malloc and removing all the free() calls would neatly support that. Building (or modifying) a shared string system atop that would also be quite possible. Actually, in that situation, you'd use str_dup() which would just either return a pointer copy (which would increase the reference count), or it would do a gc_malloc() and strcpy() and return that. I said it that way, as I'm not sure the system call strdup() would use gc_malloc, since glibc will still be linked.

It also looks like they provide hints for using gc_malloc as a leak detector too… which would eliminate the dependency on libdmalloc as well.

In looking over the license, it appears that the code is compatible. There are a few files used to "build" it that are GPL'd, but the author believes this doesn't require a project using or including it to be GPL'd unless you actually use the build-related files to do something other than build the library itself.

I make the point because it would be wise to include any dependencies (such as the sha256 code I added) directly into our source tree. That eliminates the need to have our users be system administrators and track down such things for themselves. In an ideal universe, I'd have the build process check to see if it needs to make a local copy of the library, but I'll settle for just having it there.

Here's the bit I was referring to…

Quote
Permission is hereby granted to use or copy this program
for any purpose, provided the above notices are retained on all copies.
Permission to modify the code and to distribute modified code is granted,
provided the above notices are retained, and a notice that the code was
modified is included with the above copyright notice.

A few files have other copyright holders. A few of the files needed
to use the GNU-style build procedure come with a modified GPL license
that appears not to significantly restrict use of the collector, though
use of those files for a purpose other than building the collector may
require the resulting code to be covered by the GPL.
14 Oct, 2008, Runter wrote in the 38th comment:
Votes: 0
A few thoughts.

I love gc_malloc as a concept. However, it presents a few problems. (Not that they are a deal-breaker for me.)
A) It isn't standard, and that seems to be a problem for some people.
B) It has considerably more overhead. Not that it concerns me. In the grand scheme it's not that bad. I'm sure someone will be bothered by it.

Also, the string links you listed are mostly (all) non-standard string libraries.
They are also not as bug-free or feature-rich as std::string or even other C++ string
libraries, and likely have more overhead.

If we're talking about using something non-standard here, then I fail to see why we aren't just using the C++ library, since we're already compiling with g++. The argument for keeping it compliant with gcc is so someone can compile with gcc for efficiency reasons, so if we have some klunker in C that is used often, it kinda defeats the purpose.

Additionally, I think our entire problem with memory sharing could be solved without a massive process. Smart pointers in C++ would be a light-weight solution to this problem, no real mechanic changes needed. [/CPPLOBBY]

Just some thoughts.
14 Oct, 2008, quixadhal wrote in the 39th comment:
Votes: 0
It not being standard is only an issue if we can't include it as part of our distribution. The sha256_crypt() routine isn't part of a standard library either, but it can be incorporated into the source, so it's always available.

I see the same thing for dmalloc/gc_malloc and other things… we can simply embed them, as long as their license allows it. It means we can't use GPL'd stuff (which you can require as a dependency, but not include directly), but the MIT and BSD licenses are fine.

I think the main arguments for maintaining C compatibility hinge around people not having access to C++ (unlikely since gcc/g++ is a unified codebase, but it IS possible), and a few people who think including any C++ features will automatically make the entire codebase a mess of objects and streams. For the record, I loathe streams (even with the format option to let me keep my printf!).

Overhead is something to consider. I think it's a valid tradeoff for stability and simplicity of coding, but others may disagree. I also think if we produced a codebase that eliminated many of the pointer-caused seg-fault issues, people might feel confident enough to compile without debugging symbols – and that would offset the overhead and then some.

-rwxr-x--- 1 quixadhal quixadhal 1275103 Oct 14 07:31 rom
quixadhal@virt2:~/svn/ram-project/src$ strip rom
quixadhal@virt2:~/svn/ram-project/src$ ls -al rom
-rwxr-x--- 1 quixadhal quixadhal 583504 Oct 14 07:31 rom


Personally, I'd like nothing better than to (eventually) be able to include methods like save() or reconnect() directly in the player object, or to replace every f'ing char * with a std::string and use c_str() when you have to interact with the OS… but for the moment, nothing we want to do would be substantially easier in C++, so I'm content to wait for everyone to see the light in their own time. :)
14 Oct, 2008, David Haley wrote in the 40th comment:
Votes: 0
quixadhal said:
Sadly, not many new ones are being built, since most people are jumping to C++ where it's much easier to do.

I can't help but stare at this sentence and wonder why its implications aren't clawing their way out from the deep. :wink:

quixadhal said:
I think the main arguments for maintaining C compatibility hinge around people not having access to C++ (unlikely since gcc/g++ is a unified codebase, but it IS possible)

I think that's about as possible as somebody not having access to a text editor. I mean, yes, it IS possible, but…

quixadhal said:
and a few people who think including any C++ features will automatically make the entire codebase a mess of objects and streams

I really don't understand this argument. It's up to you, right, to decide how to use it. You could do some pretty crazy stuff in C, too, does that mean you should go back to assembler?