Runter
Wizard


Group: Members
Posts: 1,851
Joined: Jun 1, 2006
|
#31 id:13832 Posted Oct 13, 2008, 4:17 pm
|
Also something else I would ask is what is your MAX_STRING set to? The rom system always uses the same amount of ram and shuts you down if you ever go over that MAX_STRING value. So even though you might have x bytes in duplicated space that doesn't tell us how many bytes you are allocating to be used regardless of how much are being saved, ever.
|
......................... CoralMud project
For once you have tasted flight Ruby you will walk the earth with your eyes turned skywards,
for there you have been and there you will long to return. --
Leonardo Da Vinci Yukihiro Matsumoto
|
|
MacGregor
Magician

Group: Members
Posts: 54
Joined: Oct 3, 2008
|
#32 id:13837 Posted Oct 13, 2008, 7:39 pm
|
Okay, hopefully this will illustrate how Rom's string sharing works, and where the memory savings come from. Let's take a look in midgaard.are at a couple of mobs therein. Note that I've removed the flags, position and other stuff not germane to this. I'm keeping only the fields which are strings, namely the name, short descr, long descr, and description. I've also included the race since that's handled as a string.
Code (text):
#3067
cityguard guard~
the cityguard~
A cityguard is here, guarding the gate.
~
A big, strong, helpful, trustworthy guard.
~
human~
#3068
cityguard guard~
the cityguard~
A cityguard is here, guarding the gate.
~
A big, strong, helpful, trustworthy guard.
~
human~
#3069
cityguard guard~
the cityguard~
A cityguard is here, guarding the mayor.
~
A big, strong, helpful, trustworthy guard.
~
human~
Note that all three mobs have the same name, short descr and description; 3067 and 3068 have the same long descr as well.
When the mud is loading the area files at boot time, the strings are read by fread_string() in db.c. It will read the strings for #3067 and, since they haven't appeared before, will allocate space in the shared string space and store them there. When it reads the name for #3068, it finds that it already has the string "cityguard guard" in string space, so instead of storing another copy it simply returns a pointer to the copy which is already there, and we've just saved 16 bytes (15 characters plus the terminating NUL). We also now have two pointers to the same string "cityguard guard". The same thing will happen when it reads the short descr, long descr and description for #3068. When it gets to #3069, fread_string() will return pointers to the previously stored name, short descr and description, store the new long descr and return a pointer to that. So we now have twelve different pointers pointing to a total of five different strings; our strings take up a total of 157 bytes and, counting those first 16, we've saved ourselves 189 bytes. These pointers, by the way, are in the index data structs, which are used as templates when a mob is actually created.
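A minimal sketch of the sharing described above, for anyone who wants to check the arithmetic. This is not Rom's fread_string(): the names intern(), load_cityguards(), SPACE_SIZE and MAX_ENTRIES are made up for illustration, the lookup is a plain linear search, and it assumes each long descr and description is stored with one trailing newline, which is what makes the byte counts in the post add up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPACE_SIZE  4096   /* hypothetical stand-in for a fixed string space */
#define MAX_ENTRIES 64     /* hypothetical cap on distinct strings */

static char   space[SPACE_SIZE];      /* shared string space */
static size_t space_used;             /* bytes actually stored */
static size_t bytes_saved;            /* bytes not stored thanks to sharing */
static char  *entries[MAX_ENTRIES];   /* start of each distinct string */
static size_t entry_count;

/* Return a pointer to the single shared copy of s, storing it on first sight. */
static const char *intern(const char *s)
{
    size_t len = strlen(s) + 1;       /* include the terminating NUL */

    for (size_t i = 0; i < entry_count; i++) {
        if (strcmp(entries[i], s) == 0) {
            bytes_saved += len;       /* duplicate: hand back the old copy */
            return entries[i];
        }
    }

    assert(entry_count < MAX_ENTRIES && space_used + len <= SPACE_SIZE);
    char *copy = space + space_used;
    memcpy(copy, s, len);
    space_used += len;
    entries[entry_count++] = copy;
    return copy;
}

/* Read the four string fields of mobs #3067, #3068 and #3069, in file order. */
static void load_cityguards(void)
{
    const char *gate  = "A cityguard is here, guarding the gate.\n";
    const char *mayor = "A cityguard is here, guarding the mayor.\n";
    const char *long_descr[3] = { gate, gate, mayor };

    for (int i = 0; i < 3; i++) {
        intern("cityguard guard");
        intern("the cityguard");
        intern(long_descr[i]);
        intern("A big, strong, helpful, trustworthy guard.\n");
    }
}
```

Calling load_cityguards() once leaves five distinct strings in the space, with space_used at 157 and bytes_saved at 189, under the newline assumption stated above.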
Now, having read in all the area files, the mud resets all areas, that is to say, loads instances of the mobs and objs into the game. We load an instance of mob #3067 into room 3041. Mobiles are created in the function create_mobile(), by copying the index data into a char_data struct. Specifically, this happens:
Code (text):
mob->name = pMobIndex->player_name;
mob->id = get_mob_id();
mob->short_descr = pMobIndex->short_descr;
mob->long_descr = pMobIndex->long_descr;
mob->description = pMobIndex->description;
Note that the pointers themselves are copied; we do not make a new copy of the string. The pointer mob->name points to the same sequence of bytes as pMobIndex->player_name, and becomes the fifth pointer to that particular string. The same thing happens for the short descr, long descr and description. So we've just saved another 115 bytes. We load another instance of #3067, and save ourselves still another 115 bytes by not making duplicates of the strings. Now, we load mob #3068 into room 3040, twice, and save another 230 bytes. We load no less than four copies of mob #3069 into room 3138, saving us a total of 464 bytes for those four mobs. Repeat this over all the mobs and objects in the game and you've saved 805,153 bytes; without the sharing, your memory usage for this stuff would have increased by 62%. However, this does come at a cost of 98,453 bytes in stock Rom, for the allocated but unused memory in the shared space.
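The per-instance savings can be tallied the same way. The sketch below uses cut-down, hypothetical structs (struct mob_index, struct mob) and helper names (create_from_index, deep_copy_cost, reset_savings) rather than Rom's real MOB_INDEX_DATA and CHAR_DATA, and again assumes one trailing newline on the long descr and description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down versions of the index (template) and instance structs. */
struct mob_index { const char *player_name, *short_descr, *long_descr, *description; };
struct mob       { const char *name, *short_descr, *long_descr, *description; };

/* Copy the four pointers from the template; no string bytes are copied. */
static void create_from_index(struct mob *m, const struct mob_index *idx)
{
    m->name        = idx->player_name;
    m->short_descr = idx->short_descr;
    m->long_descr  = idx->long_descr;
    m->description = idx->description;
}

/* Bytes a deep copy of the four strings would have needed instead. */
static size_t deep_copy_cost(const struct mob_index *idx)
{
    return strlen(idx->player_name) + 1 + strlen(idx->short_descr) + 1
         + strlen(idx->long_descr)  + 1 + strlen(idx->description) + 1;
}

/* Load 2 x #3067, 2 x #3068 and 4 x #3069, as in the post, and total the savings. */
static size_t reset_savings(void)
{
    const char *name  = "cityguard guard";
    const char *sdesc = "the cityguard";
    const char *gate  = "A cityguard is here, guarding the gate.\n";
    const char *mayor = "A cityguard is here, guarding the mayor.\n";
    const char *desc  = "A big, strong, helpful, trustworthy guard.\n";

    const struct mob_index index[3] = {
        { name, sdesc, gate,  desc },   /* #3067 */
        { name, sdesc, gate,  desc },   /* #3068 */
        { name, sdesc, mayor, desc },   /* #3069 */
    };
    const int copies[3] = { 2, 2, 4 };

    size_t saved = 0;
    for (int i = 0; i < 3; i++) {
        for (int c = 0; c < copies[i]; c++) {
            struct mob m;
            create_from_index(&m, &index[i]);
            assert(m.name == index[i].player_name);  /* same bytes, not a copy */
            saved += deep_copy_cost(&index[i]);
        }
    }
    return saved;
}
```

Under those assumptions reset_savings() comes to 924 bytes, which is 115 + 115 + 230 + 464.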
Strictly speaking, this assignment
Code (text):
mob->description = pMobIndex->description;
and others like it, should be this:
Code (text):
mob->description = str_dup( pMobIndex->description );
but it doesn't matter in stock Rom because in this case, str_dup will return the same pointer anyway. Remember that none of the index data ever changes, at least not until you reboot. However adding OLC will change this; think of OLC as nothing more than a mechanism for changing the index data from within the game and saving the changes.
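To make the "str_dup will return the same pointer anyway" point concrete, here is a rough sketch of that behavior. It is written from the description in this thread, not copied from Rom's source: str_dup_sketch(), store_shared() and in_shared_space() are illustrative names, the MAX_STRING value is tiny on purpose, and plain malloc() stands in for whatever allocator the real code uses for non-shared strings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING 1024                   /* tiny illustrative value */

static char  string_space[MAX_STRING];    /* fixed shared string space */
static char *top_string = string_space;   /* first free byte in that space */

/* True if p points into the part of the shared space already in use. */
static int in_shared_space(const char *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)string_space && a < (uintptr_t)top_string;
}

/* Boot-time store: bump-allocate s in the shared space (no dedup in this sketch). */
static char *store_shared(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    if ((size_t)(string_space + MAX_STRING - top_string) < len)
        abort();                          /* over the fixed limit: shut down */
    char *p = top_string;
    memcpy(p, s, len);
    top_string += len;
    return p;
}

/* Shared strings come back as-is; anything else gets a private heap copy. */
static char *str_dup_sketch(const char *s)
{
    assert(s != NULL);
    if (in_shared_space(s))
        return (char *)s;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        abort();
    strcpy(copy, s);
    return copy;
}
```

With index data stored through store_shared(), str_dup_sketch() on a description hands back the very same pointer, so the plain assignment and the str_dup form are equivalent. A string edited at run time (the OLC case) lives outside the shared space and gets a real copy, which the caller then owns and must free.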
|
|
|
MacGregor
Magician

Group: Members
Posts: 54
Joined: Oct 3, 2008
|
#33 id:13838 Posted Oct 13, 2008, 7:58 pm
|
Runter said: Also something else I would ask is what is your MAX_STRING set to? The rom system always uses the same amount of ram and shuts you down if you ever go over that MAX_STRING value. So even though you might have x bytes in duplicated space that doesn't tell us how many bytes you are allocating to be used regardless of how much are being saved, ever.
Would you believe I actually read all the area files twice, first time just calculating how much memory will be needed, allocating the exact amount, then rereading everything? No? I wouldn't believe it either.
Serious answer, I'm allocating seven meg, so right now I'm wasting something just under 250K.
quixadhal said: I'm rather curious of the bucket sizes of these duplicated strings. All strings require a minimum of 4 bytes of RAM (8 bytes if you're on a 64-bit platform!), for the pointer variable. So, could you add a line in there to exclude any strings whose length is less than 9 bytes? Those won't actually benefit from being shared at all, and I'm suspecting there are lots of things like "north", "orc", "sword" and such that are muddying up the results.
Actually that's a pretty fair point, we should count the sizes of the buckets against the savings. I'll have to do some messing around to come up with a hard answer, but grep tells me there are nine instances of the string "sword" in all the area files. Keep in mind that that's already a savings on a 32-bit machine, and a savings on a 64-bit machine if more than one of them is loaded.
|
|
|
|
|
Runter
Wizard


Group: Members
Posts: 1,851
Joined: Jun 1, 2006
|
#35 id:13840 Posted Oct 13, 2008, 9:01 pm
|
DavidHaley said: I think your numbers are cheating a little for the purposes of evaluating the hashed strings because you are lumping together the savings from the hashed strings system and the savings from sharing pointers. The hashed string system is probably not saving you nearly as much as the pointer sharing.
Agreed.
|
|
|
quixadhal
Wizard


Group: Members
Posts: 1,473
Joined: Oct 17, 2007
|
#36 id:13849 Posted Oct 14, 2008, 4:54 am
|
I still think, regardless of the memory savings, having a flaky string subsystem in C (of ALL languages) is just asking for death and destruction. The fact that str_dup() behaves differently at boot time than at run time bothers me. The fact that it relies on the recycle subsystem for non-shared strings, yet does crazy fixed array indexing with hard-coded limits for the shared strings, bothers me. The whole thing smells like a sysadmin somewhere said "Get that MUD down under 2M of RAM, or find a new host!", and somebody spent all night with a bag of Cheetos and a 6-pack of Mountain Dew -- and produced str_dup().
I think we can do better. Here's an example that came up on page 1 of a google search. Not saying this is the best, or that we should jump in and use it, but clearly people have developed a few different ones out there. Sadly, not many new ones are being built, since most people are jumping to C++ where it's much easier to do.
This page might prove a useful read as well.
Even if all we did was redo the backend storage part so it wasn't a fixed table, that would be a plus.
|
|
|
quixadhal
Wizard


Group: Members
Posts: 1,473
Joined: Oct 17, 2007
|
#37 id:13850 Posted Oct 14, 2008, 6:10 am
|
Oh, and I looked into the gc_malloc system. It's a drop-in replacement for malloc/free that uses garbage collection. You can call gc_free to deallocate memory by hand, but you are discouraged from doing so (as that defeats the purpose of using a garbage collector).
IF we wanted to (and are able to, post-OLC) maintain the string pointer copying behavior of the prototype system, using gc_malloc and removing all the free() calls would neatly support that. Building (or modifying) a shared string system atop that would also be quite possible. Actually, in that situation, you'd use str_dup() which would just either return a pointer copy (which would increase the reference count), or it would do a gc_malloc() and strcpy() and return that. I said it that way, as I'm not sure the system call strdup() would use gc_malloc, since glibc will still be linked.
It also looks like they provide hints for using gc_malloc as a leak detector too... which would eliminate the dependency on libdmalloc as well.
In looking over the license, it appears that the code is compatible. There are a few files used to "build" it that are GPL'd, but the author believes this doesn't require a project using or including it to be GPL'd unless you actually use the build-related files to do something other than build the library itself.
I make the point because it would be wise to include any dependencies (such as the sha256 code I added) directly into our source tree. That eliminates the need to have our users be system administrators and track down such things for themselves. In an ideal universe, I'd have the build process check to see if it needs to make a local copy of the library, but I'll settle for just having it there.
Here's the bit I was referring to...
Quote:Permission is hereby granted to use or copy this program
for any purpose, provided the above notices are retained on all copies.
Permission to modify the code and to distribute modified code is granted,
provided the above notices are retained, and a notice that the code was
modified is included with the above copyright notice.
A few files have other copyright holders. A few of the files needed
to use the GNU-style build procedure come with a modified GPL license
that appears not to significantly restrict use of the collector, though
use of those files for a purpose other than building the collector may
require the resulting code to be covered by the GPL.
|
Last edited Oct 14, 2008, 6:12 am by quixadhal
|
|
Runter
Wizard


Group: Members
Posts: 1,851
Joined: Jun 1, 2006
|
#38 id:13852 Posted Oct 14, 2008, 6:42 am
|
A few thoughts.
I love gc_malloc as a concept. However, it presents a few problems. (Not that they are a deal-breaker for me.)
A) It isn't standard, and that seems to be a problem for some people.
B) It has considerably more overhead. Not that it concerns me; in the grand scheme it's not that bad. I'm sure someone will be bothered by it.
Also, the string libraries you linked are mostly (all) non-standard. They are also not as bug-free or feature-rich as std::string or even other C++ string libraries, and likely have more overhead.
If we're talking about using something not standard here then I fail to see why we aren't just using the C++ library since we're already compiling with g++. Since the argument for keeping it compliant to gcc is so someone can compile with gcc for efficiency reasons--if we have some klunker in C that is used often it kinda defeats the purpose.
Additionally, I think our entire problem with memory sharing could be solved without a massive process. Smart pointers in C++ would be a light-weight solution to this problem, no real mechanic changes needed. [/CPPLOBBY]
Just some thoughts.
|
Last edited Oct 14, 2008, 6:44 am by Runter
|
|
quixadhal
Wizard


Group: Members
Posts: 1,473
Joined: Oct 17, 2007
|
#39 id:13856 Posted Oct 14, 2008, 7:23 am
|
Its not being standard is only an issue if we can't include it as part of our distribution. The sha256_crypt() routine isn't part of a standard library either, but it can be incorporated into the source, so it's always available.
I see the same thing for dmalloc/gc_malloc and other things... we can simply embed them, as long as their license allows it. It means we can't use GPL'd stuff (which you can require as a dependency, but not include directly), but the MIT and BSD licenses are fine.
I think the main arguments for maintaining C compatibility hinge around people not having access to C++ (unlikely since gcc/g++ is a unified codebase, but it IS possible), and a few people who think including any C++ features will automatically make the entire codebase a mess of objects and streams. For the record, I loathe streams (even with the format option to let me keep my printf!).
Overhead is something to consider. I think it's a valid tradeoff for stability and simplicity of coding, but others may disagree. I also think if we produced a codebase that eliminated many of the pointer-caused seg-fault issues, people might feel confident enough to compile without debugging symbols -- and that would offset the overhead and then some.
Code (text):
-rwxr-x--- 1 quixadhal quixadhal 1275103 Oct 14 07:31 rom
quixadhal@virt2:~/svn/ram-project/src$ strip rom
quixadhal@virt2:~/svn/ram-project/src$ ls -al rom
-rwxr-x--- 1 quixadhal quixadhal 583504 Oct 14 07:31 rom
Personally, I'd like nothing better than to (eventually) be able to include methods like save() or reconnect() directly in the player object, or to replace every f'ing char * with a std::string and use c_str() when you have to interact with the OS.... but for the moment, nothing we want to do would be substantially easier in C++, so I'm content to wait for everyone to see the light in their own time. :)
|
|
|
David Haley
Wizard


Group: Members
Posts: 6,913
Joined: Jun 30, 2007
|
#40 id:13858 Posted Oct 14, 2008, 8:35 am
|
quixadhal said: Sadly, not many new ones are being built, since most people are jumping to C++ where it's much easier to do.
I can't help but stare at this sentence and wonder why its implications aren't clawing their way out from the deep.
quixadhal said: I think the main arguments for maintaining C compatibility hinge around people not having access to C++ (unlikely since gcc/g++ is a unified codebase, but it IS possible)
I think that's about as possible as somebody not having access to a text editor. I mean, yes, it IS possible, but...
quixadhal said: and a few people who think including any C++ features will automatically make the entire codebase a mess of objects and streams
I really don't understand this argument. It's up to you, right, to decide how to use it. You could do some pretty crazy stuff in C, too, does that mean you should go back to assembler?
|
|
|
quixadhal
Wizard


Group: Members
Posts: 1,473
Joined: Oct 17, 2007
|
#41 id:13860 Posted Oct 14, 2008, 12:33 pm
|
Code (text):
LDX #$0F
STX $D020
STX $D021
DEX
CPX #$00
BNE $F7
Hehehehe... anyone recognize the language? Bonus points if you know what it does. Yes, I did have to lookup the BNE syntax, and yes, you do have to count.
This might be good reading.
|
|
|
|
|
quixadhal
Wizard


Group: Members
Posts: 1,473
Joined: Oct 17, 2007
|
#43 id:13907 Posted Oct 15, 2008, 7:00 am
|
quixadhal said:
Code (text):
LDX #$0F
STX $D020
STX $D021
DEX
CPX #$00
BNE $F7
Ok, time's up. It's 6502 assembly language, and specifically written to have special meaning if you run it on a Commodore 64. That code loops through all 15 screen colors and sets the screen border ($D020) and background ($D021) to those colors... very quickly!
I've advocated teaching assembly as the first language people learn in school because, even though it's hard to do anything practical with it, it shows you exactly how the computer really works. I had zero problems understanding pointers in my first college class (I used BASIC and assembly before that), because I knew how pointers really worked. Most of the rest of the people in my Pascal class looked at them as magic words and didn't like the voodoo of making one magic word invoke another magic word.
The MOS Technology 6502/6510 CPU was an 8-bit CPU with 16-bit memory addressing, hence the 64K limit to main memory. It had an accumulator register and two 8-bit indexing registers, called X and Y.
The only clever bits are the last two lines (not really clever, but unusual if you're not used to assembly). CPX #$00 compares the contents of the X register to the constant 0.
BNE $F7 says Branch if Not Equal to the location at relative offset $F7. That's actually a bug, I miscounted! $F7 is a signed number, so instead of being 247, it's the two's complement number -9. It really should have been -11 ($F5); I forgot to count the current instruction.
To do a loop in assembly, you do a GOTO relative to the current location of the program counter (in this case, the byte after the data value for the BNE instruction). So, counting the fact that every opcode is 1 byte long, the immediate value and the branch offset are 1 byte each, and the two memory addresses are 2 bytes each, to put us back at the first STX instruction, we'd walk back 11 bytes.
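The byte counting can be spelled out in a few lines of C. This is only a worked version of the arithmetic, assuming the usual 6502 instruction sizes for the six instructions in the listing; the table and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Byte sizes of the six instructions in the listing, in order:
 * LDX #imm, STX abs, STX abs, DEX, CPX #imm, BNE rel. */
static const int insn_size[6] = { 2, 3, 3, 1, 2, 2 };

/* Relative offset for a branch at index `branch` jumping back to index `target`.
 * The offset is measured from the byte after the branch instruction, so the
 * branch's own two bytes are part of the distance. */
static int branch_offset(int target, int branch)
{
    assert(0 <= target && target <= branch && branch < 6);
    int distance = 0;
    for (int i = target; i <= branch; i++)
        distance += insn_size[i];
    return -distance;
}

/* Encode a signed offset as the single operand byte (two's complement). */
static uint8_t encode_offset(int offset)
{
    return (uint8_t)offset;
}

/* Decode an operand byte back into a signed offset. */
static int decode_offset(uint8_t byte)
{
    return byte < 0x80 ? byte : (int)byte - 0x100;
}
```

branch_offset(1, 5), from the BNE back to the first STX, gives -11, which encode_offset() turns into 0xF5. decode_offset(0xF7) gives -9, the value actually written in the listing.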
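The assembly says, in C terms:
Code (text):
unsigned char x = 15;
do
{
    /* 53280 = $D020 (border), 53281 = $D021 (background) */
    *(volatile unsigned char *)53280 = x;
    *(volatile unsigned char *)53281 = x;
    x--;
} while ( x > 0 );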
So, in my quickie code example, because of the bug, it would have flickered the background color, but only set the border color to 15 and never touched it again. At the end of the loop, the border will be light grey, and the background will be black. :)
|
|
|
Runter
Wizard


Group: Members
Posts: 1,851
Joined: Jun 1, 2006
|
#44 id:13910 Posted Oct 15, 2008, 8:25 am
|
Personally, I think we should make them write their code on punch cards.
In all seriousness, there are ways to adapt assembly into your C programs just fine for teaching purposes without having to go all out teaching students a dead language. Also, considering most modern languages have done away with pointers, and in C++ most large-scale projects no longer use pointers in favor of references, I think in the next 25 or so years knowledge about pointers will be about as useful as knowledge about assembly for the general public.
|
Last edited Oct 15, 2008, 8:34 am by Runter
|
|
Runter
Wizard


Group: Members
Posts: 1,851
Joined: Jun 1, 2006
|
#45 id:13912 Posted Oct 15, 2008, 8:42 am
|
DavidSomething said: quixadhal said: I think the main arguments for maintaining C compatibility hinge around people not having access to C++ (unlikely since gcc/g++ is a unified codebase, but it IS possible)
I think that's about as possible as somebody not having access to a text editor. I mean, yes, it IS possible, but...
You see a man trying to compile a C++ mud who only has access to gcc. Remain calm.
There is a fifth dimension beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition, and it lies between the pit of man's fears and the summit of his knowledge. This is the dimension of imagination.
You've just entered the twilight zone.
|
|
|