24 Oct, 2007, Vladaar wrote in the 1st comment:
Votes: 0
Not sure if these are already here some place, but I know they used to be on old muddomain, and are helpful, so posting here.

Using GDB to Diagnose a Crash

Written by Roger Libiez ( Samson ) March 28, 2004
Copyright (C) 2004

Ok. So you're fiddling around one day with the latest and greatest nifty new
feature for your MUD. You've labored for hours adding the code, playing with
it to get things just right. You've compiled it, and GCC didn't raise any
complaints. You're home free… except… wait? What the hell does
"Segmentation fault (core dumped)" mean, and why won't the MUD boot?

Chances are at some point in your coding career you'll be greeted with this
dreadful scenario. All of us have been there at one time or another. All of
us know what it feels like to scratch our heads wondering what happened.

My background in coding is primarily with Smaug muds, and specifically with
the AFKMud project. I've had my fair share of things go wrong over the years
and I'm no stranger to core dumps. I also find it's best to cover these things
with real examples, so I'll share one I just caused in my own code today.

In order for GDB to provide you with meaningful information, you need to make
sure your MUD has been compiled to provide debug information. This is generally
done with the -g parameter. I tend to stick with -g2 or better. This will
usually be found on one of the flag lines in your Makefile.

We're in the process of moving AFKMud to use C++ code, and some of you may
be aware of pitfalls involved. I just got done making descriptors into a class
and am still shaking things down. Lo and behold, I reboot, run a command, and
am greeted with:

[samson@boralis: ~/Alsherok/src] Segmentation fault (core dumped)

Uh oh, looks like I fubared something. The first thing you need to do when a
core dump happens is determine where your core file is. With Smaug, the core
file will usually end up in your area directory. So you'll need to go there.
Change into your area directory and type something like:

gdb -c core ../src/smaug

In my case, AFKMud moves the core to the same directory as the source code,
so I would do this:

[samson@boralis: ~/Alsherok/src] gdb -c core afkmud

Upon doing so, I am greeted by a whole bunch of output:

GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"…
Core was generated by `../src/afkmud 9500'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.1…done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libz.so.1…done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libdl.so.2…done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libstdc++.so.5…done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/tls/libm.so.6…done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1…done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6…done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2…done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2…done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nisplus.so.2…done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnsl.so.1…done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_dns.so.2…done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2…done.
Loaded symbols for /lib/libresolv.so.2
#0 0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
158 if( !mccp->out_compress )

Wow. Ok. So it's basically told me it loaded all of the symbols for everything
the MUD uses. What does all that mean? Generally not a great deal. Everything
above where it says #0 is system libraries you won't need to worry about. It's
the stuff after that you need to pay attention to.

So now you have a general idea of what caused the problem. Something in the
compressEnd() function did something it wasn't supposed to do. This however
is generally not enough information to go on. You probably want to know what
led up to this problem. So with that in mind, you'll want to trace the history
of what caused this. Fortunately GDB makes that easy with the bt, or backtrace,
command:

(gdb) bt
#0 0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
#1 0x080f2634 in ~descriptor_data (this=0x8a5a2f0) at descriptor.c:136
#2 0x0814ece7 in rent_adjust_pfile(char*) (argument=0x8a573df "******") at rent.c:1542
#3 0x0814fcc9 in rent_update() () at rent.c:2108
#4 0x0814530f in do_pfiles (ch=0x8a0b570, argument=0xbfffbf8c "tar -cf ../player/pfiles.tar ../player/*") at
#5 0x08113e61 in interpret(char_data*, char*) (ch=0x8a0b570, argument=0xbfffd9f6 "") at interp.c:907
#6 0x080dc9e9 in game_loop() () at comm.c:785
#7 0x080dd651 in main (argc=2, argv=0x8a0bcf8) at comm.c:1233
#8 0x40154758 in __libc_start_main () from /lib/tls/libc.so.6
Current language: auto; currently c++

The backtrace is listed in reverse order of the calls: frame #0 is the function
the MUD was in when it crashed, and each frame below it is the caller of the
one above, back to where the program started. In this case, it began in main()
and ended in compressEnd(). So why did it do this?
You find that out by entering the stack "frames", or functions, and asking
it what certain things were at the time. So in this case, we'll check frame 1,
which is in descriptor.c on line 136:

(gdb) frame 1
#1 0x080f2634 in ~descriptor_data (this=0x8a5a2f0) at descriptor.c:136
136 compressEnd( );

You see here the call to compressEnd(). Ok, that's not enough info yet.
Let's look at the call that killed it, in frame 0:

(gdb) frame 0
#0 0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
158 if( !mccp->out_compress )

Aha, this is a hint - something in features.c on line 158 is amiss.
Start checking this line methodically. Begin by asking it what "mccp"
was equal to at the time:

(gdb) print mccp
$1 = (mccp_data *) 0x0

This tells you that the "mccp" pointer was NULL. 0x0 stands for NULL,
basically the absence of any data. Nothing, zero, zilch, etc. In this
particular case, it tells us that the pointer which should refer to this
person's mccp_data structure points at nothing. It hasn't been initialized.
Attempting to read or write through a NULL pointer will result in a crash,
which is what happened.

Now that you know what happened, let's exit GDB.

(gdb) quit

You should return to a shell prompt. It's time to go fix your bug and try again.

Hopefully this article has proven useful. There are more advanced things you
can do with GDB, but this should cover the basics of investigating a crash
after the fact.

In reference, this is the code which crashed:

close( descriptor );
DISPOSE( host );
delete [] outbuf;
DISPOSE( pagebuf );
STRFREE( client );

compressEnd( );
DISPOSE( mccp );

And this is what fixes it:

close( descriptor );
DISPOSE( host );
delete [] outbuf;
DISPOSE( pagebuf );
STRFREE( client );

if( mccp != NULL )
   compressEnd( );
DISPOSE( mccp );

Note that the second version verifies mccp isn't NULL before ending compression.
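
Another option is to put the check inside the function itself, so that every caller is protected and not just the destructor. The following is only a minimal sketch in plain C: `mccp_state`, `descriptor`, and `compress_end` are made-up stand-ins for illustration, not the real AFKMud classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the article's mccp_data / descriptor_data. */
struct mccp_state
{
    bool out_compress;   /* true while outgoing compression is active */
};

struct descriptor
{
    struct mccp_state *mccp;   /* may be NULL if it was never allocated */
};

/* Returns true if compression was active and has now been ended.
 * A NULL mccp pointer is treated as "nothing to do" instead of crashing. */
bool compress_end(struct descriptor *d)
{
    assert(d != NULL);

    if (d->mccp == NULL)
        return false;
    if (!d->mccp->out_compress)
        return false;

    d->mccp->out_compress = false;
    return true;
}
```

With the guard in one place, a caller that forgets the check no longer dereferences a NULL pointer. Whether that is preferable to checking at each call site is a design choice for your own codebase.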
24 Oct, 2007, Vladaar wrote in the 2nd comment:
Votes: 0
Cleanup Memory by Remcon

Oh, for those who don't know, having your mud run cleanup_memory on exit helps out when you toss your mud into valgrind to check for leaks and all. Most of this came from afkmud and was just modified (where it needed to be) to fit into smaugfuss. The actual code snippet post can be found at http://www.smaugmuds.org/index.php?a=top...

Also, the article below:

Basics of Debugging with Valgrind
By Fredrick
Intended audience: Mud programmers with at least some experience in coding C and using the standard available tools (gdb, gcc).

First off, if your MUD is multithreaded/multiprocess, you can stop reading now. Multithreaded debugging is in a whole different league from the single threaded kind. I won't go into the reasons for this, since even explaining why it's different requires some effort. If you don't even know what I'm talking about, chances are that you have a single threaded MUD. All code bases I know of are single threaded… there have been numerous discussions about the pros and cons of multithreaded MUDs on the Mudconnector's (http://www.mudconnect.com) discussion forums over the years, and they have nearly always come up with "don't; it's not worth it".

This small article is meant to help people who have problems with their MUD that they just can't find. These problems are memory leaks, inexplicable random crashes, totally unexpected behavior and other things.

I am going at it with the approach that "everything is about memory", a theory that I have seen proven over and over in my (so far short) programming career. Therefore, I will begin by explaining a few things about programs and memory and then go on to show how to find those problems with a tool called Valgrind. If you can't get Valgrind or don't have the resources to use Valgrind, you lose; this all assumes the use of Valgrind.

What is a MUD?
The question isn't as esoteric as it may seem. A MUD is a program that
executes in the computer's memory and uses some of its resources in the

The program is loaded at startup and stays there for as long as it executes. During this time, the program may ask for memory to work in, in order to perform certain tasks, such as building tables of, for example, players and rooms and such. Now, there are two "types" of memory (it is all RAM, of course, it's just used in two different ways), stack and heap.

The stack is, to define it sloppily, what the program code executes in/on. If you have a function, it will occupy one or more "stack frames" when it executes. The memory that is allocated on the stack is freed, deallocated, as soon as the function is done executing. This happens automatically; there is nothing you have to do in order to free that memory.

For example, in the following function

#include <stdio.h>

void foo(int bar)
{
    int yikes;
    char buffer[100];

    yikes = bar;
    sprintf(buffer, "%d", yikes);
}

all the memory used to hold the variables bar, yikes and buffer is allocated on the stack and automatically freed as soon as the function returns.

To take another example:

void foo(int bar)
{
    int yikes = 0;
    char buffer[100];

    if (bar != 10)
    {
        int rakka = 100;
        yikes = rakka*bar;
    }
    else
    {
        int rokko = 200;
        yikes = rokko*bar;
    }
    sprintf(buffer, "%d", yikes);
}

You notice how the variables rakka and rokko are declared inside {}'s of their own? As soon as execution leaves those {}, both rakka and rokko are freed, so at the "sprintf" call, they no longer exist. In simplified terms, a stack frame is defined by curly braces, {}, and stack memory is allocated and deallocated respectively when execution hits those in your code.

Heap, on the other hand, is the memory that you yourself explicitly allocate using new or malloc, for example:

char *buffer = malloc(100);

That memory does _not_ get deallocated when you leave the function, it remains marked as used until the MUD finishes (either by exiting nicely or crashing) or you call free on it explicitly. It may not seem like a biggie at first, but if you call malloc(100) every time someone enters a command in your mud, the memory usage is bound to rise quickly until either the host bogs down from disk swapping or your MUD hoster decides to kill your MUD. Or worse, you get thrown out for eating up all the resources.

What is a memory leak, then? A leak is heap allocated (malloc, new) memory that is never freed and that you lose the reference to. Below is an obvious and therefore atypical leak.

void foo(int bar)
{
    char *buffer = malloc(100);
    sprintf(buffer, "%d", bar);
    /* buffer is never freed: 100 bytes leak on every call */
}

As soon as foo is done executing, 100 bytes have leaked, since you then lose all reference to it and have no way of knowing exactly where in memory those 100 bytes are so that you can deallocate them.
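
A leak-free version of that function might look like the sketch below. It returns the length of the formatted text only so that there is something to check from the caller; that return value, the malloc failure check, and the use of snprintf are additions for illustration and not part of the original example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats bar into a heap buffer, then releases the buffer.
 * Returns the number of characters written, or -1 if malloc fails. */
int foo(int bar)
{
    char *buffer = malloc(100);
    int written;

    if (buffer == NULL)
        return -1;

    written = snprintf(buffer, 100, "%d", bar);
    assert(written > 0 && written < 100);   /* an int always fits in 100 bytes */

    free(buffer);   /* the last reference is about to go away, so free now */
    return written;
}
```

The important line is the free() just before the return: once the function exits, the pointer is gone and nothing else could release the memory.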

A more common leak is this:

char *makeNameFromNumber(int bar);

void foo(int bar)
{
    char buffer[100];
    sprintf(buffer, makeNameFromNumber(bar));   /* returned pointer is lost here */
}

char *makeNameFromNumber(int bar)
{
    char *name = malloc(100);
    sprintf(name, "MikkaHakkinen%d", bar);
    return name;
}

Here, the leak comes from the author of foo not realizing that he must free the memory he gets when calling makeNameFromNumber. A better way would be to do the following in foo instead:

void foo(int bar)
{
    char buffer[100], *tmp;
    tmp = makeNameFromNumber(bar);
    sprintf(buffer, tmp);
    free(tmp);   /* release what makeNameFromNumber allocated */
}

Now, it's sort of tedious to have to do "free" every damn time you call makeNameFromNumber, isn't it? A common approach to get away from this is to pass in a char * as parameter to the function, like this:

void makeNameFromNumber(char *buf, int bar);

void foo(int bar)
{
    char buffer[100];
    makeNameFromNumber(buffer, bar);
}

void makeNameFromNumber(char *buf, int bar)
{
    sprintf(buf, "MikkaHakkinen%d", bar);
}

That way, you don't have to malloc memory at all in makeNameFromNumber.
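
One caveat with this approach is that the function has no idea how large the caller's buffer is. A common refinement, sketched here with an extra size parameter and a return value that the original function does not have, is to pass the size along and let snprintf enforce it:

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Writes "MikkaHakkinen<bar>" into buf, never more than size bytes
 * including the terminating '\0'. Returns 1 if the whole name fit,
 * 0 if it had to be truncated. */
int makeNameFromNumber(char *buf, size_t size, int bar)
{
    int needed;

    assert(buf != NULL && size > 0);

    needed = snprintf(buf, size, "MikkaHakkinen%d", bar);
    return needed >= 0 && (size_t)needed < size;
}
```

A caller would then write something like makeNameFromNumber(buffer, sizeof buffer, bar) and can check the result if truncation matters.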

Elusive leaks
There's also something which is far harder to track that I prefer to call elusive leaks. That's memory which still is reachable but the programmer forgot all about. Let's say he's a control nazi and creates a table of all commands ever typed, but being less than perfect he forgets to clean it up every now and then, making it grow …and grow… and grow. That is something that is indeed very hard to hunt down, since it's not even a programming error, it's a logic error.
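
One way to keep such a table from growing forever is to cap it. The sketch below is a made-up example (the names `history_add`, `history_get`, `HISTORY_MAX` and the fixed sizes are all assumptions, not taken from any codebase) of a fixed-size history that overwrites its oldest entry once it is full:

```c
#include <assert.h>
#include <stdio.h>

#define HISTORY_MAX 4      /* hypothetical cap; pick what suits your MUD */
#define COMMAND_LEN 64

static char history[HISTORY_MAX][COMMAND_LEN];
static int history_count = 0;   /* entries currently stored, at most HISTORY_MAX */
static int history_next = 0;    /* slot the next command will be written to */

/* Stores a copy of command, overwriting the oldest entry when full. */
void history_add(const char *command)
{
    assert(command != NULL);

    snprintf(history[history_next], COMMAND_LEN, "%s", command);
    history_next = (history_next + 1) % HISTORY_MAX;
    if (history_count < HISTORY_MAX)
        history_count++;
}

/* Returns the i-th oldest stored command (0 = oldest), or NULL if out of range. */
const char *history_get(int i)
{
    int oldest;

    if (i < 0 || i >= history_count)
        return NULL;
    oldest = (history_count < HISTORY_MAX) ? 0 : history_next;
    return history[(oldest + i) % HISTORY_MAX];
}
```

Because the storage is a fixed array, memory use stays constant no matter how many commands are typed. The trade-off is that older entries are silently dropped.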

Corruption means that the program has written into memory areas it wasn't supposed to. Those are the hardest bugs to find, if you don't have the tools for it. A very common error can be exemplified here; let's assume that fixName is meant to check the name that a new user has just entered when creating a new char:

void fixName(struct char_data *ch, char *name)
{
    if (strcspn(name, " '\\\".,;.-_#&/()=%@$") != strlen(name))
    {
        send_to_char("Name contains illegal characters, try again:", ch);
        return;
    }
    if (!ch->name)
        ch->name = strdup(name);
    /* Overflows if name is longer than the buffer strdup() allocated earlier. */
    sprintf(ch->name, name);
    send_to_charf(ch, "What class do you want to be, %s?", ch->name);
}

Now, this may seem innocent enough. And indeed, if called only once, it is perfectly ok. But if it's called again, this time with a longer name, you're in a world of trouble. Why? Let's show with a little more detail. "creation" is the equivalent of Circle's "nanny", that is, it deals with all the input a player has when not playing.
24 Oct, 2007, Guest wrote in the 3rd comment:
Votes: 0
I've been meaning to get the gdb article posted to the articles section. I still have the file around somewhere.