12 Feb, 2009, Hades_Kane wrote in the 1st comment:
Votes: 0
I'm running a heavily modified ROM codebase.

A little while ago, we started having the game freeze once in a while. I can't seem to reproduce it and can't narrow down what's causing it. It happened earlier with only 1 player on idling.

I know when the game freezes, you're supposed to be able to attach gdb to the process and get an idea of why, but this is what mine keeps showing:

Attaching to program: xx/xx/xx/area/eot, process 22365
0xb7ef6678 in ?? ()
(gdb) bt
#0 0xb7ef6678 in ?? ()
#1 0x00000000 in ?? ()


Does that look familiar to anyone? Are there other ways to catch it frozen to get more answers?
12 Feb, 2009, Skol wrote in the 2nd comment:
Votes: 0
Does it ever 'un'freeze? I've had freezes due to DNS lookup when a socket connects sometimes. (There's a way to make that a multi-threaded doohicky, but I've never learned how).

Those are just speedbumps though, not a full on freeze.
12 Feb, 2009, elanthis wrote in the 3rd comment:
Votes: 0
Probably a memory corruption issue, judging from the stacktrace. Welcome to the wonderful world of C.

Try running under Valgrind and see if it can catch where you're abusing memory at.
12 Feb, 2009, Hades_Kane wrote in the 4th comment:
Votes: 0
Skol said:
Does it ever 'un'freeze? I've had freezes due to DNS lookup when a socket connects sometimes. (There's a way to make that a multi-threaded doohicky, but I've never learned how).

Those are just speedbumps though, not a full on freeze.


I'm not entirely sure. Today it was frozen for about 5 minutes or so, and I believe that on other occasions, with the same gdb backtrace, the freeze has lasted longer than that.


elanthis said:
Probably a memory corruption issue, judging from the stacktrace. Welcome to the wonderful world of C.

Try running under Valgrind and see if it can catch where you're abusing memory at.


Do I need to have valgrind running prior to it freezing, or is there some way to attach valgrind to it while it's frozen to see what it is unhappy about?
12 Feb, 2009, David Haley wrote in the 5th comment:
Votes: 0
Are you sure that you have debugging information and all that enabled?

One thing you could do would be to simply always run it in gdb, so that you can ctrl-c (break) when it freezes and look at it that way.

It kind of looks like your stack is bogus, though. Do you have multiple threads or something?
12 Feb, 2009, Hades_Kane wrote in the 6th comment:
Votes: 0
I've been able to attach gdb to a freeze before and get the information on where it froze, and I don't think anything major has changed in that regard since the last freeze I was able to catch with gdb… so I think all that is enabled. I don't think I have multiple threads.
12 Feb, 2009, David Haley wrote in the 7th comment:
Votes: 0
Are you doing anything with DNS lookups? If those happen in the same process, they can occasionally cause considerable hangs if you're doing it synchronously (i.e., with only one thread/process). But then, you'd have noticed that it happened when people connected, so this might not be it.
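For anyone checking their own code for this, here is a minimal sketch in C of one way to avoid the blocking call, assuming the hang comes from a reverse DNS lookup made while accepting a connection. ROM-derived codebases commonly call gethostbyaddr() in comm.c at that point, but whether this particular MUD still does is an assumption, and peer_address_numeric() is a hypothetical helper, not something from the thread. It formats the peer's IPv4 address as text with inet_ntop(), which never contacts a resolver and so cannot stall the game loop.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the thread): write the peer's IPv4
 * address into buf as dotted-quad text. Unlike gethostbyaddr(), this
 * does no network I/O, so it returns immediately.
 * Returns 0 on success, -1 if buf is too small (buf then holds a
 * truncated "unknown" placeholder). */
int peer_address_numeric(const struct sockaddr_in *peer, char *buf, size_t len)
{
    assert(peer != NULL && buf != NULL && len > 0);

    if (inet_ntop(AF_INET, &peer->sin_addr, buf, (socklen_t)len) == NULL) {
        /* Fall back to a placeholder rather than doing a lookup. */
        strncpy(buf, "unknown", len - 1);
        buf[len - 1] = '\0';
        return -1;
    }
    return 0;
}
```

If the hostname is still wanted for the "Incoming connection from" line, the usual approach is to resolve it outside the main loop (a separate thread or helper process) and log the numeric address in the meantime.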
12 Feb, 2009, Kline wrote in the 8th comment:
Votes: 0
If you go the Valgrind route you can't attach Valgrind to a running proc; it needs to load it.* I've found many bugs (and small memory errors) by running in Valgrind for a day or two then doing a clean shutdown to see what wasn't free'd. This will require you to write a routine to clean up all used memory on shutdown, but makes tracking down bugs magnitudes easier.


* You can attach it to a running proc, if you want to attempt to modify Valgrind or pull some dirty tricks on it, see this thread for more ideas: http://osdir.com/ml/debugging.valgrind/2...
13 Feb, 2009, Hades_Kane wrote in the 9th comment:
Votes: 0
DavidHaley said:
Are you doing anything with DNS lookups? If those happen in the same process, they can occasionally cause considerable hangs if you're doing it synchronously (i.e., with only one thread/process). But then, you'd have noticed that it happened when people connected, so this might not be it.


I think when it happened today, it might have happened right after I connected, but I'm unsure because all it ever did was "Connected to host eotmud.com" and I never saw past that. As for whether I'm doing anything with DNS lookups, I'm not really sure. When someone connects to the MUD, it presents the initial connection info… "Incoming connection from _______", but that only happens when the greet is sent, which is after the prompt for color or not. What type of things might I be on the lookout for in regards to DNS lookups?


Kline said:
If you go the Valgrind route you can't attach Valgrind to a running proc; it needs to load it.* I've found many bugs (and small memory errors) by running in Valgrind for a day or two then doing a clean shutdown to see what wasn't free'd. This will require you to write a routine to clean up all used memory on shutdown, but makes tracking down bugs magnitudes easier.


I appreciate the tips. It's probably not a bad idea for general maintenance to do this every once in a while. Is there likely to be anything in the code already that I might be able to reference in writing a routine to clean up the used memory on shutdown?
13 Feb, 2009, Kline wrote in the 10th comment:
Votes: 0
If you're still using free_lists, they're a good place to start with. In that case you'd end up with a lot of blocks of code similar to:
H_QUEUE *h, *h_next;

for( h = h_free; h != NULL; h = h_next )
{
    h_next = h->next;
    free(h);
}


And don't forget the large constant things, or IMC, if you use it:
free(string_space);
free(social_table);
#ifdef IMC
free_imcdata(true);
#endif


Using C++ STL lists makes it even easier:
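// DeleteObject is a user-written functor that calls delete on each pointer; it is not part of the STL.
for_each( area_list.begin(), area_list.end(), DeleteObject() );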


Just drop everything in a nice cleanup_mem() or similar func and place it in your game_loop_unix() below the close(control) call.
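
The cleanup_mem() idea above can be sketched as follows. This is only an illustration under stated assumptions: H_QUEUE here is a stripped-down stand-in for a real recycled structure, and recycle_h() and drain_h_free() are hypothetical names, not functions from ROM. The one addition to the loop shown earlier is resetting the list head afterwards, so nothing is left pointing at freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ROM-style recycled structure. */
typedef struct h_queue {
    struct h_queue *next;
} H_QUEUE;

static H_QUEUE *h_free = NULL;   /* head of the free list */

/* Put a node on the free list for later reuse instead of freeing it. */
void recycle_h(H_QUEUE *h)
{
    assert(h != NULL);
    h->next = h_free;
    h_free = h;
}

/* Release every node on the free list; returns how many were freed. */
int drain_h_free(void)
{
    int freed = 0;
    H_QUEUE *h, *h_next;

    for (h = h_free; h != NULL; h = h_next) {
        h_next = h->next;   /* read the link before the node is gone */
        free(h);
        freed++;
    }
    h_free = NULL;          /* do not leave a dangling head pointer */
    return freed;
}

/* Call once at shutdown, after the control socket has been closed. */
int cleanup_mem(void)
{
    int freed = 0;

    freed += drain_h_free();
    /* One drain call per free list goes here, followed by the large
     * one-off allocations (string space, social table, and so on). */
    return freed;
}
```

Returning a count is optional; it just gives a quick sanity check in the shutdown log before reading Valgrind's leak summary.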

I'd run Valgrind, run a set of select actions (who, users, social, magic) then just shutdown and see what it reports. Work errors from the top down as best you can, as fixing one can uncover another (or fix another; this is right from their FAQ).