From: rphorvic@gauss.cord.edu

Here are my two cents on the issue - I also include a gdb intro at the 
end - a good interactive gdb trace for someone who has never used it, imo.

> >ok, 2 quick questions on loop bugs. one, how do you reboot the mud once it
> >goes into an endless loop.  cant reboot online cuz its like frozen.  i dont

The ideas of using ps -aux, ps -x, etc. are all good.  I have found that 
on my Solaris box `ps x` is what I have to use, but on my BSDI box `ps 
-aux | grep rom` is what I have to use.  On my NeXT cube I go a totally 
different route and... well, you get the idea.  To standardize it and 
remove the need to use ps, here is a small snippet of code that 
demonstrates how to get the pid of the mud at runtime and have the mud 
write it to a file - then you just have to look in that file to see 
what the mud's pid is and kill that process from the shell.  The routine 
should probably be broken out into a function, and I would put it fairly 
early in the startup code - like in the first few lines of main().

If you don't know what main is ... well ... you probably don't know what 
a pid is then, either.

Begin snippet (this is quick, 2 minute code, compiles without error on my 
solaris and BSDI boxes - libraries should be pretty standard):

--- mudpid.c --- [compiles gcc -o mudpid mudpid.c]

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>


int main(int argc, char **argv)
{
	FILE	*fp = NULL;

	if ( (fp = fopen("mud.pid", "w")) == NULL)
	{
		perror("Unable to open mud.pid");
		return 1;
	}

	/* pid_t has no printf format of its own, so cast it to long */
	fprintf(fp, "%ld\n", (long) getpid());
	fclose(fp);
	return 0;
}	
	
--- end mudpid.c ---
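
Since the routine really belongs in a function called from the top of 
main(), here is one way that might look.  This is only a sketch: the name 
write_pid_file() and its return convention are my own invention for 
illustration, not something that exists in ROM or any other codebase.

```c
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

/*
 * Hypothetical helper (the name is an assumption, not part of any mud
 * codebase): write the current process id to `path`.
 * Returns 0 on success, -1 on failure.
 */
int write_pid_file(const char *path)
{
	FILE	*fp = fopen(path, "w");

	if (fp == NULL)
	{
		perror(path);
		return -1;
	}

	/* pid_t has no printf format of its own, so cast it to long. */
	if (fprintf(fp, "%ld\n", (long) getpid()) < 0)
	{
		fclose(fp);
		return -1;
	}

	/* fclose can fail too (a full disk, for example), so report that. */
	return (fclose(fp) == 0) ? 0 : -1;
}
```

Calling write_pid_file("mud.pid") as one of the first statements in 
main() and logging a warning when it returns -1 is probably enough; a 
missing pid file should not stop the mud from booting.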

This technique is used fairly commonly with unix servers (for example, 
the Progressive Networks [Real Audio, etc] servers do this) and provides 
a quick way to deal with the problem across multiple platforms.  
Obviously it would be quite easy to write a simple perl script that reads 
the mud.pid file, extracts the pid and kills the mud for you - which also 
lets you automate any cleanup routines you want run in conjunction with 
killing the mud.

If one were so inclined - here is a rough perl script for that (works):

--- killmud.pl --- kills the mud based on the pid in "mud.pid"

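#!/usr/local/bin/perl

# Read the pid that the mud wrote at startup.
open (MUDFILE, "mud.pid") or die "Cannot open mud.pid: $!\n";
$pid = <MUDFILE>;
close (MUDFILE);

# Strip any trailing newline and refuse anything that is not a number.
chomp($pid);
die "mud.pid does not contain a pid\n" unless $pid =~ /^\d+$/;

print("Killing process $pid\n");
system("kill -9 $pid");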

--- end killmud.pl ---

If you don't know perl, get the Camel book by Larry Wall et al., go nuts 
reading, and watch your life get easier.
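
For anyone who would rather stay in C, the same idea can be sketched with 
the standard kill() call from <signal.h>.  The function names 
read_pid_file() and signal_mud() are made up for this example, and the 
code assumes the pid file holds a single decimal number.

```c
#include <sys/types.h>
#include <signal.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a pid from `path`.
 * Returns the pid, or -1 if the file is missing or holds no valid pid.
 */
pid_t read_pid_file(const char *path)
{
	FILE	*fp = fopen(path, "r");
	long	value = -1;

	if (fp == NULL)
		return -1;

	/* Reject 0 and negative values, which kill() treats specially
	   (process groups and broadcast) instead of as a single process. */
	if (fscanf(fp, "%ld", &value) != 1 || value <= 0)
		value = -1;

	fclose(fp);
	return (pid_t) value;
}

/*
 * Hypothetical helper: send signal `sig` to the pid stored in `path`.
 * Returns 0 on success, -1 on failure.
 */
int signal_mud(const char *path, int sig)
{
	pid_t	pid = read_pid_file(path);

	if (pid == -1)
		return -1;

	return kill(pid, sig);
}
```

signal_mud("mud.pid", SIGKILL) does the same job as the `kill -9` above.  
Passing 0 as the signal sends nothing and only checks that the process 
exists, which is a handy way to spot a stale mud.pid left behind by a 
crash.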

> >know how to reboot from the shell...can i?  second, how should i go about
> >finding the loop bug?

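To do this I would recommend using a symbolic debugger like gdb (*nix) or 
any of the variety of ones available for the Win32 platform.  Note that 
gdb can only show you source lines if the mud was compiled with debugging 
information (the -g flag to gcc).  To use gdb do this: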

[This is a trace of an interactive session I had in gdb - the areas where 
I entered commands are preceded by the prompt "(gdb) ", anything else is 
output from my previous command.]

bash$ gdb
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (sparc-sun-solaris2.4), Copyright 1994 Free Software Foundation, Inc.
(gdb) file ../src/rom
Reading symbols from ../src/rom...done.
(gdb) break main
Breakpoint 1 at 0x37314: file comm.c, line 366.
(gdb) run
Starting program: /users/rphorvic/Rom24/area/../src/rom

Breakpoint 1, main (argc=1, argv=0xeffffdc4) at comm.c:366
366         gettimeofday( &now_time, NULL ); 
(gdb) step
367         current_time        = (time_t) now_time.tv_sec;
(gdb) step
368         strcpy( str_boot_time, ctime( &current_time ) );
(gdb) continue 
Continuing.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 10525:1 -> 10535:3 -> 10534.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 3458:2 -> 3472:0 -> 10401.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 8705:4 -> 8706:5 -> 8708.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 8717:2 -> 8719:0 -> 8718.
Err: obj an elemental rod of earthquake (9217) -- 7, mob a small rock (9217) -- 3
Err: obj elemental wand of wind and air (9218) -- 27, mob an alchemist (9234) -- 13
Err: obj an ice staff (9216) -- 25, mob a puddle (9214) -- 8
Err: obj an icicle (9227) -- 28, mob the Ice Bandit (9228) -- 24
Err: obj elemental wand of fire (9215) -- 16, mob a flame (9215) -- 4
Err: obj elemental wand of fire (9215) -- 16, mob a flame (9215) -- 4
Err: obj elemental wand of wind and air (9218) -- 27, mob a small spark (9218) -- 4
Err: obj elemental wand of wind and air (9218) -- 27, mob an eddie (9225) -- 2
Err: obj an ice staff (9216) -- 25, mob a baby rainbow dragon (9235) -- 16
Err: obj a wet noodle (8010) -- 5, mob a Futsie (8002) -- 17
Mon Mar 23 09:21:17 1998 :: ROM is ready to rock on port 4000. 
^C
Program received signal SIGINT, Interrupt.
0xef6baba0 in poll () 
(gdb) step
Single stepping until exit from function poll,
which has no line number information.
0xef6d210c in _select ()
(gdb) quit
The program is running.  Quit anyway (and kill it)? (y or n) y


...

Do you see how I was able to load the program, set a breakpoint 
(important to do this) and then run the program?  Once I typed `run`, the 
entry of main() triggered the breakpoint I had set.  Then I was able to 
step through the program line by line as it ran (`step`) and eventually 
use `continue` to have execution just go on as normal.  At this point I 
could have telnetted in and used my mud as I normally do.  If you type 
`run` and everything looks good, then telnet in and do what you 
normally do to produce the loop.  When it locks up on your client side, 
switch back to gdb (which has been running the whole time) and hit CTRL+C 
(you see what I did?) and it will dump out your current location within 
the executable.

I recommend then exiting gdb and starting it again, but this time setting 
the breakpoint on the function where you were stuck last time, not main.  
For me, if I cared, that function was `poll` - since I was just blocking 
for users.  Chances are you will recognize that you recently edited the 
file that contains that function, or that you recently applied a patch 
that edited that file.

Start stepping through the function you have broken at until you get into 
the loop.  Now you know where your problem is.

For a webpage that is a decent gdb tutorial and a great command reference 
for those of us who know debugging theory but just want a good 
reference, check here:

http://tlaloc.sfsu.edu/~hodges/cs410/gdbtut.frm.html

Also a search for "+gdb +tutorial" (without the quotes) at 
www.altavista.com will reveal many gdb tutorials - including the one I 
listed above.

I hope this has been helpful.

Robert Horvick
[kanin]

not affiliated with any mud ... hell, I hardly play them.