From: rphorvic@gauss.cord.edu

Here are my two thoughts on the issue. I also include a gdb intro at the end: a good interactive gdb trace for someone who has never used it, imo.

> >ok, 2 quick questions on loop bugs. one, how do you reboot the mud once it
> >goes into an endless loop. cant reboot online cuz its like frozen. i dont

The ideas of using ps -aux, ps -x, etc. are all good. I have found that on my Solaris box `ps x` is what I have to use, but on my BSDI box `ps -aux | grep rom` is what I have to use. On my NeXT cube I go a totally different route and... well, you get the idea.

To standardize it and remove the need for ps, here is a small snippet of code that demonstrates how to get the pid of the mud at runtime and have the mud write it to a file. Then you just have to look in that file to see what the mud's pid is. The routine should probably be broken out into a function, and I would put it fairly early in the startup code, like in the first few lines of main(). If you don't know what main is ... well ... you probably don't know what a pid is, either.

Begin snippet (this is quick, 2 minute code, compiles without error on my Solaris and BSDI boxes - libraries should be pretty standard):

--- mudpid.c --- [compiles with: gcc -o mudpid mudpid.c]
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp = NULL;

    if ( (fp = fopen("mud.pid", "w")) == NULL)
    {
        fprintf(stderr, "Unable to open output file.\n");
        return 1;
    }

    /* pid_t has no printf format of its own, so cast it to long */
    fprintf(fp, "%ld\n", (long) getpid());
    fclose(fp);

    return 0;
}
--- end mudpid.c ---

This technique is used fairly commonly with unix servers (for example, the Progressive Networks [Real Audio, etc] servers do this) and provides a quick way to deal with the problem across multiple platforms.
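
Since the routine is meant to be broken out into a function and called early in main(), here is a minimal sketch of what that might look like. The name write_pid_file and its return convention are my own assumptions for illustration; nothing by that name exists in the ROM source.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ROM): write the current process id
 * to the file named by `path`, as one number followed by a newline.
 * Returns 0 on success, -1 if the file could not be opened or written.
 */
int write_pid_file(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        perror(path);   /* report why the open failed */
        return -1;
    }

    /* pid_t has no printf format of its own, so cast it to long. */
    if (fprintf(fp, "%ld\n", (long) getpid()) < 0) {
        fclose(fp);
        return -1;
    }

    /* fclose() flushes the buffer; failure means the pid may not be on disk. */
    return (fclose(fp) == 0) ? 0 : -1;
}
```

A call such as write_pid_file("mud.pid") in the first few lines of main() would then be enough. Whether the mud should refuse to start when the call fails is a judgment call; logging the failure and carrying on is probably fine, since the pid file is only a convenience.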

Obviously it would be quite easy to write a simple perl script that reads the mud.pid file, extracts the pid and kills the mud for you, which also lets you automate any cleanup routines you want run along with killing the mud. If one were so inclined, here is a rough perl shell for that (works):

--- killmud.pl --- [kills the mud based on the pid in "mud.pid"]
#!/usr/local/bin/perl
open (MUDFILE, "mud.pid") or die "Unable to open mud.pid: $!\n";
$pid = <MUDFILE>;
close (MUDFILE);
chomp($pid);    # strip a trailing newline, if any
print("Killing process $pid\n");
system("kill -9 $pid");
--- end killmud.pl ---

If you don't know about perl, get the Camel book by Larry Wall, et al, and go nuts reading and watch your life get easier.

> >know how to reboot from the shell...can i? second, how should i go about
> >finding the loop bug?

To do this I would recommend using a symbolic debugger like gdb (*nix) or any of the various debuggers available for the Win32 platform. To use gdb do this:

[This is a trace of an interactive session I had in gdb - the places where I entered commands are preceded by the prompt "(gdb) ", anything else is output from my previous command.]

bash$ gdb
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (sparc-sun-solaris2.4), Copyright 1994 Free Software Foundation, Inc.
(gdb) file ../src/rom
Reading symbols from ../src/rom...done.
(gdb) break main
Breakpoint 1 at 0x37314: file comm.c, line 366.
(gdb) run
Starting program: /users/rphorvic/Rom24/area/../src/rom

Breakpoint 1, main (argc=1, argv=0xeffffdc4) at comm.c:366
366         gettimeofday( &now_time, NULL );
(gdb) step
367         current_time = (time_t) now_time.tv_sec;
(gdb) step
368         strcpy( str_boot_time, ctime( &current_time ) );
(gdb) continue
Continuing.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 10525:1 -> 10535:3 -> 10534.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 3458:2 -> 3472:0 -> 10401.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 8705:4 -> 8706:5 -> 8708.
Mon Mar 23 09:21:17 1998 :: [*****] BUG: Fix_exits: 8717:2 -> 8719:0 -> 8718.
Err: obj an elemental rod of earthquake (9217) -- 7, mob a small rock (9217) -- 3
Err: obj elemental wand of wind and air (9218) -- 27, mob an alchemist (9234) -- 13
Err: obj an ice staff (9216) -- 25, mob a puddle (9214) -- 8
Err: obj an icicle (9227) -- 28, mob the Ice Bandit (9228) -- 24
Err: obj elemental wand of fire (9215) -- 16, mob a flame (9215) -- 4
Err: obj elemental wand of fire (9215) -- 16, mob a flame (9215) -- 4
Err: obj elemental wand of wind and air (9218) -- 27, mob a small spark (9218) -- 4
Err: obj elemental wand of wind and air (9218) -- 27, mob an eddie (9225) -- 2
Err: obj an ice staff (9216) -- 25, mob a baby rainbow dragon (9235) -- 16
Err: obj a wet noodle (8010) -- 5, mob a Futsie (8002) -- 17
Mon Mar 23 09:21:17 1998 :: ROM is ready to rock on port 4000.
^C
Program received signal SIGINT, Interrupt.
0xef6baba0 in poll ()
(gdb) step
Single stepping until exit from function poll,
which has no line number information.
0xef6d210c in _select ()
(gdb) quit
The program is running.  Quit anyway (and kill it)? (y or n) y
...

Do you see how I was able to load the program, set a breakpoint (important to do this) and then run the program? Once I typed `run`, the entry of main() triggered the breakpoint I had set. Then I was able to step through the program line by line as it ran (`step`) and eventually use `continue` to have execution just go on as normal. At this point I could have telneted in and used my mud as I normally do.

If you type `run` and everything looks good, then try to telnet in and do what you normally do to produce the loop. When it locks up on your client side, switch back to gdb (which has been running the whole time) and hit CTRL+C (you see what I did?) and it will dump out your current location within the executable.
I recommend then exiting gdb and starting it again, but this time setting the breakpoint on the function where you died last time, not main. For me, if I cared, that function was `poll` - since I was just blocking for users. Chances are you will recognize the file containing that function as one you recently edited, or one that a recently applied patch touched. Start stepping through the function you have broken at until you get into the loop. Now you know where your problem is.

For a webpage that is a decent gdb tutorial and a great command reference for those of us who know debugging theory but just want a good reference, check here:

http://tlaloc.sfsu.edu/~hodges/cs410/gdbtut.frm.html

Also a search for "+gdb +tutorial" (without the quotes) at www.altavista.com will reveal many gdb tutorials - including the one I listed above.

I hope this has been helpful.

Robert Horvick [kanin]
not affiliated with any mud ... hell, I hardly play them.