01 Jun, 2009, Lancsta wrote in the 1st comment:
Votes: 0
Hey, I just returned from the hospital *had an 8lb, 1oz baby girl* and found the mud hanging again. I still have it up in gdb, and this time I'm not going to kill it right after running bt and printing a few things. So if anyone has input, I can actually gather the info you all need.

This GDB was configured as "i486-slackware-linux"…
Attaching to program: /home/starscream/sunder2.1/bin/sundermud, process 5225
Reading symbols from /usr/lib/libz.so.1…done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libc.so.6…done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2…done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2…done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2…done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2…done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /usr/lib/libgcc_s.so.1…done.
Loaded symbols for /usr/lib/libgcc_s.so.1
0xb7e970be in __lll_lock_wait_private () from /lib/libc.so.6
(gdb) bt
#0 0xb7e970be in __lll_lock_wait_private () from /lib/libc.so.6
#1 0xb7e23e6f in _L_lock_15450 () from /lib/libc.so.6
#2 0xb7e23364 in free () from /lib/libc.so.6
#3 0xb7e0e184 in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#4 0x08086308 in sig_handler (sig=6) at comm.c:3492
#5 <signal handler called>
#6 0xb7ddec66 in raise () from /lib/libc.so.6
#7 0xb7de0571 in abort () from /lib/libc.so.6
#8 0xb7e1796b in __libc_message () from /lib/libc.so.6
#9 0xb7e1f8c4 in _int_free () from /lib/libc.so.6
#10 0xb7e23370 in free () from /lib/libc.so.6
#11 0xb7e0e184 in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#12 0x080e96de in save_char_obj (ch=0xb55fb674) at pfile.c:103
#13 0x0810cbca in char_update () at update.c:1460
#14 0x08086356 in sig_handler (sig=11) at comm.c:3504
#15 <signal handler called>
#16 0xb7e261a3 in strlen () from /lib/libc.so.6
#17 0xb7df538a in vfprintf () from /lib/libc.so.6
#18 0xb7dfafe2 in fprintf () from /lib/libc.so.6
#19 0x080e7592 in fwrite_obj (ch=0xb55fb674, obj=0xb5171268, fp=0x8315018,
iNest=1) at pfile.c:668
#20 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171314, fp=0x8315018,
iNest=1) at pfile.c:634
#21 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51713c0, fp=0x8315018,
iNest=1) at pfile.c:634
#22 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb517146c, fp=0x8315018,
iNest=1) at pfile.c:634
#23 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171518, fp=0x8315018,
iNest=1) at pfile.c:634
#24 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51715c4, fp=0x8315018,
iNest=1) at pfile.c:634
#25 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51716a0, fp=0x8315018,
iNest=1) at pfile.c:634
#26 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb517174c, fp=0x8315018,
iNest=1) at pfile.c:634
#27 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51717f8, fp=0x8315018,
iNest=1) at pfile.c:634
#28 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51718a4, fp=0x8315018,
iNest=1) at pfile.c:634
#29 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171950, fp=0x8315018,
iNest=1) at pfile.c:634
#30 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51719fc, fp=0x8315018,
iNest=1) at pfile.c:634
#31 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171aa8, fp=0x8315018,
iNest=1) at pfile.c:634
#32 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171b54, fp=0x8315018,
iNest=1) at pfile.c:634
#33 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171c00, fp=0x8315018,
iNest=1) at pfile.c:634
#34 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171cac, fp=0x8315018,
iNest=1) at pfile.c:634
#35 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171d58, fp=0x8315018,
iNest=1) at pfile.c:634
#36 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171e04, fp=0x8315018,
iNest=1) at pfile.c:634
#37 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171eb0, fp=0x8315018,
iNest=1) at pfile.c:634
#38 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5171f5c, fp=0x8315018,
iNest=1) at pfile.c:634
#39 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5172008, fp=0x8315018,
iNest=1) at pfile.c:634
#40 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51720b4, fp=0x8315018,
iNest=1) at pfile.c:634
#41 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5172160, fp=0x8315018,
iNest=1) at pfile.c:634
#42 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51722b8, fp=0x8315018,
iNest=0) at pfile.c:634
#43 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5172364, fp=0x8315018,
iNest=0) at pfile.c:634
#44 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5172410, fp=0x8315018,
iNest=0) at pfile.c:634
#45 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51724bc, fp=0x8315018,
iNest=0) at pfile.c:634
#46 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5172568, fp=0x8315018,
iNest=0) at pfile.c:634
#47 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb5172614, fp=0x8315018,
iNest=0) at pfile.c:634
#48 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51726c0, fp=0x8315018,
iNest=0) at pfile.c:634
#49 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb517276c, fp=0x8315018,
iNest=0) at pfile.c:634
#50 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb51596a0, fp=0x8315018,
iNest=0) at pfile.c:634
#51 0x080e74e6 in fwrite_obj (ch=0xb55fb674, obj=0xb517220c, fp=0x8315018,
iNest=0) at pfile.c:634
#52 0x080e9622 in save_char_obj (ch=0xb55fb674) at pfile.c:122
#53 0x0810cbca in char_update () at update.c:1460
#54 0x0810cf17 in update_handler () at update.c:2044
#55 0x0808aa39 in game_loop_unix (mud_desc=4) at comm.c:538
#56 0x0808af79 in main (argc=<value optimized out>, argv=0xbf93c7d4)
at comm.c:295
01 Jun, 2009, Davion wrote in the 2nd comment:
Votes: 0
Get rid of your sig handler. Seems to hang after that's called.

A side tangent… sig handlers are a bad idea! They could lead to corrupt data (aka, a player) being saved to disk. Let'er belly up, sink and dump a meaningful core ;).
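
A minimal sketch of what "let it dump core" could look like in C, assuming a POSIX system. The function name `enable_core_dumps` is hypothetical and not part of the mud's code; the idea is to leave the crash signals at their default action and raise the soft core-size limit to the hard limit.

```c
#include <signal.h>
#include <sys/resource.h>

/* Hypothetical helper: call once at startup instead of installing sig_handler.
   Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    /* Default disposition: the process is terminated and a core file can be
       written if the core-size limit and system settings allow it. */
    if (signal(SIGSEGV, SIG_DFL) == SIG_ERR)
        return -1;
    if (signal(SIGABRT, SIG_DFL) == SIG_ERR)
        return -1;

    /* Raise the soft core-size limit up to the hard limit. */
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Whether a core file actually appears still depends on the hard limit and on where the system is configured to write cores.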
01 Jun, 2009, Lancsta wrote in the 3rd comment:
Votes: 0
Sweet, I'll start there; I never liked it anyway. I can't remember who put it in, but it looks like they were after the Last_command, to track crashes. But the file leaks random characters and symbols, and it also seems they commented out half of the sig handler.
01 Jun, 2009, David Haley wrote in the 4th comment:
Votes: 0
Congrats on the baby girl :smile:

This shows us what's going on:

#0  0xb7e970be in __lll_lock_wait_private () from /lib/libc.so.6
#1 0xb7e23e6f in _L_lock_15450 () from /lib/libc.so.6
#2 0xb7e23364 in free () from /lib/libc.so.6
#3 0xb7e0e184 in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#4 0x08086308 in sig_handler (sig=6) at comm.c:3492
#5 <signal handler called>
#6 0xb7ddec66 in raise () from /lib/libc.so.6
#7 0xb7de0571 in abort () from /lib/libc.so.6
#8 0xb7e1796b in __libc_message () from /lib/libc.so.6
#9 0xb7e1f8c4 in _int_free () from /lib/libc.so.6
#10 0xb7e23370 in free () from /lib/libc.so.6
#11 0xb7e0e184 in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#12 0x080e96de in save_char_obj (ch=0xb55fb674) at pfile.c:103
#13 0x0810cbca in char_update () at update.c:1460
#14 0x08086356 in sig_handler (sig=11) at comm.c:3504
#15 <signal handler called>
#16 0xb7e261a3 in strlen () from /lib/libc.so.6
#17 0xb7df538a in vfprintf () from /lib/libc.so.6
#18 0xb7dfafe2 in fprintf () from /lib/libc.so.6
#19 0x080e7592 in fwrite_obj (ch=0xb55fb674, obj=0xb5171268, fp=0x8315018,
iNest=1) at pfile.c:668


There are some things to note here:


- The last thing attempted in normal program execution is saving an object. That's what frame #19 is showing us.
- Something in that is causing a crash due to signal 11 (segmentation fault, i.e., illegal memory reference) and the signal handler kicks in in frame #14.
- The signal handler is presumably trying to save all characters. In frame #12, we see that while working on ch=0xb55fb674 (the same character that caused the original problem), some file is closed, and then free() is called, eventually causing some kind of exception (the signal number is 6, meaning "abort"; see man 3 abort). That's what frames #6-#11 are showing us.
- The signal handler is in the function __lll_lock_wait_private (), and that is presumably why it is hanging: it is waiting to acquire some lock somewhere, most likely the allocator lock still held by the free() call that was interrupted in frame #10.

It's worth noting that there is no obvious infinite loop in frames 19-51: at least, if there is one, I couldn't see the object pointer repeating.
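
Checking by eye for a repeating object pointer works for a short trace; Floyd's tortoise-and-hare check does the same thing mechanically. This is a sketch on a stand-in struct: the `next_content` link name is an assumption about how the object list is chained, and `list_has_cycle` is a hypothetical helper.

```c
#include <stddef.h>

/* Stand-in for the mud's object type; only the list link matters here. */
struct obj {
    struct obj *next_content;   /* assumed name of the "next item" link */
};

/* Floyd's cycle detection: returns 1 if following next_content from head
   eventually revisits a node, 0 if the list ends in NULL. */
int list_has_cycle(const struct obj *head)
{
    const struct obj *slow = head;
    const struct obj *fast = head;

    while (fast != NULL && fast->next_content != NULL) {
        slow = slow->next_content;                 /* one step  */
        fast = fast->next_content->next_content;   /* two steps */
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

Note that this only helps when every pointer in the list is valid; walking a list that contains a garbage pointer can itself segfault.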

It's very likely at this point that something is corrupted with the character's inventory. The interesting part isn't the signal handler: as Davion said, those are kind of sketchy to begin with. The thing to focus on is why a segfault occurred in the first place in frame #19.

So, the first thing to do is to go to that frame, see what code is being executed, and see if the various values make sense. At the very least, you will see which field has an invalid address, causing the segfault.
03 Jun, 2009, Lancsta wrote in the 5th comment:
Votes: 0
Wow I actually got my first core dump today. I was messing around and rewriting my startup script, and for the past 9 years have not had one core dump. So congrats on me actually getting it to work.
Core was generated by `../bin/sundermud 0'.
Program terminated with signal 11, Segmentation fault.
[New process 17776]
#0 is_affected (ch=0x429e11c0, sn=170) at handler.c:1875
1875 if ( paf->type == sn )
(gdb) bt
#0 is_affected (ch=0x429e11c0, sn=170) at handler.c:1875
#1 0x080542a3 in format_inv_to_char (obj=0x429546d8, ch=0x429e11c0,
fShort=0 '\0') at act_info.c:138
#2 0x08054587 in show_list_to_char (list=0x429546d8, ch=0x429e11c0,
fShort=0 '\0', fShowNothing=0 '\0') at act_info.c:250
#3 0x080c4721 in do_movemap (ch=0x429e11c0) at map.c:511
#4 0x0805d2d8 in move_char (ch=0x429e11c0, door=1, follow=0 '\0')
at act_move.c:698
#5 0x0805e47b in do_east (ch=0x3, argument=0xbfda3d1d "") at act_move.c:842
#6 0x080bfda5 in interpret (ch=0x429e11c0, argument=0xbfda3d1d "")
at interp.c:1090
#7 0x0808aa99 in game_loop_unix (mud_desc=4) at comm.c:519
#8 0x0808ad24 in main (argc=<value optimized out>, argv=0xbfdb42e4)
at comm.c:292
03 Jun, 2009, David Haley wrote in the 6th comment:
Votes: 0
Well, that just means that 'paf' is an invalid pointer. It could either be null or point to invalid memory (e.g., it was already freed somewhere, or the pointer is bogus somehow – uninitialized, perhaps).

If the functions are doing something wrong, this is relatively easy to fix. If a bad pointer snuck into a data structure somewhere, life gets a lot harder very quickly. A good way to track down things like that is to use a tool like valgrind, which can tell you where something was initialized.
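
For context, a lookup like this is often written as a walk over a linked list of affects. The sketch below uses simplified stand-in types; `struct affect`, `struct character`, and the field names are assumptions, not the mud's actual definitions. If the real loop has this shape, a NULL `paf` is already handled by the loop condition, so the crash would point at a non-NULL pointer that is stale or garbage.

```c
#include <stddef.h>

struct affect {
    struct affect *next;
    int type;                 /* skill number, e.g. sn=170 in the core dump */
};

struct character {
    struct affect *affected;  /* head of the affect list, NULL if empty */
};

/* Returns 1 if the character has an affect of type sn, else 0. */
int is_affected(const struct character *ch, int sn)
{
    const struct affect *paf;

    for (paf = ch->affected; paf != NULL; paf = paf->next) {
        if (paf->type == sn)  /* in this sketch, faults only if paf is non-NULL but invalid */
            return 1;
    }
    return 0;
}
```

A typical way for such a list to go bad is freeing a node without unlinking it, which leaves the previous node's `next` pointing at freed memory.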
03 Jun, 2009, Lancsta wrote in the 7th comment:
Votes: 0
Ok I think perhaps I found it, maybe not.

So we have detect good under is_affected, unlike below
if ( is_affected ( ch, skill_lookup ( "detect good" ) ) && IS_OBJ_STAT $
SLCAT ( buf, "{x{WG{x" );
else
SLCAT ( buf, "{x{W.{x" );

detect magic for can_detect
if ( CAN_DETECT ( ch, DET_MAGIC ) && IS_OBJ_STAT ( obj, ITEM_MAGIC ) )
SLCAT ( buf, "{x{MM{x" );
else
SLCAT ( buf, "{x{M.{x" );

For our spells we have affects, protections, and detections. I'm seeing from the pfile when casting detect_good, it's under the detects, but here we're looking in the affects. Does it make sense? I sometimes can't say with words what I think :p
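
To make the difference between the two checks concrete, here is a sketch with stand-in types. The `detections` field, the `DET_MAGIC` value, the macro body, and `has_affect` are all assumptions modelled on the snippet above, not the codebase's real definitions. Under these assumptions, a spell that is only recorded as a detection bit makes the affect-list lookup return 0; that alone would not cause a crash.

```c
#include <stddef.h>

#define DET_MAGIC (1u << 2)   /* assumed bit value */

struct affect {
    struct affect *next;
    int type;
};

struct character {
    unsigned int detections;  /* assumed bitvector of active detections */
    struct affect *affected;  /* assumed linked list of affects */
};

/* Bit test: reads one integer field and follows no list pointers. */
#define CAN_DETECT(ch, bit) (((ch)->detections & (bit)) != 0)

/* List walk: dereferences every node, so it depends on the list being intact. */
int has_affect(const struct character *ch, int sn)
{
    const struct affect *paf;

    for (paf = ch->affected; paf != NULL; paf = paf->next) {
        if (paf->type == sn)
            return 1;
    }
    return 0;
}
```

So looking in the "wrong" place would show up as a missing flag in the output, while a segfault inside the list walk suggests the list itself holds a bad pointer.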
03 Jun, 2009, David Haley wrote in the 8th comment:
Votes: 0
Uh… sorry, actually I'm having a little trouble following. :wink: It looks like the trouble is coming from:
#0  is_affected (ch=0x429e11c0, sn=170) at handler.c:1875
1875 if ( paf->type == sn )

so we should be trying to figure that one out… does this detection stuff relate to that line at all?
03 Jun, 2009, Lancsta wrote in the 9th comment:
Votes: 0
I was thinking maybe the bits were being set wrong. So the paf, detect_good, would be out of bounds. If it's looking in the detect bits, and it's actually in the affect bits. I think it was frame 8 that was looking for an affect.
03 Jun, 2009, David Haley wrote in the 10th comment:
Votes: 0
Hmm. Are we talking about the same core dump? Frame 8 above is just:
#8  0x0808ad24 in main (argc=<value optimized out>, argv=0xbfdb42e4)
at comm.c:292
03 Jun, 2009, Lancsta wrote in the 11th comment:
Votes: 0
No… heh sorry, line 8, frame 1. Bleh, saw #. I recall an issue we had before where bits were being placed in the wrong slots and causing problems. I wasn't part of the fix, so I can't remember what was done to fix it. I was thinking maybe this is another issue like that which just wasn't caught before.