23 May, 2009, Lancsta wrote in the 1st comment:
Votes: 0
Running Sunder2.1, eventually the mud will stall and hang. I haven't been able to recreate the issue yet, so I don't know what's leading to it. I grabbed some gdb output, but I'm not sure I grabbed enough info; I'm still learning to use it. Any help is greatly appreciated; you guys have been awesome so far. If there's anything else I can be doing with gdb, please let me know. Thanks again.
(gdb) bt
#0 0x080bbf93 in affect_remove_obj (obj=0x6168206b, paf=0x6f6c2073)
at handler.c:1844
#1 0x0810cf1c in update_handler () at update.c:2045
#2 0x0808aa39 in game_loop_unix (mud_desc=4) at comm.c:538
#3 0x0808af79 in main (argc=<value optimized out>, argv=0xbfd09234)
at comm.c:295
(gdb) list
1839
1840 paf->next = affect_free;
1841 affect_free = paf;
1842 // affect_free = paf->next;
1843 return;
1844 }
1845
1846 /*
1847 * Strip all affects of a given sn.
1848 */
(gdb) print paf
$1 = (AFFECT_DATA *) 0x6f6c2073
(gdb) print *paf
Cannot access memory at address 0x6f6c2073
(gdb) print *affect_free
$2 = {next = 0xb51ae7d8, type = 390, level = 99, duration = 0, location = 18,
modifier = 93, where = 0, bitvector = 512, caster = 0x0}
(gdb) print affect_free
$3 = (AFFECT_DATA *) 0xb51aec38
(gdb) list
1849
1850 void affect_strip ( CHAR_DATA * ch, int sn )
1851 {
1852 AFFECT_DATA *paf;
1853 AFFECT_DATA *paf_next;
1854
1855 for ( paf = ch->affected; paf != NULL; paf = paf_next )
1856 {
1857 paf_next = paf->next;
1858 if ( paf->type == sn )
(gdb) bt
#0 0x080bbf93 in affect_remove_obj (obj=0x6168206b, paf=0x6f6c2073)
at handler.c:1844
#1 0x0810cf1c in update_handler () at update.c:2045
#2 0x0808aa39 in game_loop_unix (mud_desc=4) at comm.c:538
#3 0x0808af79 in main (argc=<value optimized out>, argv=0xbfd09234)
at comm.c:295
(gdb) print *obj
Cannot access memory at address 0x6168206b
(gdb) print obj
$4 = (OBJ_DATA *) 0x6168206b
(gdb) print paf
$5 = (AFFECT_DATA *) 0x6f6c2073
(gdb) frame 0
#0 0x080bbf93 in affect_remove_obj (obj=0x6168206b, paf=0x6f6c2073)
at handler.c:1844
1844 }
(gdb) print obj
$6 = (OBJ_DATA *) 0x6168206b
(gdb) print *obj
Cannot access memory at address 0x6168206b
(gdb) list
1839
1840 paf->next = affect_free;
1841 affect_free = paf;
1842 // affect_free = paf->next;
1843 return;
1844 }
1845
1846 /*
1847 * Strip all affects of a given sn.
1848 */
23 May, 2009, Zeno wrote in the 2nd comment:
Votes: 0
Sounds like an infinite loop. What happens when you use next a few times in gdb?
23 May, 2009, David Haley wrote in the 3rd comment:
Votes: 0
What happens before you do the backtrace?
23 May, 2009, Lancsta wrote in the 4th comment:
Votes: 0
1847 * Strip all affects of a given sn.
Is a comment for the next function, so that's the end of the affect_remove_obj function.

I don't recall what happened before the backtrace, if you're referring to within gdb.

Looks like I didn't copy the beginning of it.
Sorry, I cut my finger pretty badly today while remodeling the baby's room and am on some pretty good painkillers; I had to reread your request to make sense of it. :(
23 May, 2009, David Haley wrote in the 5th comment:
Votes: 0
Well, you can reproduce the problem, right? It's important to know how you got to that point: did you manually interrupt the process? Did something else happen?

Anyhow, to answer Zeno's question, you'll have to go back into gdb and see what's actually happening. It's not very useful to tell us that it's at the end of the function when we can already see that. :smile: Anyhow, the interesting part is not where it is, but what it's doing.
23 May, 2009, Lancsta wrote in the 6th comment:
Votes: 0
Right, sorry. No, I keep catching it after the fact, and yeah, I attached gdb to the already running program. Let me browse real quick; I don't think this is the first time I've had it, so I might have it posted on my forums or in another file.

Well, apparently I don't have a recording of this one, so I'll wait for it to happen again. And no, I haven't been able to recreate it.
23 May, 2009, Sharmair wrote in the 7th comment:
Votes: 0
It is pretty clear from your gdb output that either your binary and source files are out of
sync or something is corrupting the stack. Frame #0 seems to be fully trashed and is not
of much use (does update_handler() really even call affect_remove_obj()?), so I would
discount anything from there (at least for now). I would look at frame #1 and see what it
really looks like (I would like to see the whole update_handler()). What is at line 2045 of
update.c? Does the code or any loaded data file have a string that contains "k has lo"?
(if the problem is a stack corruption, it is most likely by a string containing those characters).
Is there a buf in update_handler(), and if so, is it overflowing?
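
One way to rule the buffer question in or out is to check that every write into a fixed-size local buffer is bounded. The following is only a minimal sketch of what a bounded write looks like; `format_message` and the buffer sizes are hypothetical and do not come from the Sunder source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the Sunder source): formats "name text"
 * into a caller-supplied buffer without ever writing past its end.
 * Returns 1 if the whole message fit, 0 if it had to be truncated. */
static int format_message(char *buf, size_t size,
                          const char *name, const char *text)
{
    assert(buf != NULL && size > 0);

    /* snprintf writes at most `size` bytes, including the terminating
     * NUL, and returns the length the full string would have needed. */
    int needed = snprintf(buf, size, "%s %s", name, text);

    return needed >= 0 && (size_t)needed < size;
}
```

An unbounded `sprintf` or `strcpy` into the same buffer would instead keep writing into whatever lies next to it on the stack, which is the kind of damage being asked about here.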
23 May, 2009, David Haley wrote in the 8th comment:
Votes: 0
For a supposedly corrupted stack, the stack trace appears to be in pretty good shape. Also, a stack corruption would cause a crash, not a hang that needs to be interrupted by attaching gdb to the process. For these reasons I am somewhat skeptical of the stack corruption theory (besides were it corrupted, why would only frame 0 be useless?), although we don't really have a lot of information to make many claims anyhow.
Nonetheless given that it's a hang, it's probably an infinite loop of some kind, and given that we're looking at effects, it could be something like a loop in an effect linked list.

Where does "k has lo" come from, Sharmair?
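
The "loop in an effect linked list" idea can be tested directly. Below is a minimal sketch using Floyd's tortoise-and-hare check; `affect_node` is a hypothetical stand-in for the real `AFFECT_DATA`, and only its `next` field matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AFFECT_DATA; only the link field matters. */
typedef struct affect_node {
    struct affect_node *next;
    int type;
} affect_node;

/* Floyd's tortoise-and-hare: returns 1 if following ->next from head
 * ever revisits a node (so a plain for-loop over the list would never
 * end), 0 if the list terminates in NULL. */
static int affect_list_has_cycle(const affect_node *head)
{
    const affect_node *slow = head;
    const affect_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advance one node  */
        fast = fast->next->next;  /* advance two nodes */
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

A check like this could be called on `obj->affected` before the affect loop runs, logging the object when it returns 1, so the hang turns into a report instead.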
23 May, 2009, Sharmair wrote in the 9th comment:
Votes: 0
David Haley said:
For a supposedly corrupted stack, the stack trace appears to be in pretty good shape. Also, a stack corruption would cause a crash, not a hang that needs to be interrupted by attaching gdb to the process. For these reasons I am somewhat skeptical of the stack corruption theory (besides were it corrupted, why would only frame 0 be useless?), although we don't really have a lot of information to make many claims anyhow.
Nonetheless given that it's a hang, it's probably an infinite loop of some kind, and given that we're looking at effects, it could be something like a loop in an effect linked list.

Maybe invalid would have been a better word. From what I can see in his session, he is not in the function frame zero
is showing at all, and the shown arguments are not pointers at all. Like you say, there is little info here, and I would
like to see the answers to the questions I asked (other than the last one, as on further thought I think it is highly unlikely
to be the issue). Maybe the first possibility is the cause of the wacky backtrace (the binary and source out of sync causing
gdb to show invalid source). If it is a corruption (a function overwriting data on the stack outside its scope's valid bounds),
it would probably be by whatever function is really being called by update_handler() and is minor enough to not extend
that far. As for an example of why I think the info from the frame is useless, if he is at the end of the function, and the
line executed was:
affect_free = paf;

Then why is affect_free (a global variable) different from paf (a variable passed on the stack)?
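
To make that expectation concrete, here is a minimal sketch of the free-list push seen at lines 1840-1841 of the listing. The names `free_node`, `free_list_head` and `node_release` are hypothetical; only the two assignments mirror the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AFFECT_DATA; only the link field matters. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Plays the role of the global affect_free in the listing. */
static free_node *free_list_head = NULL;

/* Pushes a node onto the free list, like lines 1840-1841. */
static void node_release(free_node *paf)
{
    assert(paf != NULL);

    paf->next = free_list_head;
    free_list_head = paf;

    /* Right after the push, the head must be the node just released.
     * Note: releasing the same node twice in a row would leave
     * paf->next == paf, a one-node cycle. Nothing in the posted
     * session shows that this actually happened. */
    assert(free_list_head == paf);
}
```

If the frame were trustworthy and those two lines had just run, printing `affect_free` and `paf` in gdb should give the same address, which is not what the session shows.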
David Haley said:
Where does "k has lo" come from, Sharmair?

From:
#0  0x080bbf93 in affect_remove_obj (obj=0x6168206b, paf=0x6f6c2073)

Arguments are passed to a function from right to left, the stack builds down, and this is probably a little-endian CPU.
That would make the argument byte sequence '6b 20 68 61 73 20 6c 6f', and that is the ASCII "k has lo".
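
The decoding can be reproduced mechanically. This is a small sketch, assuming 32-bit pointers and little-endian byte order as described above; the helper names `le_bytes_to_ascii` and `decode_args` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes the four bytes of value, least significant first (the order a
 * little-endian CPU stores them in memory), into out and NUL-terminates.
 * Non-printable bytes are shown as '.'. out must hold 5 chars. */
static void le_bytes_to_ascii(uint32_t value, char *out)
{
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xffu);
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[4] = '\0';
}

/* Decodes obj followed by paf (obj sits at the lower address) into one
 * 8-character string. out must hold 9 chars. */
static void decode_args(uint32_t obj, uint32_t paf, char *out)
{
    le_bytes_to_ascii(obj, out);      /* fills out[0..3], NUL at out[4] */
    le_bytes_to_ascii(paf, out + 4);  /* fills out[4..7], NUL at out[8] */
}
```

Calling `decode_args(0x6168206b, 0x6f6c2073, buf)` with the two values from frame #0 leaves `"k has lo"` in `buf`.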
23 May, 2009, Lancsta wrote in the 10th comment:
Votes: 0
Line 2045 of update.c is
obj_update ( );
of update_handler function.

void obj_update ( void )
{
OBJ_DATA *obj;
OBJ_DATA *obj_next;
AFFECT_DATA *paf, *paf_next;

for ( obj = object_list; obj != NULL; obj = obj_next )
{
CHAR_DATA *rch;
const char *message;

obj_next = obj->next;

/* go through affects and decrement */
for ( paf = obj->affected; paf != NULL; paf = paf_next )
{
paf_next = paf->next;
if ( paf->duration > 0 )
{
paf->duration--;
if ( number_range ( 0, 4 ) == 0 && paf->level > 0 )
paf->level--; /* spell strength fades with time */
}
else if ( paf->duration < 0 )
;
else
{
if ( paf_next == NULL
|| paf_next->type != paf->type
|| paf_next->duration > 0 )
{
if ( paf->type > 0 &&
skill_table[paf->type].msg_off )
{
act_new ( skill_table[paf->type].msg_off,
obj->carried_by, obj, NULL,
POS_SLEEPING, TO_CHAR );
}
}
affect_remove_obj ( obj, paf );
}
}
if ( obj->timer <= 0 || --obj->timer > 0 )
continue;
if ( obj->in_room || (obj->carried_by && obj->carried_by->in_room))
{
if ( HAS_TRIGGER_OBJ( obj, TRIG_DELAY )
&& obj->oprog_delay > 0 )
{
if ( --obj->oprog_delay <= 0 )
p_percent_trigger( NULL, obj, NULL, NULL, NULL, NULL, TRIG_DELAY );
}
else if ( ((obj->in_room && !obj->in_room->area->empty)
|| obj->carried_by ) && HAS_TRIGGER_OBJ( obj, TRIG_RANDOM ) )
p_percent_trigger( NULL, obj, NULL, NULL, NULL, NULL, TRIG_RANDOM );
}
/* Make sure the object is still there before proceeding */
if ( !obj )
continue;

switch ( obj->item_type )
{
default:
message = "$p crumbles into dust.";
break;
case ITEM_FOUNTAIN:
message = "$p dries up.";
break;
case ITEM_CORPSE_NPC:
message = "$p decays into dust.";
break;
case ITEM_CORPSE_PC:
message = "$p decays into dust.";
break;
case ITEM_FOOD:
message = "$p decomposes.";
break;
case ITEM_POTION:
message = "$p has evaporated from disuse.";
break;
}
if ( obj->carried_by != NULL )
{
if ( IS_NPC ( obj->carried_by )
&& obj->carried_by->pIndexData->pShop != NULL )
obj->carried_by->gold += obj->cost / 5;
else
{
act ( message, obj->carried_by, obj, NULL,
TO_CHAR );
sound ("DECAY.WAV", obj->carried_by );
}
}
else if ( obj->in_room != NULL &&
( rch = obj->in_room->people ) != NULL )
{
if ( !
( obj->in_obj &&
obj->in_obj->pIndexData->vnum == OBJ_VNUM_PIT &&
!CAN_WEAR ( obj->in_obj, ITEM_TAKE ) ) )
{
act ( message, rch, obj, NULL, TO_ROOM );
act ( message, rch, obj, NULL, TO_CHAR );
sound ("DECAY.WAV", rch);
}
}
if ( obj->item_type == ITEM_CORPSE_PC && obj->contains )
{ /* save the contents */
OBJ_DATA *t_obj, *next_obj;

for ( t_obj = obj->contains; t_obj != NULL;
t_obj = next_obj )
{
next_obj = t_obj->next_content;
obj_from_obj ( t_obj );

if ( obj->in_obj ) /* in another object */
obj_to_obj ( t_obj, obj->in_obj );

else if ( obj->carried_by ) /* carried */
if (obj->wear_loc == WEAR_FLOAT)
{
if (obj->carried_by->in_room == NULL)
extract_obj(t_obj);
else
obj_to_room(t_obj,obj->carried_by->in_room);
}
else
obj_to_char ( t_obj, obj->carried_by );

else if ( obj->in_room == NULL ) /* destroy it */
extract_obj ( t_obj );

else /* to a room */
obj_to_room ( t_obj, obj->in_room );
}
}
extract_obj ( obj );
}
return;
}
23 May, 2009, David Haley wrote in the 11th comment:
Votes: 0
Is your binary the most recent version from your source files? That is, have you modified your source files since compiling your binary? If the binary and the source don't match, any attempts at debugging will be futile.

What is around the call to obj_update on line 2045?

I am still skeptical of the stack corruption theory, given that it's far more likely to cause a crash than an infinite loop.

Sharmair said:
As for an example of why I think the info from the frame is useless, if he is at the end of the function, and the
line executed was:
affect_free = paf;

Then why is affect_free (a global variable) different from paf (a variable passed on the stack)?

If the function was in the middle of returning when the program was interrupted, it can display the last brace as the current line of the program, depending on various factors. That's why I want to see the code step.

Sharmair said:
David Haley said:
Where does "k has lo" come from, Sharmair?

From:
#0  0x080bbf93 in affect_remove_obj (obj=0x6168206b, paf=0x6f6c2073)

Arguments are passed to a function from right to left, the stack builds down, and this is probably a little-endian CPU.
That would make the argument byte sequence '6b 20 68 61 73 20 6c 6f', and that is the ASCII "k has lo".

Isn't that a little far-fetched? :thinking:

Frankly I think the best course of action here is to do what Zeno suggested in the very first post, and just step through the code. We have extremely little information to go by, and all that we can do now is make more or less wild guesses.
23 May, 2009, Lancsta wrote in the 12th comment:
Votes: 0
Right, yeah, I'm sorry I didn't post it while I had it up. I'm unable to recreate it, though I'm trying everything I can to do so. I have not changed or recompiled anything at all.

Also, what other steps can I be taking to make sure I gather enough information from gdb?