31 Oct, 2008, quixadhal wrote in the 21st comment:
Votes: 0
Skol said:
Sounds like it's referring to the victim after the damage has been done, so the victim is dead and then not in a room (extracted) so the room is NULL (0x0 in the gdb backtrace). What I would do is put a check in the bool 'can_see' that if the victim's room , or the player's room doesn't exist, return FALSE.

ie.
if (victim->in_room == NULL || ch->in_room == NULL)
return FALSE;


Actually, I was just thinking about this, this morning. Years ago, I would have agreed with you, and in fact my old MUD has lots of extra sanity checks that try to return "the right answer" when something isn't there, thus avoiding a crash. In those days, I shared the mindset that a crash is something to be avoided at all costs.

Now, I'm far more inclined to only put such checks into things where they get used in a wide variety of circumstances. Unless you're running a mission critical system (one where lives, or decades of work, are at stake), I prefer having the game crash when the impossible happens. Read that again. :)

The DikuMUD system was designed to always have characters (player or NPC) be contained in a room. That's why rooms like "The Void" exist, because if you can't figure out where to put someone, you have to put them somewhere. Given that design, it's not appropriate to check if ch->in_room is NULL, because it should never happen. You should extract a character and immediately put them into some other room. If you do anything else in that undefined state, you should either do all the checking yourself, or rethink if you really need to do it there, OR do all the work of redesigning the system.

Having said all that, I'm of the opinion that extract_char() is being called too early, and that whatever other processing is happening should happen before that call, or should be delayed until they're restored to whatever destination that game's death system sends them to.

can_see() sounds like too much of a utility function, and I don't think *IT* should be the thing that checks for a valid room, since it shouldn't be called on dead people. Either you can_see() a living pc/npc, or you can_see() a corpse object… not a dead person. :)
31 Oct, 2008, Igabod wrote in the 22nd comment:
Votes: 0
why would extract_char be the problem? the mud isn't crashing upon the death of anything, it seems to be occuring at the moment the bonus for swords kicks in not at level like i originally thought. the line if (can_see(victim, ch)) is followed by the chance of this bonus kicking in. i am having severe difficulty wrapping my head around this problem.
31 Oct, 2008, quixadhal wrote in the 23rd comment:
Votes: 0
Not extract_char itself… but as I said, a character in a DikuMUD is not ever meant to exist outside of a room. By rights, as little as possible should happen between a call to extract_char() and move_to() (or whatever this mud uses to move a character to a room). The fact that it's crashing in combat, means some part of combat is happening AFTER the (presumably dead) character has been removed from the room in which the fight was happening.

Imagine you have coded an apple, and you've made a function called sitting_on() that returns the surface the apple is sitting on. Now, let's say you have a way to pick up the apple and put it down on something else. You've coded that in two steps, pick_up() and put_down( surface ).

Now, the apple is sitting on a desk. You want to move the apple from the desk to a table. You call pick_up(apple). Now, before you've called put_down(apple, table), something decides to call sitting_on(). What happens?

The apple isn't sitting on anything, and you haven't designed the apple to be able to float in mid-air, so the code crashes trying to reference a non-existent surface.

Likewise, your character has been removed from his room, and THEN some part of combat code tries to do and/or check something that expects him to still be in a room. It's not a "bug" as in a bad line of code… it's code that shouldn't be running in the place it IS running. It should have either been called before extract_char(), or after the character has been placed in their new room.
31 Oct, 2008, Igabod wrote in the 24th comment:
Votes: 0
ok i think i know what you're saying here. you're saying that the whole part of that snippet i installed was put in the wrong place in that function. just so i can get some more experienced eyes on this i'm gonna post the whole function here.

void violence_update (void)
{
CHAR_DATA *ch;
CHAR_DATA *ch_next;
CHAR_DATA *victim;

for (ch = char_list; ch != NULL; ch = ch_next)
{
ch_next = ch->next;

if ((victim = ch->fighting) == NULL || ch->in_room == NULL)
continue;

if (IS_AWAKE (ch) && ch->in_room == victim->in_room)
{
multi_hit (ch, victim, TYPE_UNDEFINED);

/*
* Swords will very rarely get in an extra round.
*/
if (get_weapon_sn (ch) == gsn_sword
&& ch->wait <= 0)
{
int chance;
CHAR_DATA * was_fighting;
CHAR_DATA * vch;

chance = get_skill (ch, gsn_sword) / 15;

/*
* Penalty if the opponent can see.
*/
if (can_see (victim, ch))
chance = chance * 3 / 5;

for (vch = ch->in_room->people; vch != NULL;
vch = vch->next_in_room)
{
if (vch->fighting == ch
|| is_same_group (vch->fighting, ch))
{
if (number_percent () < chance)
{
was_fighting = ch->fighting;
ch->fighting = vch;

multi_hit (ch, vch, TYPE_UNDEFINED);
check_improve (ch, gsn_sword, TRUE, 2);

ch->fighting = was_fighting;
WAIT_STATE (ch, PULSE_VIOLENCE);
break;
}
}
}
}
}

else
stop_fighting (ch, FALSE);

if ((victim = ch->fighting) == NULL)
continue;

/*
* Fun for the whole family!
*/
check_assist (ch, victim);

if (IS_NPC (ch))
{
if (HAS_TRIGGER (ch, TRIG_FIGHT))
mp_percent_trigger (ch, victim, NULL, NULL, TRIG_FIGHT);
if (HAS_TRIGGER (ch, TRIG_HPCNT))
mp_hprct_trigger (ch, victim);
}
}

return;
}

the snippet i installed is between lines 19 and 58. i didn't really think i was putting it in the right place when i did it but wasn't sure where the right place was so i went with that.
31 Oct, 2008, Sandi wrote in the 25th comment:
Votes: 0
That should go in multi_hit() and be calling one_hit(). You want it to be rare don't you? :)


You should use vch_next in the for() loop.
31 Oct, 2008, David Haley wrote in the 26th comment:
Votes: 0
quixadhal said:
Given that design, it's not appropriate to check if ch->in_room is NULL, because it should never happen.

Well, I would make a strong statement here: it is appropriate to check that ch->in_room is NULL, but in the form of an assertion. It should not be silently ignored because it is a problem – it means that somebody has made a programmatic mistake and that should be caught.

In fact, were it practical to do so, I would have such invariants be tested at nearly every line of the code. If an actor's room is being set to NULL, and that actor is still "alive" (not being extracted, etc….) then something is very wrong.

Igabod said:
i am not really keeping any sort of organized changes list but thats cause this is never going to be played by anybody but me. i'm just using this for learning experience so i can make a good mud later on. i should get in the habit of being a little more organized now though.

Indeed, learning good organization and methodology will only help you later on. It's good to learn what happens when you aren't organized when it doesn't matter – much better than to learn from your mistakes when you have a large playerbase and your MUD is crashing left and right.

Discipline in making changes is probably one of the hardest things to learn and ingrain into yourself; don't get me wrong, it took me many years (where many > 5). After a while (a long while) your intuition for which changes are trivial and which actually matter will get pretty good. But even then, using some kind of version control is a tremendous help in debugging. It all sounds annoying and tedious, but eventually it becomes second nature and will save you much more than it costs you.
01 Nov, 2008, Igabod wrote in the 27th comment:
Votes: 0
Sandi said:
That should go in multi_hit() and be calling one_hit(). You want it to be rare don't you? :)


You should use vch_next in the for() loop.

it's rare enough for the mud i'm working with, the battles in stock keggermud tend to take forever even between two evenly matched opponents. having these little things shortens the battle while still adding plenty flavor.

DavidHaley said:
quixadhal said:
Given that design, it's not appropriate to check if ch->in_room is NULL, because it should never happen.

Well, I would make a strong statement here: it is appropriate to check that ch->in_room is NULL, but in the form of an assertion. It should not be silently ignored because it is a problem – it means that somebody has made a programmatic mistake and that should be caught.

In fact, were it practical to do so, I would have such invariants be tested at nearly every line of the code. If an actor's room is being set to NULL, and that actor is still "alive" (not being extracted, etc….) then something is very wrong.

Igabod said:
i am not really keeping any sort of organized changes list but thats cause this is never going to be played by anybody but me. i'm just using this for learning experience so i can make a good mud later on. i should get in the habit of being a little more organized now though.

Indeed, learning good organization and methodology will only help you later on. It's good to learn what happens when you aren't organized when it doesn't matter – much better than to learn from your mistakes when you have a large playerbase and your MUD is crashing left and right.

Discipline in making changes is probably one of the hardest things to learn and ingrain into yourself; don't get me wrong, it took me many years (where many > 5). After a while (a long while) your intuition for which changes are trivial and which actually matter will get pretty good. But even then, using some kind of version control is a tremendous help in debugging. It all sounds annoying and tedious, but eventually it becomes second nature and will save you much more than it costs you.

while this is good advice which i appreciate, how does any of this relate to the problem at hand? i've been bashing my head into my desk for almost a week because of this problem and could really use some insight into the solution. even if you know the answer and don't want to tell me what it is, i'd appreciate some advice on how to FIND the answer. this is a truly frustrating bug.
01 Nov, 2008, David Haley wrote in the 28th comment:
Votes: 0
Unfortunately, I have no idea how to solve your problem without spending many hours in front of the code – it's kind of like looking for needles in a haystack, because you don't know what has changed. You need to isolate very closely when it occurs, and then find the corresponding lines of code and figure out what caused the room to become null. Make liberal use of assertions, which might help find the problem more quickly. And keep track of everything you do, in case you accidentally introduce another "feature". :smile:
01 Nov, 2008, Igabod wrote in the 29th comment:
Votes: 0
but i do know what changed, i can remember when i made every change and what that change was, i have been updating help changes every time i make a change, i just haven't been very organized in the order i do things and all that. i know that the last three changes i made before i discovered the crash bug were to fiddle with exp levels (i put everything back to normal after i found the bug) and before that was the weapons snippet, and before that was to make thieves able to use rescue. before i tested the thieves rescue i was using a warrior wielding a sword to test a new spell for mages (vicegrip, another snippet from here) and had absolutely no crashes. so i am sure it's in that bit i posted yesterday within violence_update. with this information, where would you recommend i look?
01 Nov, 2008, David Haley wrote in the 30th comment:
Votes: 0
Well, there's a difference between knowing the conceptual change, and knowing what lines of code actually changed from what to what. It's the lines of code that are helpful. But if you know which general areas you touched, that is a very good start. You should try to apply – on a copy of the code – the snippets in reverse to revert to the old version. Then compare them, and try to work out exactly what happened. You know that some ch's room is being set to NULL. What could possibly have done that in the code you added? You will need to investigate the various functions that are being called, following the call traces. Good candidates are things like extract_char, which are meant to be called when the ch is about to disappear (this isn't supposed to happen for players when they die, incidentally).

One problem is that you call multi_hit from ch to victim, but then you check if ch can still see victim. What if victim died in the meantime? That's where I would start looking.

Note that this might not actually be your fault but might be a poorly written snippet, or a snippet that was applied to a codebase it wasn't quite meant for.
01 Nov, 2008, Sandi wrote in the 31st comment:
Votes: 0
CHAR_DATA *vch, *vch_next;
for (vch = ch->in_room->people; vch != NULL; vch = vch_next)
{
vch_next = vch->next;
etc…..

You'll find this in several places in the ROM code. What is does is save the "next" for us in case 'vch' turns out to be null.
BTW, you will find extract_char in raw_kill. :)
01 Nov, 2008, David Haley wrote in the 32nd comment:
Votes: 0
What Sandi points out (and pointed out above already, actually) is also very important and another potential problem here. So there's some work cut out for you here. :wink:
08 Nov, 2008, Igabod wrote in the 33rd comment:
Votes: 0
i still can't find the solution to this problem and i'm sick of looking for it so i'm just gonna slap a band-aid on it and hope i don't see this happen again. since there are no players on this it really doesn't matter too awful much. if i were going to have it online i'd have another coder that would help me but since i'm doing this on my own, i just dont care. maybe when i get bored one day i'll get back to looking for the solution but for now i'm just gonna be lazy about it. it's been several weeks with no progress, thanks for all the help though, you've all helped my understanding of the codebase immensely.
08 Nov, 2008, Runter wrote in the 34th comment:
Votes: 0
I don't know how useful this advice is but you may look into using break points and watch points in gdb to actually examine what is going on up until the point it crashes.

You may also look into using valgrind which can give you some insight on the line and context where something was freed after it is accessed improperly and causes a crash. (This tool is a real heaven-send if you know how to use it.)

Frankly, this is just a worst case scenario (which is unfortunately high in this case) for having to do debugging on poorly coded infrastructure. Which isn't your fault, but maybe you could learn something from debugging and hopefully some insight on how the code was designed poorly in the first place.

Quote
CHAR_DATA *vch, *vch_next;
for (vch = ch->in_room->people; vch != NULL; vch = vch_next)
{
vch_next = vch->next;
etc…..

You'll find this in several places in the ROM code. What is does is save the "next" for us in case 'vch' turns out to be null.
BTW, you will find extract_char in raw_kill. :)


It's been said in other places but I thought I'd go ahead and say it again.
This type of code is the single most common problem for data corrupt on these muds.
Specifically this can cause real problems when your list changes from underneath you or your current node is freed from somewhere else in the code. (Like a function you called. Like extra_char) That's the whole reason you have to use a holder variable like vch_next there.

On the RaM project we've solved the problem for C with a slightly higher level solution (which I may release as a stand alone snippet),
but the better answer would be just going to C++ and using a proper container class out of the standard template library.
08 Nov, 2008, Sandi wrote in the 35th comment:
Votes: 0
Sandi said:
CHAR_DATA *vch, *vch_next;
for (vch = ch->in_room->people; vch != NULL; vch = vch_next)
{
vch_next = vch->next;
etc…..

:redface: :redface: :redface:

My bad….. and we just had a discussion of this in a RaM thread, too. Just proving, I guess, how easy it is to make this mistake. No, you don't have to be drunk, Quix, just working out of context will do it. :rolleyes:

That should be "vch_next = vch->next_in_room".
08 Nov, 2008, Runter wrote in the 36th comment:
Votes: 0
Yes, Sandi is bad.
08 Nov, 2008, quixadhal wrote in the 37th comment:
Votes: 0
Burn the witch, Burn her!



I bet the original DikuMUD authors never imagined just how much trouble their license would cause people 15 years later. I just found the GDSL library, which would save Runter a lot of work… if it weren't GPL. :(
08 Nov, 2008, Runter wrote in the 38th comment:
Votes: 0
It's alright. I've got most everything we need done.

The few places where it's clutch to have something like a binary tree we can implement a proper hash table or something for better lookup time on the Ice.

I plan on spending most of my effort on Fire eventually.
08 Nov, 2008, David Haley wrote in the 39th comment:
Votes: 0
Random but related comment:
Hash tables are only as good as their hash function; when you're storing a large number of elements, unless you get a very good hash function, a binary tree might actually be better.
09 Nov, 2008, Runter wrote in the 40th comment:
Votes: 0
They're only as good as their distribution–And muds typically have good distribution which doesn't take much of a hash function to compute a balanced bucket list based on vnum.

Plus they are easy to implement. I'm sure a balanced binary probably is better.

I always think it's funny how people are quick to point out a hash trees worst case scenario. Specifically for rooms by vnum, especially if it's a wilderness map, a hash table is not a bad solution. Converting the x,y to a morton code gets you a nearly flawless distribution.
20.0/92