13 Jul, 2015, Rarva.Riendf wrote in the 1st comment:
Votes: 0
Hi, if anyone could help it would be great:
I have a bug that appears randomly. I know why it happens, but the problem is that I still have not found HOW to reproduce it (yeah, lack of unit testing is a problem).
Basically, depending on the connection/disconnection/reconnection of a character, if the character is grouped (in a particular way) at one point, I nullify a pointer before removing it from the "group" linked list (and probably from some others, but this is the one that makes it crash, at least).

I know that because Valgrind tells me the pointer gets dereferenced. But since it does not crash right away, I have a hard time finding what logic is missing, so I was wondering: is there a way to know exactly WHEN a list goes wrong?
Is there not a way in gdb to watch a variable and stop when it becomes undefined? I know I can watch a variable and stop on some conditions, but not on an 'undefined' value... very annoying.

Hoping I am just missing something and you can help.
13 Jul, 2015, alteraeon wrote in the 2nd comment:
Votes: 0
There are no easy solutions. You'll have to manually inspect the code and try to hunt down the bogus pointer management.

One thing that might possibly help is to rebuild the entire codebase using the -fsanitize=address flag. With any luck, that will terminate the server closer to where the problems start and give you some stack traces. Use the 'addr2line' utility to convert the hex addresses into file/line numbers.

If you're building on windows, you won't have access to the sanitizer. You shouldn't be using windows for a mud server.
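
For anyone who has not used the sanitizer before, here is a minimal sketch of the kind of bug it catches, assuming gcc or clang on Linux. The names (struct member, group_add, group_remove, member_destroy) are made up for the illustration and are not from any real codebase. The buggy path is only compiled in with -DDEMO_BUG, so the default build is well behaved.

```c
/* Hypothetical demo, not taken from any real mud.
 * Build:  gcc -g -fsanitize=address -fno-omit-frame-pointer demo.c
 * Add -DDEMO_BUG to compile in the "free without unlinking" path. */
#include <assert.h>
#include <stdlib.h>

struct member {
    int id;
    struct member *next;   /* next character in the same group */
};

/* Push a new member on the front of the group list. */
static struct member *group_add(struct member *head, int id) {
    struct member *m = malloc(sizeof *m);
    assert(m != NULL);
    m->id = id;
    m->next = head;
    return m;
}

/* Unlink m from the list (if present) and return the new head. */
static struct member *group_remove(struct member *head, struct member *m) {
    struct member **pp = &head;
    while (*pp != NULL && *pp != m)
        pp = &(*pp)->next;
    if (*pp != NULL)
        *pp = m->next;
    return head;
}

/* Count the members by walking the list. */
static int group_size(const struct member *head) {
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Destroy one member and return the new head of the list. */
static struct member *member_destroy(struct member *head, struct member *m) {
#ifdef DEMO_BUG
    free(m);                        /* BUG: m is still linked in the list */
    return head;
#else
    head = group_remove(head, m);   /* correct order: unlink, then free */
    free(m);
    return head;
#endif
}
```

With -DDEMO_BUG, the next group_size() call after a member_destroy() reads freed memory, and a sanitizer build should stop right there with a heap-use-after-free report instead of limping on for a week.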
13 Jul, 2015, quixadhal wrote in the 3rd comment:
Votes: 0
What I did, for my own DikuCRUD, was surround quite a few things with if checks, so I could detect NULL pointers and report them outside of crashing in the debugger.

So, where the original code might have had something like if(player->inventory[3] == 2771), assuming no bad code screwed the structure up, I would replace it with if(player && player->inventory && player->inventory[3] == 2771).

It's not perfect, since in this case inventory might not be 4 elements long, and it may be inefficient, since it does that check every time, even when things were done correctly…. but… back in 1995, it was better than having to run inside gdb forever. And, given today's computer speed, inefficiency is really not an issue for text games.
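
A rough sketch of the same idea with the length problem covered as well. It assumes the player structure carries an inventory length; struct player, inventory_len and has_item_at are hypothetical names for the illustration, not Diku's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure: the inventory carries its own length. */
struct player {
    int *inventory;        /* array of item vnums, may be NULL */
    size_t inventory_len;  /* number of valid slots in inventory */
};

/* Return 1 if the slot holds vnum, 0 otherwise. Never dereferences a
 * NULL pointer and never reads past the end of the array; problems are
 * reported on stderr instead of crashing. */
static int has_item_at(const struct player *p, size_t slot, int vnum) {
    if (p == NULL || p->inventory == NULL) {
        fprintf(stderr, "has_item_at: NULL player or inventory\n");
        return 0;
    }
    if (slot >= p->inventory_len) {
        fprintf(stderr, "has_item_at: slot %zu out of range (%zu slots)\n",
                slot, p->inventory_len);
        return 0;
    }
    return p->inventory[slot] == vnum;
}
```

The original test then becomes if(has_item_at(player, 3, 2771)).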
13 Jul, 2015, Rarva.Riendf wrote in the 4th comment:
Votes: 0
@Quix: Yeah, I do that as well (but not to hide bugs), but it will not work for a pointer to freed memory. Those either still hold a coherent value, or something random, or… something that will lead to a crash (or even an infinite loop, as the ->next value could point back to a previous node).
I changed too many things at once in the memory management to remember or even detect it right away. And the lack of unit testing in this department makes it now quite annoying to catch…

@alteraeon: I will try this flag and see (huhu, windows or linux anyway… both have their share of problems… but as I use valgrind, yeah, I am only on linsux anyway ;p)

Now that I think about it… I will code a function that checks every list containing characters after I free a char (and I will set a particular value in it, one that should not otherwise be possible, just before I free it).
Then I should be able to detect the connection/disconnection etc. pattern faster if I see that value while checking the lists… (if I check right away, there is a good chance this particular value has not been overwritten yet)
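
Roughly what I have in mind, as a sketch only. All names here (char_data, POISON_LEVEL, char_alloc, char_release, count_poisoned) are invented for the example. I use a static pool instead of malloc/free on purpose: reading memory that was really free()d is undefined, so the real thing would have to delay the actual free() for a while.

```c
#include <assert.h>
#include <stddef.h>

#define POISON_LEVEL (-12345)   /* assumed impossible for a live character */
#define POOL_SIZE 8

struct char_data {
    int level;                       /* game data; poisoned on release */
    int in_use;
    struct char_data *next_in_group;
};

/* Characters come from a static pool, so a released slot stays readable
 * and the poison value can be inspected safely. */
static struct char_data pool[POOL_SIZE];

static struct char_data *char_alloc(int level) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].level = level;
            pool[i].next_in_group = NULL;
            return &pool[i];
        }
    }
    return NULL;   /* pool exhausted */
}

/* Release a character WITHOUT unlinking it; this mirrors the bug. */
static void char_release(struct char_data *ch) {
    ch->level = POISON_LEVEL;
    ch->in_use = 0;
}

/* Walk one group list and count the nodes that carry the poison value.
 * The step limit keeps a corrupted, circular list from hanging the check. */
static int count_poisoned(const struct char_data *head) {
    int bad = 0;
    for (size_t steps = 0; head != NULL && steps < POOL_SIZE; steps++) {
        if (head->level == POISON_LEVEL)
            bad++;
        head = head->next_in_group;
    }
    return bad;
}
```

Calling count_poisoned() on every list right after each release should point at the connection pattern that leaves a dead character linked.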

Thanks
13 Jul, 2015, Tyche wrote in the 5th comment:
Votes: 0
alteraeon said:
You shouldn't be using windows for a mud server.

Damn it. I wish somebody would have told me this 20 years ago.
14 Jul, 2015, SlySven wrote in the 6th comment:
Votes: 0
Rarva.Riendf said:
Now that I think about it… I will code a function that checks every list containing characters after I free a char (and I will set a particular value in it, one that should not otherwise be possible, just before I free it).
Well that value for a pointer in 'C' code should be:
"An integer constant expression with the value 0, or such an expression cast to type void *"!
14 Jul, 2015, Rarva.Riendf wrote in the 7th comment:
Votes: 0
@SlySven: I do set my pointers to (void *)0 just after I free them. The thing is, having a valid value there won't help (and 0 is a perfectly fine value in itself in like… most cases anyway), so it won't help you detect whether the pointer you have points to freed memory or not.
16 Jul, 2015, SlySven wrote in the 8th comment:
Votes: 0
Well for 'C' code (void *)0 is explicitly NOT "a fine value" as you put it for a pointer to have AND USE and deliberately so it seems. But yeah, something changing the pointer's value unexpectedly is bad.

How about: do you have an "=" rather than a "==" in an "if" type test on that pointer - which would compile fine and be perfectly valid code but would do nasty things to the contents, as you are finding?
17 Jul, 2015, Pymeus wrote in the 9th comment:
Votes: 0
Rarva.Riendf said:
Now that I think about it… I will code a function that checks every list containing characters after I free a char (and I will set a particular value in it, one that should not otherwise be possible, just before I free it).
Then I should be able to detect the connection/disconnection etc. pattern faster if I see that value while checking the lists… (if I check right away, there is a good chance this particular value has not been overwritten yet)

That's probably on the right track.

If you have a decent test platform, and the bug is reproducible on that test platform, I would convert the check you describe into a function that abort()s the program when the problem is detected. Then sprinkle calls to that function anywhere you remotely suspect the bug might be happening. If on a Dikulike, I'd start with the command interpreter (preferably right after each command completes, yet the command text is still available in a buffer) and a few key spots in the main update loop. This may run quite slowly, but that's part of why you're doing it on a test platform.

If these "breakpoints" are placed well then you should (using your debugger of choice) be able to deduce at a high level what the program was doing right before the abort(). Move the calls to your check-abort function into the section of code you just identified. Repeat, descending into lower- and lower-level code until you isolate the spot where the fault is happening. Keep in mind that it may be happening in more than one place.
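
A minimal sketch of such a check-abort function, under the assumption that a node can be recognised as bad somehow (here a hypothetical valid flag). struct node, char_list, list_problems and CHECK_LISTS are illustrative names, not anything from a real Dikulike.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
    int valid;             /* hypothetical flag: 0 once the owner was freed */
    struct node *next;
};

/* Hypothetical global list head standing in for the mud's character lists. */
static struct node *char_list = NULL;

/* Return the number of suspicious nodes. The walk is bounded so that a
 * list that has become circular is reported instead of looping forever. */
static int list_problems(const struct node *head, int max_nodes) {
    int problems = 0, seen = 0;
    for (; head != NULL; head = head->next) {
        if (++seen > max_nodes)
            return problems + 1;   /* too long: probably a cycle */
        if (!head->valid)
            problems++;
    }
    return problems;
}

/* Abort and name the call site, so the log and the core dump agree on it. */
static void check_lists_at(const char *file, int line) {
    int n = list_problems(char_list, 100000);
    if (n > 0) {
        fprintf(stderr, "%s:%d: %d corrupt list node(s)\n", file, line, n);
        abort();
    }
}

#define CHECK_LISTS() check_lists_at(__FILE__, __LINE__)
```

Drop CHECK_LISTS(); after the command interpreter and in the update loop first, then move the calls inward as described above.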
17 Jul, 2015, quixadhal wrote in the 10th comment:
Votes: 0
SlySven said:
Well for 'C' code (void *)0 is explicitly NOT "a fine value" as you put it for a pointer to have AND USE and deliberately so it seems. But yeah, something changing the pointer's value unexpectedly is bad.

How about: do you have an "=" rather than a "==" in an "if" type test on that pointer - which would compile fine and be perfectly valid code but would do nasty things to the contents, as you are finding?


With C (and C++), it's far worse than that. :)

While it's possible to write bad code that uses assignment instead of comparison (every expression in C has a value, and thus it's a valid construct to test the result of an assignment)… far more frequently you'll find the problem is corruption of string data.

C doesn't have a native string type. Instead, it merely uses a pointer to an array of characters. Because an array gets handed around as just a pointer to its first element, there are no bounds checks done by the language for you. When you overrun a buffer, either by assigning a value beyond the end via an explicit index:

int a[2];

a[2] = 27; // Invalid…. a[0] and a[1] are the two elements.


or by the more common string overflow:

char tmp[20];

sprintf(tmp, "Some stuff to print for %s.", "a longer string that you wanted");


one of three things happens… Option 1 is that the variable in question is near the edge of a "page", which is a chunk of memory as mapped by the operating system. If your overflow crosses into a page your program has no permission to write to, it triggers a "segmentation fault". Option 2 is that the variable is adjacent to executable code, in which case your overflow will corrupt your program itself by putting random opcodes into the program space. This will usually cause errors the next time that chunk of code is run. Option 3 is the more insidious version… your variable is adjacent to other variables. In that case, the overflow writes out and doesn't cause any direct errors… however, now other variables have been corrupted. In the worst case scenario, your overflow corrupts data in things that get written back to disk, which would make those corruptions persist.
17 Jul, 2015, Rarva.Riendf wrote in the 11th comment:
Votes: 0
@Pymeus: what the code does right before the abort is not the problem, it is always the same :). If it was so simple to fix I would not have asked this particular question.
The code can crash like a week after the problem has been generated. Not much help to know what was happening just before :)

@SlySven: I wish it could be so simple :) Nah, I know perfectly well what happens; it is a code logic problem. At one point a character is not removed from a list it is in before being freed. I just do not call the function that exists to extract the character cleanly when I should. And the reason why I do not call it when I free the char? Well, because it would be pointless in a lot of cases (temporary or local values for internal use only). An easy fix would be to do it then anyway, but it would slow down the mud considerably when I use some commands like wiping all the mobiles at once (that is already slow enough as it is).
17 Jul, 2015, Pymeus wrote in the 12th comment:
Votes: 0
Rarva.Riendf said:
@Pymeus: what the code does right before the abort is not the problem, it is always the same :). If it was so simple to fix I would not have asked this particular question.
The code can crash like a week after the problem has been generated. Not much help to know what was happening just before :)

Hmm, I was basing it on your belief that the check could spot the problem early. Invalid input, irrelevant output.
18 Jul, 2015, Rarva.Riendf wrote in the 13th comment:
Votes: 0
Pymeus said:
Hmm, I was basing it on your belief that the check could spot the problem early. Invalid input, irrelevant output.


Earlier :) but not that early, unfortunately. Adding code will only help me cut down on analysing character connections, by detecting sooner that something will lead to a crash (at the time the code aborts, it could be 1 minute after the problem was generated, or one week, depending on the kind of corruption it caused). I wish there was a tool that detected this without my having to code it myself.
Oh well back to coding…
18 Jul, 2015, SlySven wrote in the 14th comment:
Votes: 0
quixadhal said:
or by the more common string overflow:

char tmp[20];

sprintf(tmp, "Some stuff to print for %s.", "a longer string that you wanted");
Is it ever safe to use sprintf() - shouldn't one be using snprintf() instead!!!

I recognise all those cases you mention - I had just forgotten how many were the ways that C string handling can bite you in the backside. :redface:

Must confess my recent coding has been in C++ using Qt libraries - but then QStrings/QChars bring in other issues instead (like the handling of non-BMP characters - but that's another unpleasantness entirely)
19 Jul, 2015, quixadhal wrote in the 15th comment:
Votes: 0
The problem with snprintf(), of course, is that you then get a different kind of data corruption.

Let's say you're formatting strings for output to a data source… while snprintf() will protect you from buffer overruns, a data truncation is just as bad in terms of destroying your work. The core issue is that C doesn't have a string data type. It has pointers and raw allocation, which is a horrible way to work with text. C++ tries to hide this by offering several different "string" libraries, but they're only good if you *NEVER* have to interact with OS calls, or things from the standard C library. Once you do, you have to convert back and forth between Ye Olde char *'s again.
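
To be fair, the truncation is at least detectable, since snprintf() returns the length it would have written. A small sketch of checking for it; greet() is a made-up helper for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "Hello, <name>!" into buf. Returns 0 on success, or -1 if the
 * text did not fit (buf then holds a truncated but still NUL-terminated
 * string) or if snprintf() itself failed. */
static int greet(char *buf, size_t size, const char *name) {
    int n = snprintf(buf, size, "Hello, %s!", name);
    if (n < 0)
        return -1;                 /* output error */
    if ((size_t)n >= size)
        return -1;                 /* n is the length that WOULD have been written */
    return 0;
}
```

Detecting it is the easy part; what to do with a half-written record afterwards is still your problem.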

Here's something to think about. How many man hours do you think have been spent trying to debug and fix various Dikurivative codebases since 2000? How many times over do you think that much manpower could have just rewritten them ALL in various languages like python, ruby, perl, <insert new one here> and thus improved the entire playing field? :)
19 Jul, 2015, Rarva.Riendf wrote in the 16th comment:
Votes: 0
@quix: I am pretty sure I wasted more hours trying to fix the mess that was the codebase I inherited than it would have taken to rewrite the whole thing in a modern language (pretty much anything other than C…)
Worse: even if it is mostly fixed now, I am still stuck with C… and the old moronic codebase…
19 Jul, 2015, SlySven wrote in the 17th comment:
Votes: 0
quixadhal said:
Once you do, you have to convert back and forth between Ye Olde char *'s again.
And of course sometimes those are signed chars and sometimes they are unsigned ones - which of course makes things like telnet protocol handling much more interesting - for some Chinese curse values of "interest".
20 Jul, 2015, drifton wrote in the 18th comment:
Votes: 0
I've always had the stupid school of thought that I wouldn't use a library until I understood how it worked, i.e. could program the equivalent thing myself. Thus started my quest to write a mud engine. 15 years later I still enjoy writing mud code, and I'm about to get my current iteration back up to parity with some of my other attempts. I look at some of my older codebases and it amazes me I was able to have them online and running without my friends crashing the damn thing every 5 minutes.
28 Jul, 2015, Nathan wrote in the 19th comment:
Votes: 0
SlySven said:
quixadhal said:
or by the more common string overflow:

char tmp[20];

sprintf(tmp, "Some stuff to print for %s.", "a longer string that you wanted");
Is it ever safe to use sprintf() - shouldn't one be using snprintf() instead!!!

I recognise all those cases you mention - I had just forgotten how many were the ways that C string handling can bite you in the backside. :redface:

Must confess my recent coding has been in C++ using Qt libraries - but then QStrings/QChars bring in other issues instead (like the handling of non-BMP characters - but that's another unpleasantness entirely)


I believe in C the point is that it's the user's responsibility to be safe, not that of the code/library. I don't see why you can't just test the size of the string, the size of the spot you're putting it in and also test for the presence of the proper string terminator. You could also log some kind of error if something consistently isn't working (i.e. you've made a poor assumption somewhere).
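
Something along these lines is what I mean, as a sketch rather than a finished routine. checked_copy() and its src_max parameter are names I made up; src_max is how far the caller allows the function to look for the terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst only if it provably fits. src_max bounds how far we
 * are willing to search src for its terminator. Returns 0 on success,
 * -1 (after logging to stderr) when any check fails; dst is left
 * untouched on failure. */
static int checked_copy(char *dst, size_t dst_size,
                        const char *src, size_t src_max) {
    const char *end;
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0) {
        fprintf(stderr, "checked_copy: bad argument\n");
        return -1;
    }
    end = memchr(src, '\0', src_max);   /* is there a terminator at all? */
    if (end == NULL) {
        fprintf(stderr, "checked_copy: no terminator within %zu bytes\n", src_max);
        return -1;
    }
    len = (size_t)(end - src);
    if (len >= dst_size) {
        fprintf(stderr, "checked_copy: %zu bytes do not fit in %zu\n",
                len + 1, dst_size);
        return -1;
    }
    memcpy(dst, src, len + 1);          /* len characters plus the terminator */
    return 0;
}
```

If the log fills up with the same message, that is the poor assumption showing itself.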
28 Jul, 2015, quixadhal wrote in the 20th comment:
Votes: 0
Nathan said:
I believe in C the point is that it's the user's responsibility to be safe, not that of the code/library. I don't see why you can't just test the size of the string, the size of the spot you're putting it in and also test for the presence of the proper string terminator. You could also log some kind of error if something consistently isn't working (i.e. you've made a poor assumption somewhere).


Yeah, that's really convenient for writing code that does lots of string manipulation, like a text MUD.

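#include <stdio.h>
#include <string.h>

// foo1 and bar2 are whatever strings you happen to be formatting
char * func() {
    static char tmp[256]; // static, so the pointer we return is still valid afterwards
    size_t used, room;
    int n;

    memset( tmp, 0, sizeof( tmp ) );
    used = strlen( tmp );
    room = sizeof( tmp ) - used;
    n = snprintf( tmp + used, room, "Stuff 1 %s", foo1 );
    if( n < 0 || (size_t)n >= room ) {
        // string truncated
        return( "error" );
    }
    used = strlen( tmp );
    room = sizeof( tmp ) - used;
    n = snprintf( tmp + used, room, " and Stuff 2 %s", bar2 );
    if( n < 0 || (size_t)n >= room ) {
        // string truncated
        return( "error" );
    }
    return( tmp );
}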


Now, you could make it even worse by dynamically allocating memory instead ( you STILL need some kind of temporary space when formatting text, since you don't know exactly how big the output is ).. and then have memory leaks when the caller doesn't free() the memory properly.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// foo1 and bar2 are whatever strings you happen to be formatting
char * func() {
    char *tmp;
    char *bigger;
    int would_write;
    size_t allocated;
    size_t offset;

    allocated = 256;
    if( !( tmp = calloc( allocated, sizeof( char ) ) ) ) {
        // no memory
        return( strdup( "error" ) );
    }
    offset = strlen( tmp );
    while( ( would_write = snprintf( tmp + offset, allocated - offset, "Stuff 1 %s", foo1 ) ) >= 0
            && (size_t)would_write >= allocated - offset ) {
        // string truncated, so grow the buffer to what snprintf() asked for and format again
        if( !( bigger = realloc( tmp, offset + would_write + 1 ) ) ) {
            // no memory
            free( tmp );
            return( strdup( "error" ) );
        }
        tmp = bigger;
        allocated = offset + would_write + 1;
    }
    offset = strlen( tmp );
    while( ( would_write = snprintf( tmp + offset, allocated - offset, " and Stuff 2 %s", bar2 ) ) >= 0
            && (size_t)would_write >= allocated - offset ) {
        // string truncated, so grow the buffer to what snprintf() asked for and format again
        if( !( bigger = realloc( tmp, offset + would_write + 1 ) ) ) {
            // no memory
            free( tmp );
            return( strdup( "error" ) );
        }
        tmp = bigger;
        allocated = offset + would_write + 1;
    }
    return( tmp ); // the caller has to free() this
}


Does that seem reasonable?