28 Sep, 2010, Rudha wrote in the 1st comment:
Votes: 0
I have had a recurring crash problem with descriptions. It seems as if there is some sort of issue related to the buffers. Originally, my fellow elvenblade programmer had run into this when he attempted to first implement some of the fancy scripted descriptions. He was seemingly able to diagnose it: it was because he had [if][elseif][endif] and it appeared that it was choking on the control structure, you needed to include the else. So keeping that in mind going forward, the problem seemed to be accomodated for.

However, today I ran into the same problem trying to train a new builder for the mud, and she had nothing at all in there which was related to the scripts. I did a lot of wrangling with GDB, until I was finally able to track down the problem to the MUD trying to escape \r in the string.

Specific GDB information:

Quote
(gdb) fin
Run till exit from #0 bufferReplace (buf=0x82a4ba8, a=0x80a1db7 "\r", b=0x80a26ae "\\r", all=1) at buffer.c:145
*** glibc detected *** /usr/games/elvenblade/src/NakedMud: double free or corruption (!prev): 0x0829f5c0 ***
======= Backtrace: =========
/lib/libc.so.6[0x75b0d5]
/lib/libc.so.6(cfree+0x96)[0x75d126]
/usr/games/elvenblade/src/NakedMud[0x806b026]
/usr/games/elvenblade/src/NakedMud[0x806b1fa]
/usr/games/elvenblade/src/NakedMud[0x8072cd5]
/usr/games/elvenblade/src/NakedMud[0x8074f91]
/usr/games/elvenblade/src/NakedMud[0x8071399]
/usr/games/elvenblade/src/NakedMud[0x8062eb2]
/usr/games/elvenblade/src/NakedMud[0x80582bd]
/usr/games/elvenblade/src/NakedMud[0x805883d]
/lib/libc.so.6(__libc_start_main+0xe5)[0x7016d5]
/usr/games/elvenblade/src/NakedMud[0x804a2e1]
======= Memory map: ========
00101000-00238000 r-xp 00000000 08:05 47317847 /lib/libcrypto.so.0.9.8g
00238000-0024c000 rwxp 00136000 08:05 47317847 /lib/libcrypto.so.0.9.8g
0024c000-0024f000 rwxp 0024c000 00:00 0
00521000-00646000 r-xp 00000000 08:05 70868 /usr/lib/libpython2.5.so.1.0
00646000-0066c000 rwxp 00124000 08:05 70868 /usr/lib/libpython2.5.so.1.0
0066c000-00672000 rwxp 0066c000 00:00 0
00685000-00692000 r-xp 00000000 08:05 47316995 /lib/libgcc_s-4.3.2-20081105.so.1
00692000-00693000 rwxp 0000c000 08:05 47316995 /lib/libgcc_s-4.3.2-20081105.so.1
006c7000-006e7000 r-xp 00000000 08:05 47317043 /lib/ld-2.9.so
006e7000-006e8000 r-xp 0001f000 08:05 47317043 /lib/ld-2.9.so
006e8000-006e9000 rwxp 00020000 08:05 47317043 /lib/ld-2.9.so
006eb000-0085d000 r-xp 00000000 08:05 47317078 /lib/libc-2.9.so
0085d000-0085f000 r-xp 00172000 08:05 47317078 /lib/libc-2.9.so
0085f000-00860000 rwxp 00174000 08:05 47317078 /lib/libc-2.9.so
00860000-00863000 rwxp 00860000 00:00 0
00865000-00868000 r-xp 00000000 08:05 47317835 /lib/libdl-2.9.so
00868000-00869000 r-xp 00002000 08:05 47317835 /lib/libdl-2.9.so
00869000-0086a000 rwxp 00003000 08:05 47317835 /lib/libdl-2.9.so
0086c000-00882000 r-xp 00000000 08:05 47317165 /lib/libpthread-2.9.so
00882000-00883000 r-xp 00015000 08:05 47317165 /lib/libpthread-2.9.so
00883000-00884000 rwxp 00016000 08:05 47317165 /lib/libpthread-2.9.so
00884000-00886000 rwxp 00884000 00:00 0
00888000-008af000 r-xp 00000000 08:05 47317834 /lib/libm-2.9.so
008af000-008b0000 r-xp 00026000 08:05 47317834 /lib/libm-2.9.so
008b0000-008b1000 rwxp 00027000 08:05 47317834 /lib/libm-2.9.so
008b3000-008cd000 r-xp 00000000 08:05 47317836 /lib/libselinux.so.1
008cd000-008ce000 r-xp 00019000 08:05 47317836 /lib/libselinux.so.1
008ce000-008cf000 rwxp 0001a000 08:05 47317836 /lib/libselinux.so.1
008d1000-008e4000 r-xp 00000000 08:05 47317833 /lib/libz.so.1.2.3
008e4000-008e5000 rwxp 00012000 08:05 47317833 /lib/libz.so.1.2.3
008e7000-008e9000 r-xp 00000000 08:05 47317039 /lib/libutil-2.9.so
008e9000-008ea000 r-xp 00001000 08:05 47317039 /lib/libutil-2.9.so
008ea000-008eb000 rwxp 00002000 08:05 47317039 /lib/libutil-2.9.so
0090a000-00914000 r-xp 00000000 08:05 47318493 /lib/libcrypt-2.9.so
00914000-00915000 r-xp 00009000 08:05 47318493 /lib/libcrypt-2.9.so
00915000-00916000 rwxp 0000a000 08:05 47318493 /lib/libcrypt-2.9.so
00916000-0093d000 rwxp 00916000 00:00 0
0093f000-00941000 r-xp 00000000 08:05 47317846 /lib/libcom_err.so.2.1
00941000-00942000 rwxp 00001000 08:05 47317846 /lib/libcom_err.so.2.1
00969000-0098d000 r-xp 00000000 08:05 76464 /usr/lib/libk5crypto.so.3.1
0098d000-0098e000 rwxp 00024000 08:05 76464 /usr/lib/libk5crypto.so.3.1
00990000-00999000 r-xp 00000000 08:05 76081 /usr/lib/libkrb5support.so.0.1
00999000-0099a000 rwxp 00008000 08:05 76081 /usr/lib/libkrb5support.so.0.1
0099c000-0099e000 r-xp 00000000 08:05 47317845 /lib/libkeyutils-1.2.so
0099e000-0099f000 rwxp 00001000 08:05 47317845 /lib/libkeyutils-1.2.so
009a1000-009ce000 r-xp 00000000 08:05 76467 /usr/lib/libgssapi_krb5.so.2.2
009ce000-009d0000 rwxp 0002d000 08:05 76467 /usr/lib/libgssapi_krb5.so.2.2
009d2000-00a19000 r-xp 00000000 08:05 47317848 /lib/libssl.so.0.9.8g
00a19000-00a1d000 rwxp 00046000 08:05 47317848 /lib/libssl.s
Program received signal SIGABRT, Aborted.
0x007166d6 in raise () from /lib/libc.so.6


So naturally I went through and tried to figure out what's going on here, the first spot I looked of course was in that actual function in ./src/buffer.c, around line 172:

int bufferReplace(BUFFER *buf, const char *a, const char *b, int all) {
// first, check if we'll need to expand the size of the buffer
int a_len = strlen(a), b_len = strlen(b);
int len_needed = buf->maxlen;
int replaced = 0;

if(b_len - a_len > 0) {
int to_replace = (all ?
count_occurences(buf->data, a) :
strstr(buf->data, a) != NULL);

// if we don't have any to replace, do nothing
if(to_replace == 0)
return 0;

// if we won't have enough room, do the expansion
if(to_replace * (b_len - a_len) + buf->len > buf->maxlen)
len_needed = ((buf->maxlen + to_replace*(b_len-a_len))*5)/4 + 20;
}

int buf_i, tmp_i;
char buftmp[len_needed];

// go through and replace all the occurences we need to
for(buf_i = tmp_i = 0; buf->data[buf_i] != '\0';) {
// we found a match
if(!strncmp(buf->data+buf_i, a, a_len)) {
replaced++;
strcpy(buftmp+tmp_i, b);
buf_i += a_len;
tmp_i += b_len;
// exit if we only need to replace the first occurence
if(all == 0) break;
}
else {
buftmp[tmp_i] = buf->data[buf_i];
tmp_i++; buf_i++;
}
}

// continue filling up the tmpbuf (if we only replaced 1, and not all)
for(; buf->data[buf_i] != '\0'; buf_i++, tmp_i++)
buftmp[tmp_i] = buf->data[buf_i];

// make sure the end of string marker is in place
buftmp[tmp_i] = '\0';

// copy over all the changes
if(buf->maxlen >= len_needed) {
strcpy(buf->data, buftmp);
buf->len = tmp_i;
}
else {
free(buf->data);
buf->data = malloc(len_needed);
strcpy(buf->data, buftmp);
buf->maxlen = len_needed;
buf->len = tmp_i;
}

return replaced;
}


I do not, however, see anything that is immediately wrong with that, so after a few hours of spinning my wheels pointlessly, I suppose I'll paint myself a big target on myself (considering earlier this morning in other topics) and see if anyone sees issues with that which I'm missing.

Maya/Rudha

[edit] For the sake of completeness and verbosity, the description content was:

Quote
Large stone shops and brightly decorated kiosks run along the edge of the lane, signs and people showing off their wares and specials. The cobblestone of the lane has been well taken care of despite being well worn from foot traffic.


The existing room prototype resource content:

abstract: no
script :~
### The following rproto was generated by redit.
### If you edit this script, adhere to the stylistic
### conventions laid out by redit, or delete the top line

### string values
me.name = "A shop lined village lane"
me.terrain = "City"
me.desc = me.desc + " " + "It looks unfinished.\r\n"

### begin exit: south
exit = me.dig("south", "village_lane9")
### end exit

### begin exit: north
exit = me.dig("north", "village_lane")
### end exit

### ############################################################################
### Environment proto information begins here.
### —————————————————————————-
me.environment.MorningMsg_Clear = ""
me.environment.AfternoonMsg_Clear = ""
me.environment.EveningMsg_Clear = ""
me.environment.NightMsg_Clear = ""
me.environment.MorningMsg_Rainy = ""
me.environment.AfternoonMsg_Rainy = ""
me.environment.EveningMsg_Rainy = ""
me.environment.NightMsg_Cloudy = ""
me.environment.MorningMsg_Cloudy = ""
me.environment.AfternoonMsg_Cloudy= ""
me.environment.EveningMsg_Cloudy = ""
me.environment.NightMsg_Cloudy = ""
### —————————————————————————-
### Environment proto information ends here.
### ############################################################################
-
28 Sep, 2010, David Haley wrote in the 2nd comment:
Votes: 0
It would be helpful if you had the full gdb output (i.e. compile with debugging enabled) and pointed to exactly which line was causing the problem.

Also, this line strikes me as weird:
len_needed = ((buf->maxlen + to_replace*(b_len-a_len))*5)/4 + 20;

What's up with multiplying by 5 just to divide by 4 and add 20?
28 Sep, 2010, Rudha wrote in the 3rd comment:
Votes: 0
That, as far as I know, is from the stock NakedMud release; I have only edited the python modules in ./lib/pymodules

To be honest, I would know the exact line if I hadn't been impatient, I had basically been stepping through the thing for … about an hour in the same function until I said okay screw it I'm going to go to the next function and BAM that was the one. I could probably go over it; that was definetely a 'there has to be a better way to do this' moment though.

Maya/Rudha
29 Sep, 2010, chrisd wrote in the 4th comment:
Votes: 0
Rudha said:
Originally, my fellow elvenblade programmer had run into this when he attempted to first implement some of the fancy scripted descriptions. He was seemingly able to diagnose it: it was because he had [if][elseif][endif] and it appeared that it was choking on the control structure, you needed to include the else.


I remember this one. I think I asked at the time what version on NM Elvenblade is based on, but it appears that the bufferReplace() function you pasted is identical to whatever's in v3.8.1.

I seem to remember someone having a similar issue (descriptions crashing the MUD, GDB indicated that '\r' escaping was to blame) and I thought it might be a funky line ending issue but it turned out that his client was sending UTF-8 encoded text. NM doesn't play well with UTF-8 (yet), so you might want to check that out. It's a long shot, I guess, but maybe worth a try?

EDIT: Nevermind, apparently I forgot that UTF-8 wasn't the issue in that case. -_-
EDIT2: It's NUL bytes that NM can't handle (yet). Is that helpful? I don't know.
29 Sep, 2010, chrisd wrote in the 5th comment:
Votes: 0
Change
if(to_replace * (b_len - a_len) + buf->len > buf->maxlen)
len_needed = ((buf->maxlen + to_replace*(b_len-a_len))*5)/4 + 20;
}


to

if(to_replace * (b_len - a_len) + buf->len >= buf->maxlen)
len_needed = ((buf->maxlen + to_replace*(b_len-a_len))*5)/4 + 20;
}


EDIT: That's lines 126-128 in stock buffer.c
EDIT2: In response to David's comment on the weird code, I have no idea why hollis choise (5x/4)+20, but he's consistent with that throughout the code (there are three other places) and (from what I can see) the idea is to avoid constantly having to expand the buffer by small amounts.
29 Sep, 2010, hollis wrote in the 6th comment:
Votes: 0
I forget what this problem was, but I remember it was bizarre. I am fairly sure I fixed it. Maybe it was forgetting to put \0 at the end of bufferFormat and then bufferReplace played bad with it later. Here's a copy of my buffer.c

http://nakedmud.pastebin.com/qjpFHJh5
29 Sep, 2010, chrisd wrote in the 7th comment:
Votes: 0
Just tried hollis' buffer.c and was able to reproduce the bug. My above fix still works, so here's a fixed version of hollis' fixed version of buffer.c: http://nakedmud.pastebin.com/Q6DFe0pN

Stendec suggested a different fix, which I incorporated into the above, which was to change line 113 of stock buffer.c (line 132 of hollis' version) from:
int len_needed = buf->maxlen;

to:
int len_needed = buf->maxlen + 1;

Not being a C guy, I'm unsure of which fix is better. The change to >= buf->maxlen is consistent with checks in the rest of the file (see lines 52, 177 and 353 in stock buffer.c), so if nothing else it brings that particular check in line with the rest of the code.

EDIT: Switched to C++ syntax hightlighting, then back to C when it made no difference.
29 Sep, 2010, Dean wrote in the 8th comment:
Votes: 0
I hadn't intended do any building for a while, so I wouldn't have stumbled across this for some time yet, so thanks Rudha for bringing this to light and Chrisd for the fix (I opted for door no.1).
29 Sep, 2010, Rudha wrote in the 9th comment:
Votes: 0
This kind of thing is why I try to keep my own changes to the py modules, it makes a clear division as to what Ive edited and what I have not.

Ill try the fixes when I get hime and let you know if they work for me or not.

Elvenblade is based off 3.8, incidentally.

Maya/Rudha
29 Sep, 2010, Runter wrote in the 10th comment:
Votes: 0
Off-by-one logic errors like these are always good places to look first.
29 Sep, 2010, chrisd wrote in the 11th comment:
Votes: 0
Rudha said:
I have had a recurring crash problem with descriptions. It seems as if there is some sort of issue related to the buffers. Originally, my fellow elvenblade programmer had run into this when he attempted to first implement some of the fancy scripted descriptions. He was seemingly able to diagnose it: it was because he had [if][elseif][endif] and it appeared that it was choking on the control structure, you needed to include the else. So keeping that in mind going forward, the problem seemed to be accomodated for.

I just confirmed that adding [else] only solves the problem in that it changes the length of the description. The original problem description is fine with the above fix in place.

Dean said:
I hadn't intended do any building for a while, so I wouldn't have stumbled across this for some time yet, so thanks Rudha for bringing this to light and Chrisd for the fix (I opted for door no.1).

This is one of those delightful bugs that you may never have encountered. It appears that hollis never saw it personally, and he's running a production MUD. It's only a problem with names/descriptions/other buffers of particular lengths, but it always a problem when they're that length. Fun.
30 Sep, 2010, Rudha wrote in the 12th comment:
Votes: 0
It would have popped up sooner or later, Im just glad it was sooner so that it could get dealt with sooner as opposed to later.

Maya/Rudha
30 Sep, 2010, thaolen wrote in the 13th comment:
Votes: 0
I had this happen over and over again till I deleted a room, and trace backs happen with triggers that aren't coded properly, but I thought that I could add my two cents to let you know I discovered something like this to.

There was a room that every time I would just save with redit the mud would kick me but not crash. Then I would keep trying to save once I logged in again. After that the mud would crash and I would get something on the lines of what Rudha got, the traceback that shows up in term.
30 Sep, 2010, Rudha wrote in the 14th comment:
Votes: 0
The fix posted above seems to work, though I haven't tested it extensively yet.

Maya/Rudha
0.0/14