19 Jul, 2015, MayaRK wrote in the 1st comment:
Votes: 0
I'm losing my mind over a memory bug. I've been trying to track it down for days now but it continues to crash my test port…

So, alloc_mem() is crashing when it finds an invalid memory address in its list of recycled memory, but all my calls to alloc_mem() and free_mem() are AFAIK correct. I made sure that strings being freed by free_string() (which calls free_mem()) were originally allocated with str_dup() (which calls alloc_mem()). I'm not mixing alloc_mem() with any other deallocation functions. So why is there an invalid memory address in rgFreeList? :cry:

There are a few calls to malloc() and free() in the code – from what I understand, it is fine to use malloc()/realloc()/calloc() and free() alongside ROM's memory mgmt functions as long as they don't get mixed (malloc() with free_mem(), for example). Am I wrong?

I've had this issue before and was told that it's likely I freed something with free_mem(), that something got added to ROM's recycled list of memory, and somewhere the memory block actually got freed for real, so the address in rgFreeList is bad. It makes sense, but like I said I'm not mixing ROM's alloc functions with C's dealloc functions, so I'm lost…

Has anyone had this problem with ROM? What can I do to figure out what is happening before I go bald? Thank you.
19 Jul, 2015, Tyche wrote in the 2nd comment:
Votes: 0
Your understanding of the possible problems is correct.
In addition to mixing calls, calling free_mem twice on the same object can also cause problems.
20 Jul, 2015, Pymeus wrote in the 3rd comment:
Votes: 0
In the stock code, alloc_mem() and free_mem() are the only two places that rgFreeList is touched. Assuming you haven't changed that, then either some bad memory managed to get past free_mem()'s sanity checks, or something else smashed memory being used for rgFreeList. I would tend to suspect the latter, so I'd be reviewing recent code changes for proper bounds checking on buffers and proper nullifying (or other reassignment) of pointers to memory that's being free_mem()'d.

free_mem()'s main sanity check is that it expects memory to be prepended with an int containing the value of
#define MAGIC_NUM 52571214

Depending on how you use your debugger, this default MAGIC_NUM could potentially appear as the string "N,\"\003" (little endian) or "\003\",N" (big endian). If that header is missing, free_mem() logs "Attempt to recyle invalid memory of size %d." and bails out without changing anything, usually resulting in a harmless memory leak. Harmless, at least, in comparison to data corruption.
20 Jul, 2015, MayaRK wrote in the 4th comment:
Votes: 0
Thank you Tyche for your help and Pymeus for the debugging tips. Pymeus, could you please give an example of what smashing the memory in rgFreeList would look like? Would that mean that memory got added to the recycled list but somewhere it's still being used - or that it actually got freed via free()?

Does calling free_mem() twice on an object really cause problems other than what Pymeus said about causing small leaks? If so, what kind of problems does it cause? I tried freeing a string twice but it returned with the error message about attempting to recycle a bad size and didn't corrupt the rgFreeList. I don't know if it's necessary since it seems free_mem() already checks if the memory coming in should be put on the recycled list or not, but, I had an idea for how to tweak free_mem(). If anyone could offer feedback on this, I'd be grateful. I added the 3rd parameter, the first NULL check, and a line at the bottom to nullify the original pointer, so if for some reason free_mem() is called again on the original object it will just return out of the function. Is this a bad idea?

void free_mem(void *pMem, int sMem, void **ppMem) {
int iList;
int *magic;

if (pMem == NULL || *ppMem == NULL) {
// Some error message here
return;
}

pMem -= sizeof(*magic);
magic = (int *)pMem;

if (*magic != MAGIC_NUM) {
dbbug("Attempt to recyle invalid memory of size %d.", sMem);
dbbug((char *)pMem + sizeof(*magic), 0);
return;
}

*magic = 0;
sMem += sizeof(*magic);
for (iList = 0; iList < MAX_MEM_LIST; iList++) {
if (sMem <= rgSizeList[iList])
break;
}

if (iList == MAX_MEM_LIST) {
void *array[10];
size_t size;

dbbug("Free_mem: size %d too large.", sMem);
size = backtrace(array, 10);
backtrace_symbols_fd(array, size, STDERR_FILENO);
abort();
}

*((void **)pMem) = rgFreeList[iList];
rgFreeList[iList] = pMem;
*ppMem = NULL; // Added this
return;
}


Thank you. :)
21 Jul, 2015, Pymeus wrote in the 5th comment:
Votes: 0
Most of the C language is capable of corrupting memory in some fashion when used incorrectly, so without your code, speculating about what it "might" look like is probably a futile exercise. free()ing a part of the free list could cause it, overrunning a buffer that's on the heap could cause it, using a pointer to an object after it's been recycled could cause it, or even using an uninitialized pointer that happens to be pointed at part of the free list.

Your best lead is what you've changed recently. You say you have a test port, so you probably also have source code for the production copy of the game somewhere? Have you diff'd your test port's source code against the known-good copy?
21 Jul, 2015, Tyche wrote in the 6th comment:
Votes: 0
Is this a bad idea?

I think so. The problem isn't in alloc_mem or free_mem.
You need to find and fix the actual problem.

That free_mem could be more robust in checking the free list to see if the area of memory is already on the free list doesn't fix the problem.
The way you are doing it by modifying the pointer gives me the shivers since it alters the function's purpose with side-effects.
21 Jul, 2015, MayaRK wrote in the 7th comment:
Votes: 0
By changing free_mem(), my hope /was/ to find the actual problem. I was hoping that by nulling the original pointer it would either cause an exception if the pointer was being used elsewhere, or it would send an error message in free_mem() so I could at least see if something was being freed twice. I just thought that would have narrowed something down for me had it worked, but it wasn't necessary since the test port isn't crashing anymore. I ended up comparing new changes on the test port to the main port's source but strangely couldn't find mistakes there, so I went through every instance of str_dup(), alloc_mem(), alloc_perm(), free_mem() and fixed a few things. I'm going to lay off the reboots and copyovers to see if it stays stable the next few days.

Thank you all for the help, and thanks Pymeus - those are exactly the kinds of things I wanted to keep in mind while hunting the problem down.

Just noticed you have a website with some interesting content, Tyche, I'm going to check it out :)
21 Jul, 2015, MayaRK wrote in the 8th comment:
Votes: 0
Nope - still broken. At this point I'm looking for someone who is experienced with C and ROM functions and is willing to take a glance at the code I've written. It's a few hundred lines. It would at least tell me if the bug is in this code or somewhere else. I'd be very grateful. Please PM me or email at erionmud@gmail.com if you'd be willing to look it over. Thank you.
22 Jul, 2015, Kaz wrote in the 9th comment:
Votes: 0
How about this:

Comment out the contents of alloc_mem and free_mem for now and forward them to malloc and free. Then, compile using Clang with its address sanitizer. When running, this will tell you the moment you access out-of-bounds memory, free something twice, etc., etc.
22 Jul, 2015, MayaRK wrote in the 10th comment:
Votes: 0
I've never used Clang. I'll give that a shot. Thanks!
23 Jul, 2015, Pymeus wrote in the 11th comment:
Votes: 0
Kaz said:
How about this:

Comment out the contents of alloc_mem and free_mem for now and forward them to malloc and free. Then, compile using Clang with its address sanitizer. When running, this will tell you the moment you access out-of-bounds memory, free something twice, etc., etc.

Nice! I think you just changed the way I write C code. I was about to add that I might finally have been convinced to switch compilers, but recent gcc versions (4.8+) seem to have similar functionality.
23 Jul, 2015, drifton wrote in the 12th comment:
Votes: 0
error messages that clang spit out are 10x more useful most of the time then even the gcc 5x branches though the newer gcc stuff is much better then it used to be
23 Jul, 2015, Rarva.Riendf wrote in the 13th comment:
Votes: 0
Clang and gcc with -Wall -Wextra -Wcast-align -Wpointer-arith -Winline -Wundef -Wshadow -Wwrite-strings -Wno-unused-parameter -Wfloat-equal

oh…and Wall is definitely not …all….

helps to assure you that the code actually does the thing it looks like it should do.
Does not prevent you from coding stupid logics though, unfortunaltely.

But Valgrind helps detect most of those as well.

The real way to code good stuff ? unit test for all cases…..
23 Jul, 2015, alteraeon wrote in the 14th comment:
Votes: 0
Alter Aeon's default developer mode compiler flags:

-ggdb -fsanitize=address -ansi -pthread -Wall -W -Wextra -pedantic -Waddress -Wfloat-equal -Wundef -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Wstrict-aliasing -Wuninitialized -Wsign-promo -Wdisabled-optimization -Wsynth -Wredundant-decls -Wwrite-strings -Wparentheses -fstack-protector -Wchar-subscripts -Wcomment -Wformat=2 -Wformat-security -Wformat-y2k -Wimport -Winit-self -Winvalid-pch -Wmissing-braces -Wmissing-field-initializers -Wmissing-format-attribute -Wmissing-include-dirs -Wpacked -Wreturn-type -Wsequence-point -Wsign-compare -Woverloaded-virtual -Wswitch -Wswitch-enum -Wtrigraphs -Wunknown-pragmas -Wunused -Wunused-function -Wunused-label -Wunused-value -Wunused-variable -Wvariadic-macros -Wvolatile-register-var -std=c++0x -Wdouble-promotion -Wlogical-op -Wmissing-noreturn -Wunreachable-code -Wuseless-cast -Wno-long-long -Wno-unused-parameter -Wno-suggest-attribute=noreturn

Alter Aeon MUD
http://www.alteraeon.com
25 Jul, 2015, MayaRK wrote in the 15th comment:
Votes: 0
I ended up finding the bug last night (without Clang, which I would still like to use) by replacing Diku's alloc_mem and alloc_perm with malloc, its str_dup with strdup, and free_mem with free and then testing my new code. An object was being double freed and it crashed right away. Thank you Kaz for suggesting to use Clang and replacing the contents of alloc_* and free_mem. I wouldn't have thought to do this and now my sanity is returning. :)
27 Jul, 2015, Rarva.Riendf wrote in the 16th comment:
Votes: 0
@altereon:

btw I checked a few thing recently and some of your flag are just redundant: -W and -Wextra -Wall and some flag like -Wadress etc, you may want to clean some :)
I also point that valgrind and -fsanitize=address maynot work together (have not found any info on that but balgrind will refuse to work on code compiled with it on my fedora 21
28 Jul, 2015, Kaz wrote in the 17th comment:
Votes: 0
Rarva.Riendf said:
I also point that valgrind and -fsanitize=address maynot work together (have not found any info on that but balgrind will refuse to work on code compiled with it on my fedora 21


Yes, this is an issue since both the sanitizers and Valgrind use a similar technique (shadow tables), and they interfere with each other. What you should do is have your continuous integration server (Travis, Jenkins, etc.) run separate jobs for each and exercise your tests independently in each mode.

You do do that, right? ;)
28 Jul, 2015, Rarva.Riendf wrote in the 18th comment:
Votes: 0
I just recompile depending on wich mode I want to use, I have many makefiles anyway (gprof also interfere with some stuff). Only takes a few seconds to compile so it is not a problem.
29 Jul, 2015, alteraeon wrote in the 19th comment:
Votes: 0
Rarva - the flags get updated every year or so, with new ones looked at as they become available. I've been using a lot of flags for over a decade, so it's not surprising that some of them have become part of -Wall in the intervening time. That list would be a good start point for most codebases to shoot for. (Minus the c++ stuff if you're not using it.)
29 Jul, 2015, Rarva.Riendf wrote in the 20th comment:
Votes: 0
alteraeon said:
Rarva - the flags get updated every year or so, with new ones looked at as they become available. I've been using a lot of flags for over a decade, so it's not surprising that some of them have become part of -Wall in the intervening time. That list would be a good start point for most codebases to shoot for. (Minus the c++ stuff if you're not using it.)


Yeah a lot have been included in -Wall and -Wextra

I looked because when I compiled with them (I removed pedantic though heh) it did not detect much more than the one I used.
I also compile and test in both C and C++ (because I do not use any specific C++ stuff yet, but may move on eventually)
And also in 32 and 64 bit.
This alone deteted some bugs as well.

When you do that the bug that are left are mostly logic ones. Not necesserily the easiest to solve though :)
0.0/20