13 Nov, 2009, Silenus wrote in the 1st comment:
Votes: 0
I decided to hack away a little at my driver project starting with the memory management. I have been curious as to how one goes about measuring how good an allocator (in my case this might include a garbage collector) is. I am not particularly familiar with performance testing and tuning. Is the basic idea to use some sort of trace or simulated load and print out some sort of log of what the allocator is doing?
13 Nov, 2009, David Haley wrote in the 2nd comment:
Votes: 0
You'd want to profile it using standard tools for the language you're writing the VM in – or, if you don't have any, write basic profiling yourself – and then measure how much time is being spent allocating and deallocating memory, in addition to running the garbage collector. Useful additional statistics can be things like how fragmented your memory pool is, because typically fragmentation leads to reduced performance.
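
For instance, a minimal sketch of hand-rolled profiling in C, assuming a POSIX system with gettimeofday(); the wrapper names (prof_malloc, prof_free, prof_report) and the counters are illustrative, not from any library:

/* Track how many allocations happen and how long they take. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static unsigned long alloc_calls, free_calls;
static double alloc_usec;          /* total time spent inside malloc() */

static double now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000.0 + tv.tv_usec;
}

void *prof_malloc(size_t size)
{
    double start = now_usec();
    void *p = malloc(size);
    alloc_usec += now_usec() - start;
    alloc_calls++;
    return p;
}

void prof_free(void *p)
{
    free(p);
    free_calls++;
}

void prof_report(void)
{
    printf("%lu allocs, %lu frees, %.0f usec in malloc (%.2f usec/call)\n",
           alloc_calls, free_calls, alloc_usec,
           alloc_calls ? alloc_usec / alloc_calls : 0.0);
}

Route the VM's allocations through the wrappers, run your simulated load, and call prof_report() at the end.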
13 Nov, 2009, Silenus wrote in the 3rd comment:
Votes: 0
Thanks David. Are there useful summary statistics for fragmentation? Or do I have to construct some sort of "picture" to get a sense of how much fragmentation is occurring? Also, I am curious why certain systems go straight to sbrk() instead of using malloc(). It would seem to me that if you allocate a huge chunk of RAM using malloc, the overhead would be minimal.
13 Nov, 2009, David Haley wrote in the 4th comment:
Votes: 0
I'm not sure why you wouldn't have to construct some sort of "picture"… what did you have in mind?
You should realize that this is not really a common thing for people to do, so I wouldn't expect a full toolchain to be available that helps you do this kind of stuff.

As for sbrk vs. malloc, it's a question of control. If you're implementing your own memory management, you probably just want a big chunk and you deal with cutting it up yourself. Usually if you're caring about this stuff in the first place, you care about every little problem, so you want full control – the whole point is to avoid overhead.
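
A rough sketch of that "big chunk you cut up yourself" approach, assuming a single fixed-size pool with bump-pointer allocation and no free() support; the names (pool_init, pool_alloc) are illustrative:

#include <stddef.h>
#include <stdlib.h>

static char  *pool_base;   /* start of the big chunk */
static size_t pool_size;   /* total bytes in the pool */
static size_t pool_used;   /* bump-pointer offset */

int pool_init(size_t size)
{
    pool_base = malloc(size);     /* one big allocation up front... */
    pool_size = size;             /* ...or sbrk(size) for full control */
    pool_used = 0;
    return pool_base != NULL;
}

void *pool_alloc(size_t size)
{
    size = (size + 7) & ~(size_t)7;        /* keep 8-byte alignment */
    if (pool_used + size > pool_size)
        return NULL;                       /* pool exhausted */
    void *p = pool_base + pool_used;
    pool_used += size;
    return p;
}

Every allocation after pool_init() is then your own bookkeeping, which is exactly the control you want when you're tuning this stuff.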
13 Nov, 2009, Scandum wrote in the 5th comment:
Votes: 0
Silenus said:
I decided to hack away a little at my driver project starting with the memory management. I have been curious as to how one goes about measuring how good an allocator (in my case this might include a garbage collector) is. I am not particularly familiar with performance testing and tuning. Is the basic idea to use some sort of trace or simulated load and print out some sort of log of what the allocator is doing?

That's indeed the basic idea. The best investment is probably to create a CPU monitor that measures various activities. It could look something like this:
Section              Time (usec)   Freq (msec)   %Prog   %CPU
Area Save Time:               63       1201405    0.00   0.00
Update Shops:                335        600703    0.03   0.00
Update Time:                2408         60027    1.85   0.00
Update Areas:                162         60229    0.12   0.00
Update Weather:              131         60110    0.10   0.00
Update Objects:             1108         60038    0.85   0.00
Char Save Time:             1674       1802109    0.04   0.00
Update Violence:             987          4004   11.38   0.02
Update Obj Progs:             50          4004    0.58   0.00
Update Mob Progs:           1758          4004   20.27   0.04
Update Mobiles:              661          2002   15.25   0.03
Update Aggressive:           322           500   29.71   0.06
Update Purger:                 1           100    0.46   0.00
Update Tactical:               2           100    0.92   0.00
Scan Descriptor:               7           100    3.23   0.01
Process Input:                 6           100    2.77   0.01
Process Output:                2           100    0.92   0.00

Unknown CPU Usage: 0.02 percent
Average CPU Usage: 0.22 percent
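
A monitor like that boils down to accumulating wall-clock time per section of the main loop. A minimal sketch, again assuming POSIX gettimeofday(); the struct and function names are illustrative:

#include <sys/time.h>

struct section {
    const char   *name;
    double        usec;    /* total time spent in this section */
    unsigned long calls;   /* how often it ran */
};

static double stamp_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000.0 + tv.tv_usec;
}

/* Wrap each update routine in the main game pulse like this: */
void timed_call(struct section *s, void (*fn)(void))
{
    double start = stamp_usec();
    fn();
    s->usec += stamp_usec() - start;
    s->calls++;
}

From the accumulated totals you can derive the %Prog and %CPU columns at report time.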
13 Nov, 2009, quixadhal wrote in the 6th comment:
Votes: 0
You'll also want to be aware of how the underlying OS handles memory, since that's something you have to work with. For example, it's pretty common for a unix-like OS to use paging, and those page sizes will affect how your own blocks of memory are treated. If a given page is 4K and you allocate blocks in sizes of 8K, the OS may choose to swap half of your block out to disk so it can buffer file I/O. You can't really do anything about that, but at least you know that a 6K block would perform the same as (or worse than) an 8K block, and better than a 12K block, since both 6K and 8K occupy two 4K pages while 12K needs three.
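
For that reason it can be worth letting the OS page size drive your block sizing. A small sketch, assuming POSIX sysconf(); round_to_pages() is an illustrative helper:

#include <stdio.h>
#include <unistd.h>

size_t round_to_pages(size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);   /* typically 4096 */
    return (bytes + page - 1) / page * page;       /* round up */
}

int main(void)
{
    printf("6K rounds to %zu bytes\n", round_to_pages(6 * 1024));   /* 8192 */
    printf("12K rounds to %zu bytes\n", round_to_pages(12 * 1024)); /* 12288 */
    return 0;
}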

When doing benchmarks, that does mean you'll want to disable swap (ideally), or at least try to ensure that you aren't doing much of anything else on that hardware so you don't get swapped out during the test.
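
If you can't disable swap outright, one hedge on Linux/POSIX systems is to pin the benchmark process in RAM with mlockall(); note this needs root or a sufficient RLIMIT_MEMLOCK:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Lock all current and future pages so the test isn't swapped out. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    /* ... run the allocator benchmark here ... */
    munlockall();
    return 0;
}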
13 Nov, 2009, David Haley wrote in the 7th comment:
Votes: 0
Given that dealing with swap is a very real fact of life, I'm not sure that disabling swap would give you a realistic idea of how your system would work. If it works wonderfully in a world where nothing is swapped, but fails miserably if stuff starts swapping, then arguably you have created a nice ivory tower but not something that will work in the real world.
13 Nov, 2009, kiasyn wrote in the 8th comment:
Votes: 0
David Haley said:
Given that dealing with swap is a very real fact of life, I'm not sure that disabling swap would give you a realistic idea of how your system would work. If it works wonderfully in a world where nothing is swapped, but fails miserably if stuff starts swapping, then arguably you have created a nice ivory tower but not something that will work in the real world.


unless your real world system requirements were lotsa ram! :D
13 Nov, 2009, Tyche wrote in the 9th comment:
Votes: 0
Silenus said:
Also I am curious why certain systems instead of using malloc() go straight to sbrk(). It would seem to me if you allocate a huge chunk of ram using malloc the overhead would be minimal.


A general memory management system is not likely to allocate a huge chunk of memory off the bat, as that would be quite wasteful in many cases. One reason to use sbrk() is that successive allocations are likely to be contiguous, and the bounds of fragments can be resolved to contiguous areas; you cannot control that with malloc(). Security considerations might also prompt you to use mmap(), because you may want different permissions on the memory allocated. For example, you might wish to allocate memory that has executable permission for systems that generate dynamic executable code at runtime. AFAIK sbrk() memory has different permissions on different types of Unixes.
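
A minimal sketch of that executable-memory case with mmap(), assuming a POSIX system (note that many hardened systems forbid writable+executable mappings, so a real code generator would map writable first and mprotect() to read+execute after emitting the code):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;
    void *code = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... emit machine code into 'code', then call into it ... */
    munmap(code, len);
    return 0;
}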
14 Nov, 2009, quixadhal wrote in the 10th comment:
Votes: 0
David Haley said:
Given that dealing with swap is a very real fact of life, I'm not sure that disabling swap would give you a realistic idea of how your system would work. If it works wonderfully in a world where nothing is swapped, but fails miserably if stuff starts swapping, then arguably you have created a nice ivory tower but not something that will work in the real world.


True enough, but my point is that you can't change the way the OS behaves. If you're trying to optimize your allocation algorithms, you may want to do that before adding unpredictable things into the equation, such as swap (whose behavior may well change from one kernel to the next, see Linux 2.2.x) or multiple users doing random things.
14 Nov, 2009, Silenus wrote in the 11th comment:
Votes: 0
Thanks for the replies. W.r.t. the picture comment, what I meant was some sort of graph to visualize the fragmentation, in case it could not be reduced meaningfully to a couple of summary statistics.
14 Nov, 2009, David Haley wrote in the 12th comment:
Votes: 0
I'm sure there are summary statistics you can use, such as the average size of a contiguity-breaking chunk, or the relative size of allocated chunks vs. unallocated but contiguity-breaking chunks (as opposed to simply unused memory), which gives you a measure of overhead or wastefulness of some sort; there are probably a few others. I would suggest poking around on Google to find papers (not necessarily formal academic ones) or other documents on this kind of project. I know they're out there, as I've come across them in the past. Effective C++ has a piece on memory allocators, although I haven't read it recently enough to remember whether it discusses fragmentation in detail.
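
As one concrete example, a common external-fragmentation statistic is 1 - (largest free block / total free memory): it is 0 when all free memory is one contiguous block and approaches 1 as the pool shatters. A sketch computing it from a free list, under an assumed struct layout (free_block is illustrative):

#include <stddef.h>

struct free_block {
    size_t             size;
    struct free_block *next;
};

double fragmentation(struct free_block *list)
{
    size_t total = 0, largest = 0;
    for (struct free_block *b = list; b; b = b->next) {
        total += b->size;
        if (b->size > largest)
            largest = b->size;
    }
    /* 0.0 = one contiguous free region; -> 1.0 as it fragments */
    return total ? 1.0 - (double)largest / (double)total : 0.0;
}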