25 Apr, 2009, Grimble wrote in the 1st comment:
Votes: 0
A little build trick I hadn't run across before, with surprising results… It cut my full-build time from about 5 minutes to about 45 seconds, which was far better than I got with pre-compiled headers. The savings come from having one translation unit rather than dozens.

Simply have your makefile generate a source file that #include's all your other source (not header) files, and compile just this one file. As an example makefile target (substitute your own variables as required)…

TEMP = $(patsubst %, '\#include "%"', $(SRC_FILES))

full:
	rm -f $(SRC_DIR)/full.c
	for i in $(TEMP); do echo $$i >> $(SRC_DIR)/full.c; done
	$(CC) $(INCS) $(CFLAGS) -o $(OBJ_DIR)/full.o -c $(SRC_DIR)/full.c # compile
	$(CC) $(CFLAGS) $(LFLAGS) -o $(EXE) $(OBJ_DIR)/full.o $(LIBS) # link
25 Apr, 2009, quixadhal wrote in the 2nd comment:
Votes: 0
Heh, I've seen that done before. It's generally considered bad form, as it makes debugging pretty difficult, since all symbols now point to things in "full.c".

Besides, how often do you really need to do a full build? If you're working on stuff, you probably only want to recompile the objects you're modifying, and then do a full build when you're done.
25 Apr, 2009, Grimble wrote in the 3rd comment:
Votes: 0
Yup. Not saying you would always want to do it this way, but there are certainly cases where you could benefit from it.

Another surprising result is that the executable is 30% smaller (at least in my case). I currently have no performance benchmarks for the MUD server, but it's not unreasonable to assume there's at least some performance improvement from that code size reduction.

Edit: On the debugging issue… I suppose you could 'cat' the source file contents into a single file rather than #include them. Then debug would work just fine. You would still have to map any symbol back to the original source file, but you would typically know that just through code familiarity.
25 Apr, 2009, David Haley wrote in the 4th comment:
Votes: 0
Grimble said:
I currently have no performance benchmarks for the MUD server, but it's not unreasonable to assume there's at least some performance improvement from that code size reduction.

Err. I would quantify this before putting too much stock into it. Well, unless you define "some" as some negligible value, of course.
25 Apr, 2009, Grimble wrote in the 5th comment:
Votes: 0
David Haley said:
Err. I would quantify this before putting too much stock into it. Well, unless you define "some" as some negligible value, of course.

Like I said, I have no benchmarks. There is only an anecdotal data point, specific to my server, in that loading the world used to take about 80ms and now takes about 15ms. Other than instantiating objects, I'm not sure how else one would "stress" a MUD server. Any suggestions?

BTW, this is all with gcc 4.3.2. Maybe someone with a background in compiler design would have some thoughts on why a single translation unit yields smaller code size (and possibly performance).
25 Apr, 2009, elanthis wrote in the 6th comment:
Votes: 0
quixadhal said:
Heh, I've seen that done before. It's generally considered bad form, as it makes debugging pretty difficult, since all symbols now point to things in "full.c".


Really? With modern compilers and debuggers it should point to the included files just fine. If I put an inline function in a header, the debugger points to the proper header for symbol source location lookup, and I see no reason why that would differ for including a file with a .c/.cc extension. Unresolved symbols at link time are the only thing that should get confusing at all, and I'm not sure those come up often enough to be a real concern.

As for the compile-speed increase, precompiled headers will help a lot without resorting to a single source file. The biggest reason multiple files take so long to compile is that the headers are reparsed for every source file.

Grimble said:
BTW, this is all with gcc 4.3.2. Maybe someone with a background in compiler design would have some thoughts on why a single translation unit yields smaller code size (and possibly performance).


Strip the executables and compare size. It may largely be debugging information, especially if you actually are seeing debug source location issues with your executable.

The likely cause for this difference, other than debugging info, is symbol tables and inlining/optimization. The compiler can only inline a function when it is defined in the same translation unit, so putting all functions into one source file allows far more aggressive inlining. There are other optimizations besides inlining that can be applied to functions in the same translation unit, though I'm not sure whether GCC supports any of those. Compilers with good LTO (link-time optimization) get similar advantages without needing to put everything in one source file. Future versions of GCC (possibly by 4.5) are expected to have some basic LTO, and LLVM-based compilers (including llvm-gcc and clang) already have very powerful LTO support.

That extra inlining also allows for smaller symbol tables, which (depending on the linker options) can affect the final executable. You can shrink the symbol tables pretty nicely by telling GCC to make all symbols private by default and then exposing only the symbols you know you need (which, in a server/application, is generally nothing outside the defaults unless you have some kind of plugin-loading facility), and that approach is better than the single-source-file trick.

If you're mostly just interested in compilation speed and your computer is reasonably recent, just pass -j to make. That tells it to run multiple compile jobs in parallel (bare -j puts no cap on the job count; -j4 would limit it to four — either way it keeps my quad-core on Fedora 11/rawhide saturated), which really boosts compilation times a lot. Combine that with precompiled headers and I was able to take the Source MUD compilation time from approx. 90 seconds all the way down to 18 seconds, including the source preprocessing I do for the command source files. I'm sure if I timed it again with my new SATA II hard drive it would be even quicker… yup, it averages 15 seconds now. And that's a heavily templated C++ codebase, so compilation is a very heavy process.
25 Apr, 2009, David Haley wrote in the 7th comment:
Votes: 0
I use the -j flag as well and compilation whirs on by. (I pick -j4, since I have a dual-core processor.) I don't use precompiled headers either; it turns out I don't really need them. (I'm usually pretty good about including only what I need, and avoiding monster includes like mud.h that pull in a huge chunk of /usr/include. Precompiled headers would make that carefulness almost unnecessary, though.)

Grimble said:
There is only an anecdotal data point, specific to my server, in that loading the world used to take about 80ms and now takes about 15ms.

To make the anecdotal evidence more interesting, this should probably be run through a profiler to see where time was spent before and where it's no longer spent. At least that would help identify what the compiler has done differently.
25 Apr, 2009, Grimble wrote in the 8th comment:
Votes: 0
Out of curiosity, I stripped all symbols and measured average startup time (i.e., everything up to entering the main loop)…

multiple translation units:
compile = 4m:52s
size = 3239469
stripped = 1978368
startup = 243ms

single translation unit:
compile = 0m:41s
size = 2322600
stripped = 1368576
startup = 200ms

I don't quite follow the explanation of inlining… Doesn't inlining, by definition, make code size larger (with the benefit of being faster)? I seem to be getting both smaller and faster.

If I get really curious, I may break out a profiler and see what's really happening. For now, it would be interesting to see if other codebases have a similar experience with this, since it's easy enough to test.
25 Apr, 2009, David Haley wrote in the 9th comment:
Votes: 0
Inlining makes for larger code to the extent that the function is used in several places, hence copied. But if a function is only called in one place, inlining adds no cost. Don't forget that inlining also saves the instructions for making the call: setting up the stack frame, pushing arguments, jumping, returning, and cleaning up afterwards. Depending on what the function does, inlining can actually save instructions.
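The single-call-site case is easy to see on current GCC at -O2, where a static function with one caller is typically folded into the caller and no call instruction (or separate copy) remains. A sketch with a throwaway file:

```shell
cat > once.c <<'EOF'
static int helper(int x) { return x * 3 + 1; }        /* exactly one call site */
int main(void) { return helper(13) == 40 ? 0 : 1; }
EOF

# Emit assembly and look for a call to helper; after inlining there is none.
gcc -O2 -S once.c -o once.s
grep 'call.*helper' once.s || echo "helper was inlined; no call remains"
```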
25 Apr, 2009, Grimble wrote in the 10th comment:
Votes: 0
Makes sense. With a single translation unit, the compiler is in the position to apply inlining (and presumably other optimizations) to greater effect.

Seems worth doing full builds this way, at least in the case of the production/live version.
25 Apr, 2009, David Haley wrote in the 11th comment:
Votes: 0
Well, in principle at least, a compiler that does link-time optimization will be able to do the same thing. Putting everything into one file helps only as an artifact of how gcc's optimizer happens to work at the moment. To me, the extra headache in debugging the result isn't really worth the relatively small gains you get. I don't care much at all about code size, and we're not talking about speed improvements that are counted in orders of magnitude. If I'm doing something involving a very tight loop where performance is really critical, I can work around that otherwise.

Still, it is interesting to observe this. I'm a little surprised that it's taken this long to get link-time optimizations, frankly, since you can do a lot of the stuff you already do when compiling a single file pretty straightforwardly.
26 Apr, 2009, Grimble wrote in the 12th comment:
Votes: 0
David Haley said:
Still, it is interesting to observe this. I'm a little surprised that it's taken this long to get link-time optimizations, frankly, since you can do a lot of the stuff you already do when compiling a single file pretty straightforwardly.

There must be a good reason (or two, or three…). The GNU folks aren't goofing off over there.
26 Apr, 2009, David Haley wrote in the 13th comment:
Votes: 0
I would imagine it's just a question of priorities, really. It's also possible that the infrastructure simply makes it very difficult, and fixing the infrastructure to allow it would be a very considerable undertaking. Or maybe just nobody has cared so far.
26 Apr, 2009, elanthis wrote in the 14th comment:
Votes: 0
GCC is a very old, crufty compiler. It is architecturally very similar to how C compilers were put together 20 years ago. Things like LTO require really big changes to the way the compiler does things, and that's just hard to push down the pipes. This is in large part why I'm so interested in LLVM.

As for the inlining making things smaller, the point is that the inlined function can be removed from the resulting executable entirely. In most cases when a function is inlined, a non-inlined copy is kept around as well, because the compiler doesn't know whether code in another translation unit is going to invoke the function. So you get any potential size increase from the inlining, plus the whole function itself, plus the function's symbol table entries.

It is possible there's something else going on as well, but if so, I don't have any guesses at all what it is.