I would imagine this depends on the operating system, and on whether it will schedule the sub-processes of one process across several physical processors (as opposed to only across the cores of a single processor).
I use -j4 on a setup with a single dual-core CPU. I suspect that you didn't see the second CPU being used because the load was not high enough to actually need it; the computer was probably blocked on I/O for some of the make jobs while compiling others. Bumping it up to -j8 would force it to compile more things at once. Even there, note that CPU1 is only being used sporadically.
If I remember correctly, the recommendation is to set -jx, where x = 2*num_cores.
I can't remember, to be honest. This site just says "a few more jobs than cores"; maybe since I have two cores I mentally filed that away as 4 jobs, and hence x cores -> 2x jobs. The idea is to run enough jobs in parallel that while some are blocked on disk I/O, others can compile. Maybe 8 jobs is too many for 4 cores, and 6 or 7 would be more appropriate? Anyhow, the difference is likely to be quite small.
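For what it's worth, the two rules of thumb from this thread (twice the core count, or "a few more jobs than cores") can be sketched in a few lines of Python. This is only an illustration: the function name `suggested_jobs` and its `factor` and `extra` parameters are made up for this example, and `os.cpu_count()` reports logical CPUs, which may count hyperthreads rather than physical cores.

```python
import os


def suggested_jobs(cores=None, factor=2.0, extra=0):
    """Return a job count for make -j: int(cores * factor) + extra, at least 1.

    Hypothetical helper for illustration; factor=2.0 mirrors the
    "x = 2*num_cores" rule, factor=1.0 with extra=2 mirrors
    "a few more jobs than cores".
    """
    if cores is None:
        # os.cpu_count() can return None if the count is undetermined.
        cores = os.cpu_count() or 1
    return max(1, int(cores * factor) + extra)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"logical cores:      {cores}")
    print(f"2x rule:            make -j{suggested_jobs(cores)}")
    print(f"a few more (+2):    make -j{suggested_jobs(cores, factor=1.0, extra=2)}")
```

On a dual-core machine the 2x rule gives -j4, and on a quad-core it gives -j8 versus -j6 for the "+2" variant. As said above, the difference between these is likely to be small, so I would treat the output as a starting point to time against your own build rather than as a fixed answer.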