6.2.6.2 Integer types
Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int, and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user's program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.
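The machine described above can be sketched in Python. This is a simplified model of the hypothetical hardware, not any real machine; in particular, treating the 31 value bits as a plain two's-complement field is an assumption made for illustration.

```python
# Model of the rationale's hypothetical machine (illustrative assumption):
# a 32-bit int built from two 16-bit shorts, where bit 15 (the lower
# short's sign bit) is a padding bit for signed int but an ordinary
# value bit for unsigned int.

def signed_value(word):
    """Value of the 32-bit word read as signed int: bit 15 is ignored."""
    v = (((word >> 16) & 0xFFFF) << 15) | (word & 0x7FFF)  # 31 value bits
    # Simplified: interpret the 31 value bits as two's complement.
    return v - (1 << 31) if v & (1 << 30) else v

def unsigned_value(word):
    """Value of the same word read as unsigned int: all 32 bits count."""
    return word & 0xFFFFFFFF

w = 0x00012345
w_pad = w | 0x8000  # flip only the padding bit (bit 15)

assert signed_value(w) == signed_value(w_pad)      # invisible when signed
assert unsigned_value(w) != unsigned_value(w_pad)  # visible when unsigned
```

Flipping bit 15 leaves the signed value unchanged but produces a different unsigned value, which is exactly the user-visibility the rationale describes.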
I'm not sure how "large" you mean by the "largest" of files, but at work we store tables of some 25k rows by roughly 50 columns, in both text and binary Python pickle format. (The text for guaranteed compatibility, the binary for speed.) Loading the data structure from the binary pickle takes about two seconds, whereas loading it from the text form takes a good fifteen. (The files are about 2 MB at high bz2 compression.)
The file format itself is dead simple: it's literally just space-separated tabular values.
Of course I'm agreeing with your general point, which is that it is very often not worth using some weird-ass binary format, but this seems like a case with obvious gains. Especially as you start loading up 50 of these things, you really don't want to sit around for 15 s × 50 = 750 s = 12.5 minutes. It so happens that in this case, we can take advantage of an extraordinarily well-known Python data structure pickling method, so we don't even have to roll our own.
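The trade-off is easy to reproduce. Here's a minimal sketch, with the row count scaled down and plain `float` parsing standing in for whatever the real loader does (the table contents are made up for illustration):

```python
import pickle
import time

# A table shaped roughly like the one described: rows x 50 columns.
# (Scaled down from 25k rows so the sketch runs quickly.)
table = [[float(r * 50 + c) for c in range(50)] for r in range(5_000)]

# Text form: dead-simple space-separated values.
text = "\n".join(" ".join(repr(x) for x in row) for row in table)

t0 = time.perf_counter()
parsed = [[float(tok) for tok in line.split()] for line in text.splitlines()]
text_secs = time.perf_counter() - t0

# Binary form: the stock pickle module, highest protocol for speed.
blob = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)
t0 = time.perf_counter()
unpickled = pickle.loads(blob)
pickle_secs = time.perf_counter() - t0

assert parsed == table == unpickled  # both forms round-trip losslessly
print(f"text parse: {text_secs:.3f}s, pickle load: {pickle_secs:.3f}s")
```

On most interpreters the pickle load is several times faster than the text parse, which is the same shape of gap as the two-versus-fifteen-second numbers above.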
100% agreed. It's almost always better to express things via an API than direct operations.