22 Jan, 2010, Skol wrote in the 41st comment:
Votes: 0
quixadhal said:
Skol said:
Then it reads those in and stores them simply as ch->act[PLAYER_KILLER], 1 if set and 0 otherwise. But it only reads/writes the flags it needs to, and you wouldn't be limited the way you are with bits, as you could keep defining entries on down the array until you're blue in the face without issues.


Just realize that you'll be wasting a LOT of space that way. Far more than was discussed earlier, because C doesn't have sparse arrays. So, if you have bit 0 set, and bit 672 set, you have to have an array of 673 characters to hold those two results. As long as you're good with that, go for it.

I was thinking of it vs., say, RoM's standard bits with A-Z, aa-ff.
I.e., people went and made an 'Act2' with a second set of bits. Rather than that, you could have an array of, say, 64 or 128 entries, instead of being limited to 32 (A-Z, aa-ff).

Similar to how skills are used.
In get_skill(): skill = ch->pcdata->learned[sn];. In that case its size is determined by 'MAX_SKILL' (I forget the stock define, but it's maybe 200 or so).

It just seemed like a much more easily expandable approach, say an array of 64. I guess I'm not seeing where the waste would be? And, if that is the case, wouldn't skills be better done by bits? I'm looking at the logic behind having 32 options, vs expandable as needed.

I do see that the info written to the player file would be larger with this option, but the intent is human readability (otherwise I'd just go Act 010000100001….01010 etc., vs Act IS_MAGE~, Act AUTO_LOOT~ etc.). Yet the original way would be Act ACFJaa~ etc., which tells me zilch unless I'm looking at my header file.
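
A minimal sketch of the scheme Skol describes, assuming hypothetical flag names and a 64-entry array (the save format follows his Act IS_MAGE~ example):

#include <stdio.h>
#include <string.h>

/* Hypothetical flag indices; extend past 32 freely. */
#define PLAYER_KILLER   0
#define AUTO_LOOT       1
#define MAX_ACT_FLAGS   64

/* Names written to the player file, indexed by flag number. */
static const char *act_names[MAX_ACT_FLAGS] = {
    "PLAYER_KILLER", "AUTO_LOOT"
};

/* One byte per flag: 1 if set, 0 otherwise. */
static unsigned char act[MAX_ACT_FLAGS];

/* Save only the flags that are set, one readable line each. */
static void save_act_flags(FILE *fp)
{
    int i;
    for (i = 0; i < MAX_ACT_FLAGS; i++)
        if (act[i] && act_names[i] != NULL)
            fprintf(fp, "Act %s~\n", act_names[i]);
}

/* Look a saved name back up and set the matching flag. */
static void load_act_flag(const char *name)
{
    int i;
    for (i = 0; i < MAX_ACT_FLAGS; i++)
        if (act_names[i] != NULL && strcmp(name, act_names[i]) == 0) {
            act[i] = 1;
            return;
        }
}

Unset flags cost nothing in the file, and adding a new flag is one more define and one more table entry rather than a whole new 32-bit 'Act2' vector.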
22 Jan, 2010, David Haley wrote in the 42nd comment:
Votes: 0
Skol said:
It just seemed like a much more easily expandable approach, say an array of 64. I guess I'm not seeing where the waste would be?

Well, if all you need is a bit, and you're using an int, you're wasting some 31 bits. Do you care? That's another question…

Skol said:
And, if that is the case, wouldn't skills be better done by bits?

The skill table isn't storing binary values but the percentage learned, so no, bit sets don't really apply.

Skol said:
I'm looking at the logic behind having 32 options, vs expandable as needed.

The logic was most probably either not knowing any better, or some thought that they needed to squeeze out every last bit, even compared to an encapsulated, arbitrary-size bit set (assuming they thought of such a thing).
22 Jan, 2010, KaVir wrote in the 43rd comment:
Votes: 0
David Haley said:
My understanding was that there were performance implications in that accessing an address at a multiple of four is very fast, but if you are off that alignment, some processors need to do funny things to get the relevant byte out.

It depends on the data type (and compiler settings, of course). An int or float will typically be 4-byte aligned (for performance reasons, as you mentioned), while a short will be 2-byte aligned and a char 1-byte aligned. The compiler will insert padding bytes to keep the data types aligned; single bytes, of course, can be aligned at any address.
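
A quick way to see the padding KaVir mentions, assuming a typical compiler where int is 4-byte aligned (exact sizes are implementation-defined):

#include <stdio.h>

/* The char is 1-byte aligned, but the int after it wants 4-byte
 * alignment, so the compiler inserts 3 padding bytes between them. */
struct Padded
{
    char c;   /* offset 0 */
    int  i;   /* offsets 4-7 on most compilers */
};

/* Chars need no padding at all: offsets 0-3. */
struct Chars
{
    char data[4];
};

int main(void)
{
    printf("sizeof(struct Padded): %zu\n", sizeof(struct Padded)); /* typically 8 */
    printf("sizeof(struct Chars):  %zu\n", sizeof(struct Chars));  /* typically 4 */
    return 0;
}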
22 Jan, 2010, David Haley wrote in the 44th comment:
Votes: 0
KaVir said:
It depends on the data type (and compiler settings, of course). An int or float will typically be 4-byte aligned (for performance reasons, as you mentioned), while a short will be 2-byte aligned and a char 1-byte aligned. The compiler will insert padding bytes to keep the data types aligned; single bytes, of course, can be aligned at any address.

But the padding bytes don't cause any performance problems, because you're not trying to read them. I thought we were talking about the bytes that you actually care about, as opposed to the bytes the compiler sticks in to align other stuff. Perhaps what you're saying is that there are no performance consequences whatsoever for individual byte access (as opposed to word access), irrespective of padding.
22 Jan, 2010, Skol wrote in the 45th comment:
Votes: 0
David Haley said:
Skol said:
It just seemed like a much more easily expandable approach, say an array of 64. I guess I'm not seeing where the waste would be?

Well, if all you need is a bit, and you're using an int, you're wasting some 31 bits. Do you care? That's another question…

Skol said:
And, if that is the case, wouldn't skills be better done by bits?

The skill table isn't storing binary values but the percentage learned, so no, bit sets don't really apply.

Skol said:
I'm looking at the logic behind having 32 options, vs expandable as needed.

The logic was most probably either not knowing any better, or some thought that they needed to squeeze out every last bit, even compared to an encapsulated, arbitrary-size bit set (assuming they thought of such a thing).

Great points, thanks David.
How would you have done them differently? I'm not asking to be a pain or smart ass, merely trying to get an insight into other methods that would be more effective in both size and readability.
22 Jan, 2010, David Haley wrote in the 46th comment:
Votes: 0
Skol said:
How would you have done them differently?

Well, for the bit things, I posted a C implementation of a bitvector here:
[link=file]2640[/link]

I consider the C implementation less nice than the C++ implementation I have (which I haven't uploaded). But I would implement all bit vectors using something along those lines. I don't believe that you need them to be resizable at runtime, although that is not true if you have a rich enough scripting environment to add new flags dynamically.
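
David's linked implementation isn't reproduced here, but a C bit set "along those lines" typically looks something like this (the names and the fixed 128-flag capacity are illustrative):

#include <stdint.h>
#include <string.h>

#define BITS_PER_WORD 32

typedef struct
{
    uint32_t words[4];   /* room for 128 flags; grow the array for more */
} BitSet;

static void bitset_clear_all(BitSet *bs)
{
    memset(bs->words, 0, sizeof bs->words);
}

static void bitset_set(BitSet *bs, unsigned bit)
{
    bs->words[bit / BITS_PER_WORD] |= (uint32_t)1 << (bit % BITS_PER_WORD);
}

static void bitset_remove(BitSet *bs, unsigned bit)
{
    bs->words[bit / BITS_PER_WORD] &= ~((uint32_t)1 << (bit % BITS_PER_WORD));
}

static int bitset_test(const BitSet *bs, unsigned bit)
{
    return (bs->words[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1;
}

The caller only ever sees flag numbers, so the capacity can be raised later by changing one array size.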
22 Jan, 2010, KaVir wrote in the 47th comment:
Votes: 0
David Haley said:
Perhaps what you're saying is that there are no performance consequences whatsoever for individual byte access (as opposed to word access), irrespective of padding.

I'm simply saying that bytes are 1-byte aligned. An array of 4 unsigned chars will use 4 bytes. Test it yourself if you don't believe me:

#include <iostream>

struct DataStruct
{
    char Data[4];
};

int main()
{
    std::cout << "Size: " << sizeof(DataStruct) << " bytes" << std::endl;
}

If you want to read about data alignment, you may find this link of interest:

http://www.ibm.com/developerworks/librar...

You'll note that the article points out in its example that accessing memory one byte at a time is slow. However, this is because they're accessing variables that are more than one byte in size - perhaps that's what you're thinking of?

In my case I only want to access a single byte at a time. I suppose it could have an impact on saving and loading the character (as you'd need to iterate through each byte), but once the character is in memory there shouldn't be any performance difference.
22 Jan, 2010, David Haley wrote in the 48th comment:
Votes: 0
I'm not talking about size in memory, but performance.
KaVir said:
You'll note that the article points out in its example that accessing memory one byte at a time is slow. However, this is because they're accessing variables that are more than one byte in size - perhaps that's what you're thinking of?

I don't see where you're reading this. In fact, their tests (e.g., Listing 1) are using uint8_t pointers, so the accesses are not more than a byte in size. And this is slower than using uint16_t pointers (see Listing 2). Indeed, they say this specifically:

The article said:
The first thing you notice is that accessing memory one byte at a time is uniformly slow. The second item of interest is that when accessing memory two bytes at a time, whenever the address is not evenly divisible by two, that 27% speed penalty rears its ugly head.

This is what I'm talking about. It is why I said that you can pay an alignment performance penalty by using unsigned chars (and hence reading a byte at a time) as opposed to a chunk that matches the processor's memory access granularity. With a chunk of that granularity, fetching the value takes a single read, whereas fetching it a byte at a time means four reads, three of which are redundant (assuming the granularity is four bytes).

I'll grant you that the standard does not technically guarantee that a uint32_t (or uint64_t) will actually be that many bytes, given that bit you quoted. But it's also worth noting that that same quotation says that it's exceedingly unusual. If performance is a concern, it's probably better to not access memory one byte at a time and instead assume (with very high safety) that you're not on that rather wacky machine architecture.
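
To make the read-count arithmetic concrete, a small illustration assuming a 4-byte access granularity (the union is just for demonstration; a real compiler may well optimize the byte loop into a single load):

#include <stdint.h>

/* Four one-byte flags sharing one aligned 32-bit word. */
union Flags
{
    uint8_t  bytes[4];
    uint32_t word;
};

/* One aligned 32-bit load fetches all four flags at once... */
static uint32_t read_all(const union Flags *f)
{
    return f->word;
}

/* ...whereas byte-wise code issues four separate loads. */
static unsigned read_each(const union Flags *f)
{
    return f->bytes[0] + f->bytes[1] + f->bytes[2] + f->bytes[3];
}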
22 Jan, 2010, KaVir wrote in the 49th comment:
Votes: 0
David Haley said:
I'm not talking about size in memory, but performance.

Earlier in the thread you stated "If bools are one byte in size and not aligned, you will run into alignment issues there."

If you add padding bytes between your value bytes, it will increase the memory usage.

David Haley said:
I don't see where you're reading this.

"It took an average of 67,364 microseconds to execute this function. Now modify it to work on two bytes at a time instead of one byte at a time – which will halve the number of memory accesses."

"This function took 48,765 microseconds to process the same ten-megabyte buffer – 38% faster than Munge8. However, that buffer was aligned. If the buffer is unaligned, the time required increases to 66,385 microseconds – about a 27% speed penalty."

So the second test was 38% faster…but it also made only half as many memory accesses. If you're only interested in one bit, it's the speed of a single memory access that matters.
22 Jan, 2010, David Haley wrote in the 50th comment:
Votes: 0
KaVir said:
Earlier in the thread you stated "If bools are one byte in size and not aligned, you will run into alignment issues there."

If you add padding bytes between your value bytes, it will increase the memory usage.

As I said in #48, I was talking about performance, not memory usage. Sorry for not making that clearer earlier. I thought we'd all been talking about performance all along, since Tonitrus was talking about speed.

To recontextualize, the initial claim that Elanthis and I objected to was:
Tonitrus said:
You waste a bunch of memory, but who cares, memory is much cheaper than processors, and accessing ints/bools is a lot faster than accessing bitfields, due to memory alignment issues and whatnot.

It seems that you have also suggested that this statement is incorrect, as you have just argued that reading one byte at a time is faster than reading four. You would have to further argue that the bitwise-operator penalty is worse than the penalty for reading the entire int rather than, say, a single byte.
22 Jan, 2010, Skol wrote in the 51st comment:
Votes: 0
I found an interesting read on memory access (btw, thanks guys, learning is fun ;)):
Data alignment: Straighten up and fly ri...

I was picturing memory exactly the way they illustrate it; it was like seeing pictures of my own mental model vs. what you were describing with the bits. Nice ;).
22 Jan, 2010, Tonitrus wrote in the 52nd comment:
Votes: 0
I'm assuming that someone has misunderstood someone along the way, because trying to follow this all as one conversation keeps making my brain crash. Or possibly I'm stupid, whichever.

At any rate, what I was originally suggesting is to use bools/ints (I was mistakenly under the impression that they are the same) because you can just check the value and run with it without doing any bitwise operations on it. I don't know the speed of the instructions involved (or how the accessing of memory breaks down, instruction-wise), but that's half as many instructions (on our part) to read or write a value. True, it's 32x as much memory, but memory is much easier to expand/replace than processors.

Now, if you used individual bits in structs, they would either get padded out to at least a byte each, or be kept as individual unaligned bits; I'm not sure which.

Option 1 brings us back to just using ints/bools, and option 2 leaves us with unaligned accesses.

When I looked into all of this before, I came to the conclusion that bitvectors and such are largely not worth the trouble, and I may as well just use bools. I may or may not have been wrong at the time, and I suspect plenty of people here know more about this issue than I do, but since so many replies have referenced my post, I figured I may as well expand on it a bit.

That said, it's entirely possible that I'm wholly wrong, but I'm not sure that's the case.

If I am wrong and it's not faster to just use bools or ints, I'd still say bitvectors are far too much trouble to go to just to save individual bytes of RAM.

Note: Using bitvectors as a backend for some sort of encapsulated interface, as David Haley has mentioned, would probably be a good idea in the case that it's not slower; but as a direct interface, bitvectors are stupid and irritating.
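
For what it's worth, C's bit-fields answer the either/or above directly: the compiler packs them into an aligned storage unit (typically an unsigned int) and accesses the whole unit with a mask, so nothing is ever read unaligned. A quick check, with hypothetical field names:

#include <stdio.h>

struct ActBits
{
    unsigned player_killer : 1;   /* all three bits get packed     */
    unsigned auto_loot     : 1;   /* into one aligned unsigned int */
    unsigned is_mage       : 1;
};

int main(void)
{
    struct ActBits act = {0};
    act.auto_loot = 1;   /* compiles to load + mask + store on the whole unit */
    printf("sizeof: %zu\n", sizeof act);   /* typically 4 */
    printf("auto_loot: %u\n", act.auto_loot);
    return 0;
}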
22 Jan, 2010, David Haley wrote in the 53rd comment:
Votes: 0
Tonitrus said:
I don't know the speed of the instructions involved (or how the accessing of memory breaks down, instruction-wise), but that's half as many instructions (on our part) to read or write a value.

Half as many instructions that take four times as long each is a worthless "improvement". You cannot possibly ignore the speed of the instructions involved.

Tonitrus said:
Option 1 brings us back to just using ints/bools, and option 2 leaves us with unaligned accesses.

I'm not sure what options 1 and 2 are, nor am I sure why these are the only options…?
If you're talking about ints/bools vs. bit-size-specified-struct-members, then there are many other options. KaVir's unsigned char arrays are one option. The link I gave earlier in the thread is another option (it also uses an array, but not of uchars).

One nice feature of "bit sets" where you explicitly map (somehow) from field name to field value is that you can do these lookups dynamically. If you have a bunch of bools in a struct, then you need a big if statement (or switch) to go from a name to the appropriate field. But if you have constants or strings, and a mapping from names to numbers, then this code is greatly simplified.
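
A sketch of the name-to-number mapping David describes (flag names are hypothetical):

#include <string.h>

/* One row per flag. Adding a flag is one new row here,
 * not a new branch in a giant if/switch. */
struct FlagEntry
{
    const char *name;
    unsigned    bit;
};

static const struct FlagEntry flag_table[] =
{
    { "player_killer", 0 },
    { "auto_loot",     1 },
    { "is_mage",       2 },
};

/* Look a flag up by name, e.g. from a script or a player file.
 * Returns the bit number, or -1 if unknown. */
static int flag_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof flag_table / sizeof flag_table[0]; i++)
        if (strcmp(flag_table[i].name, name) == 0)
            return (int)flag_table[i].bit;
    return -1;
}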
22 Jan, 2010, KaVir wrote in the 54th comment:
Votes: 0
David Haley said:
you have just argued that reading one byte at a time is faster than reading four.

No, I've argued that reading 1 byte is faster than reading 4 bytes. If you want to read 4 bytes, then it's faster to read them in one go (assuming the 4-byte data is properly aligned).

I had a play with the code provided on that website, and came up with the following results for 1 MB of data:

kavir@sandbox:~/temp$ ./a.out
8 bit: It took 34631 microseconds (1048576 accesses).
16 bit: It took 30195 microseconds (524288 accesses).
32 bit: It took 28090 microseconds (262144 accesses).
kavir@sandbox:~/temp$ ./a.out
8 bit: It took 34038 microseconds (1048576 accesses).
16 bit: It took 30192 microseconds (524288 accesses).
32 bit: It took 28359 microseconds (262144 accesses).
kavir@sandbox:~/temp$ ./a.out
8 bit: It took 33456 microseconds (1048576 accesses).
16 bit: It took 29032 microseconds (524288 accesses).
32 bit: It took 27439 microseconds (262144 accesses).


As you can see, there's a noticeable speed improvement for the larger data types. However, when using bitvectors for flags in a mud, we only need to check a single bit each access (except when we're loading or saving the character), so we're more interested in how fast it is to access the data than in how much data we can read in one go.

So this time I ran the tests 262144 times for each data type, and came up with the following:

kavir@sandbox:~/temp$ ./a.out
8 bit: It took 8276 microseconds (262144 accesses).
16 bit: It took 14819 microseconds (262144 accesses).
32 bit: It took 27742 microseconds (262144 accesses).
kavir@sandbox:~/temp$ ./a.out
8 bit: It took 8608 microseconds (262144 accesses).
16 bit: It took 15029 microseconds (262144 accesses).
32 bit: It took 28522 microseconds (262144 accesses).
kavir@sandbox:~/temp$ ./a.out
8 bit: It took 8550 microseconds (262144 accesses).
16 bit: It took 15319 microseconds (262144 accesses).
32 bit: It took 28900 microseconds (262144 accesses).


For general runtime use, the unsigned char certainly seems to have a speed advantage.
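
KaVir's exact harness isn't shown, but the second experiment amounts to timing a fixed number of single reads per data type. A sketch of the 8-bit case, assuming gettimeofday and a 1 MB buffer as in his runs (the 16- and 32-bit variants differ only in the pointer type):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define ACCESSES 262144

/* Time ACCESSES single byte reads. The volatile sink keeps the
 * compiler from discarding the loop; results still vary a lot
 * with optimization flags and hardware. */
static long time_byte_reads(const uint8_t *buf)
{
    struct timeval start, end;
    volatile uint8_t sink = 0;
    size_t i;

    gettimeofday(&start, NULL);
    for (i = 0; i < ACCESSES; i++)
        sink ^= buf[i];
    gettimeofday(&end, NULL);
    (void)sink;
    return (end.tv_sec - start.tv_sec) * 1000000L
         + (end.tv_usec - start.tv_usec);
}

int main(void)
{
    uint8_t *buf = calloc(1, 1 << 20);   /* 1 MB, zero-filled */
    if (buf == NULL)
        return 1;
    printf("8 bit: It took %ld microseconds (%d accesses).\n",
           time_byte_reads(buf), ACCESSES);
    free(buf);
    return 0;
}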
22 Jan, 2010, elanthis wrote in the 55th comment:
Votes: 0
David, a lot of the problem from this whole discussion is that some of us are making various assumptions about certain common architectures, and KaVir is bringing up issues that you are almost never ever going to run into in real life (especially with MUDs), but which are very much issues in the core C language because C is meant to run on a very wide variety of oddball platforms that behave nothing like the x86/PPC/ARM chips that are most common. I'll admit, I don't give a shit about any platform outside of the Big Three and pretty much ignore issues they don't have. :)

Quote
True, it's 32x as much memory, but memory is much easier to expand/replace than processors.


And as I said, using more memory makes your code _slower_ and more in need of a faster processor. I am willing to bet that once you have enough objects, your CPU will waste FAR more time accessing that bloated memory than it will doing a few bitwise ops. The idea that you even think a bitwise op needs to be optimized away is pretty solid proof that you (like the vast, vast majority of programmers) have absolutely no freaking clue what you're talking about when it comes to performance. Most schools just don't teach that material, and most online resources and books are still based on information from 10+ years ago that is no longer valid on today's machines, even today's embedded machines. Many of the techniques and "facts" you may have picked up regarding performance date from before modern DRAM, which has a huge impact on memory performance: http://people.redhat.com/drepper/cpumemo.... They also date from an era when the memory bus ran at nearly the speed of the CPU, unlike today, where a CPU runs at 3GHz and the memory bus runs at 500MHz (the common "high end" embedded CPUs run slower, but have slower memory buses as well, and the gap is still pretty big).

The gist of all that is that wasting memory makes your program slower, not faster.
23 Jan, 2010, Scandum wrote in the 56th comment:
Votes: 0
Using memory in such a way that it's easy to cache is important as well, even worth wasting some memory over, something few people get. Does anyone know how fast the average CPU cache is?

Another thing most programmers don't get is that writing to memory is about as fast as reading from memory.
23 Jan, 2010, David Haley wrote in the 57th comment:
Votes: 0
Scandum said:
Another thing most programmers don't get is that writing to memory is about as fast as reading from memory.

Not necessarily. When you read, you have to sit around until you get back what you wanted to read. When you write, the hardware can cache your writes to some extent and may also perform them asynchronously to some extent. And then there's the question of pages that have been swapped out to disk: disk writes can also be cached (again, in memory, for instance), whereas with a read you must always wait if the data isn't already loaded.
23 Jan, 2010, elanthis wrote in the 58th comment:
Votes: 0
Scandum said:
Using memory in such a way that it's easy to cache is important as well, even worth wasting some memory over, something few people get.


Mentioned in the PDF I linked. There's a reason it's called "What every programmer should know about memory" :)
07 Aug, 2010, jurdendurden wrote in the 59th comment:
Votes: 0
This link does not lead directly to the pdf… am I missing something?
07 Aug, 2010, bbailey wrote in the 60th comment:
Votes: 0
jurdendurden said:
This link does not lead directly to the pdf… am I missing something?


Yes. It's been months; people (and their sites) can move. http://people.redhat.com/drepper/ now redirects to his new home at http://www.akkadia.org/drepper/index.htm.... The pdf mentioned above is at http://www.akkadia.org/drepper/cpumemory....