
Data type diversity

Started by Calin
51 comments, last by Calin 1 year, 10 months ago

So bools are still present because they have to do with logic rather than math; filling in a true or a false takes more concentration than simply typing 1 or 0. [real programming off]

My project's facebook page is “DreamLand Page”


Correct. The big reason is/was logic and compiler-agnostic support.

It was late to join the party (it wasn't in C at the time), and it was easier to deal with than bit manipulation, bit fields, or double-not logic (which is still used). For serious use you're still better off using any of those rather than bools.

frob said:
The big reason is/was logic and compiler-agnostic support

using bool or just an int is one of those cases where your choice will make absolutely no difference, performance-wise, on a home/non-professional project, but mounts bit by bit into a tremendous stack of wasted resources in a place like a data center.


Smart, modern compilers should be a lot more efficient with bools; they should stack them 8 to a byte instead of using a byte for each one, a waste of space which legitimises the OP's point.

When I use large numbers of bools I have to write the code to manage them myself, which seems strange.

@TheBlackRattie That's why I wrote my packed bool array
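A minimal sketch of what such a packed bool array might look like (hypothetical code, not the poster's actual implementation) — eight flags per byte, with the masking and shifting done by hand:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed bool array: stores 8 flags per byte.
class PackedBoolArray
{
public:
    explicit PackedBoolArray(std::size_t count)
        : m_bytes((count + 7) / 8, 0) {}

    void set(std::size_t i, bool value)
    {
        if (value)
            m_bytes[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        else
            m_bytes[i / 8] &= static_cast<std::uint8_t>(~(1u << (i % 8)));
    }

    bool get(std::size_t i) const
    {
        // Shift the target bit down to position 0 and isolate it.
        return (m_bytes[i / 8] >> (i % 8)) & 1u;
    }

private:
    std::vector<std::uint8_t> m_bytes;
};
```

Note that, unlike a plain array of bool, you cannot hand out a `bool&` to an individual element here; access has to go through `set`/`get`.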

TheBlackRattie said:
Smart, modern compilers should be a lot more efficient with bools; they should stack them 8 to a byte

a lesson about modern compilers, excellent, we all need one from time to time.

(that's rude, but it's snowballing so I have to address it)


TheBlackRattie said:
Smart, modern compilers should be a lot more efficient with bools; they should stack them 8 to a byte instead of using a byte for each one, a waste of space which legitimises the OP's point.

Compilers will never be able to optimize the space used by class members; that is always up to the programmer. The reason is that C++ gives guarantees about the size/layout of a class, where other languages like C# don't.

class Foo
{
public:
	bool a;
	bool b;
};

// some other translation unit, or possibly a DLL,
// given a Foo& fooRef and a bool value from elsewhere:

bool& bRef = fooRef.b;
// potentially in a different function call that the compiler can't see
bRef = value;

So how do you expect a compiler to be able to pack class-member bools into a single byte if I a) expect to be able to determine the exact byte offset of each member and b) expect to be able to take references to each member? LTO could eliminate this, but only for types you don't export from a DLL.

If you really want to pack the bools, you could always use bit-fields.

class Foo
{
public:
	bool a : 1;
	bool b : 1;
};

bool& a = foo.a; // error: now you cannot take references, but you've got the space saving you desire

Modern compilers are pretty good at optimization, but I'm glad that my type-system code for reflecting a class's bools doesn't suddenly fail in release builds because the compiler decided that the bool whose offsetof I took should now be a bit-field under the hood. Having bit-fields as an option where you need them should be enough.

TheBlackRattie said:
When I use large amounts of Bools I have to write the code to manage them myself, which seems strange.

No, you don't. The language gives you two options directly and a third that you described.

The language gives you the option to have an array of bool values, the option to specify bits, and the option to do what you described yourself if you know better.

Breaking them down:

In practice C++ does what it needs, as described earlier in the thread. The language is designed to ensure correctness more than anything, even if that correctness results in inefficiencies on any given hardware. While storing each value in a full 8 bits is less efficient for storage space, it ends up being far more efficient processing-wise, as no bit manipulation is needed to get the value. C++ gives you two direct options: you can use the compiler's default encoding, which prefers the machine's addressable object size, or you can use the compiler's built-in manipulations for a single-bit entry, which prefers size over runtime costs.

If you want an efficiently packed collection, you have two options. In each case you are explicitly exchanging both the correctness guarantees and the processing guarantees for your own hardware-specific improvements.

You must face the difference between the logic and the machine's needs. You have two different things: the logic is a single bit coupled with a bunch of automatic, guaranteed conversions and promotions; the machine has an addressable unit, typically 8 bits. The two are fundamentally at odds.

For those three options, a bool array, a bit field, and bit manipulations:

If you're looking for performance, storing them as an array of full bytes is almost always better. Each one can be individually addressed without bit manipulation. There is no bit masking (which is expensive on some hardware), no bit shifting (which is extremely expensive on some hardware), no trickery to ensure the correct bit is isolated, no double negation that wastes CPU cycles to isolate the correct bits, and so on. Overall, the extra padding space gives a huge performance increase in the general case, preferring runtime performance over space, and since the general case is a focus on runtime performance, that's a good thing.
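As a quick sketch of that first option: a plain array of bool keeps every flag in its own addressable byte, so there is no masking or shifting, and you can still take references to individual elements (which a bit-field forbids).

```cpp
#include <array>

// One full byte per flag, preferring speed over space.
std::array<bool, 32> flags{};  // 32 bytes instead of 4, all false

void setFlag14()
{
    bool& flag14 = flags[14];  // legal: each element has its own address
    flag14 = true;             // a single-byte store, no bit manipulation
}
```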

With bit fields you're telling the compiler that space is more important than the machine's natural encoding, and the compiler will do all the bit masking, bit shifting, conversions, promotions, and other operations for you. How much work that is depends on the CPU. The x86 architecture has what's called a barrel shifter, which can do those operations rapidly, often in a single CPU cycle. Many other architectures, most historical and some current, do not have one. Never forget that C++ targets everything from the tiniest microcontrollers to the largest supercomputers, and plenty of architectures (especially small RISC-based devices) are unlikely to have one. An operation like 1>>24 or 1<<18 is near instant on a PC or on hardware that can rotate right or left directly, but when the hardware doesn't do it directly, the shift can require several hundred CPU cycles.

Consider this C++ code:

struct MyBits
{
   unsigned int b0 : 1;
   unsigned int b1 : 1;
   unsigned int b2 : 1;
   ...
   unsigned int b29 : 1;
   unsigned int b30 : 1;
   unsigned int b31 : 1;
};

...
MyBits bits;
bool myBool;
...
myBool = 1;
bits.b14 = 1;
...
if (myBool) { ... }
if (bits.b12) { ... }

If all you want is easy access to the 32 bits in C++, you can have it. You can let C++ do all the bit manipulation needed, although even then it is implementation-defined how the bits are actually packed.

Note how, even though you have access to individual bits, it is still less efficient than a single addressable unit, even though the CPU has instructions and hardware to shift all the bits at once.

Compare a bool assignment of one versus three:

  mov BYTE PTR myBool$[rsp],1      ; assign the bool value directly


  mov eax, DWORD PTR bits$[rsp]    ; load the bitfield
  bts eax, 14                      ; test and set CPU function
  mov DWORD PTR bits$[rsp], eax    ; store the result

And for the comparison, also three instructions instead of one:

  movzx eax, BYTE PTR myBool$[rsp]      ; load the full byte bool variable, test is automatic
  
  
  mov eax, DWORD PTR bits$[rsp]         ; load the bitfield
  shr eax, 12                           ; shift down to the desired bit
  and eax, 1                            ; test against the single bit

And that's on hardware that supports it directly!

Setting the bit on a device that doesn't have hardware support can take a loop:

; Set a full char bool
  ldi r24,lo8(1)   ; load the value 1
  std Y+2,r24      ; store the value as a variable

; Set an arbitrary bit in a bitfield
  ldd r24,Y+1      ; load the shift count
  mov r18,r24      ; 
  ldi r19,lo8(0)   ; load up the bit we'll be shifting and clear the working area
  ldi r24,lo8(1)   ;
  ldi r25,hi8(1)   ;
  rjmp 2f          ; escape early, we might be working on the lowest bit.
  rol r25          ; rotate a single bit at a time since that's what the hardware supports
  brpl 1b          ; loop until we've shifted the bit into position
  ldd r18,Y+3      ; load the bitfield low order bits
  ldd r19,Y+4      ; load the bitfield high order bits
  or r24,r18       ; or to set the bitfield's low order bits, if the target bit was on that side
  or r25,r19       ; or to set the bitfield's high order bits, if the target bit was on that side
  std Y+4,r25      ; store the bitfield's high order bits
  std Y+3,r24      ; store the bitfield's low order bits

And finally you can do those operations yourself with the operators &, |, ~, ^, << and >>.
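A minimal sketch of that third, do-it-yourself option (the helper names here are illustrative, not from any standard API) — a 32-bit integer holding 32 flags, with all the masking done by hand:

```cpp
#include <cstdint>

// Manual packing with the bitwise operators; correctness is
// entirely the programmer's responsibility.
inline void setBit(std::uint32_t& bits, unsigned i)    { bits |=  (1u << i); }
inline void clearBit(std::uint32_t& bits, unsigned i)  { bits &= ~(1u << i); }
inline void toggleBit(std::uint32_t& bits, unsigned i) { bits ^=  (1u << i); }
inline bool testBit(std::uint32_t bits, unsigned i)    { return (bits >> i) & 1u; }
```

Exactly the shift-and-mask sequences shown in the assembly above, just written out explicitly in source.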

So all that written and documented (thanks to godbolt.org!) …

The language DOES give you those options.

If you want a large number of bools, nothing prevents you from using arrays or collections and doing exactly that: an array of bools. The compiler will address the machine side by using the smallest addressable units, and the logical operations of evaluating to true or false, and of converting to either 0 or 1 and no other values, will all happen automatically. It will automatically do what is correct and safe, even if it happens to use 8x more memory.

You can tell the compiler to use them as bit-field entries. They will still be correct and safe, but come with reduced guarantees relative to bools and carry compiler-specific and architecture-specific runtime costs, while reducing memory. You've traded runtime performance for reduced memory.

And you have the option of doing the bitwise math yourself. It is up to you to verify it is correct and safe; it is entirely your responsibility, but you can guarantee that the encoding uses the minimum number of bits, and take the balancing upon yourself.

You get them all.

Oh, and to further complicate things about how hardware varies, there was a notable story about that.

Not all x86 chips have the hardware support. Nearly all of them from the i386 (1985) up through the Pentium 3 models had barrel shifters, but when the Pentium 4 was released in late 2000, the first versions of the CPU did not include one. The instructions remained, but internally the P4 Willamette core would loop with a simple shift register. That means the operation x>>31 would take about 30 times longer than x>>1, as it needed to compare and loop internally. They restored it by the P4 Prescott line, but not before a whole lot of complaints in the tech media.

I think that in choosing a default option, keeping it as a single addressable unit and taking the full byte is the better general-purpose solution. The language provides bit-fields and a custom type in the standard library (std::bitset) if you need more, and it's fairly easy to create additional libraries for further needs.
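For completeness, a small sketch of that standard-library option: std::bitset packs the flags and does all the masking and shifting for you.

```cpp
#include <bitset>

// 32 flags packed into (at least) 4 bytes; masking/shifting is
// handled by the library, not written by hand.
std::bitset<32> bits;

void demo()
{
    bits.set(14);     // set bit 14 to true
    bits[3] = true;   // proxy-based element access, not a real bool&
}
```

Note the trade-off discussed above still applies: `bits[3]` returns a proxy object, not a `bool&`, so you give up real references in exchange for the packing.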

frob, I enjoy reading your stories, always did.


This topic is closed to new replies.
