On Boost I read about masking information into pointers to save memory (here: https://www.boost.org/doc/libs/1_72_0/doc/html/intrusive/set_multiset.html, optimize_size). How is this possible? I read somewhere that pointers only use 48 Bit, but are 64 Bits long, so you can push your information in the higher bits with bit shifting. Is that correct?
Why are they using an integer to store the color information of rb-trees? Would it not be more efficient to use chars?
On Boost I read about masking information into pointers to save memory (here: https://www.boost.org/doc/libs/1_72_0/doc/html/intrusive/set_multiset.html, optimize_size). How is this possible?
The answer is given at the link you posted:
The hook will embed the color bit of the red-black tree node in the parent pointer if pointer alignment is even.
When alignment is even, the lowest bit in the pointer value is always zero and can be used to store the color bit. Whenever the pointer or the color needs to be loaded, the other bits need to be cleared, which adds a minor performance impact, but in return it saves a pointer worth of space in the hook. In practice though, the performance overhead is often hidden by parallel execution of instructions in the CPU.
I read somewhere that pointers only use 48 Bit, but are 64 Bits long, so you can push your information in the higher bits with bit shifting. Is that correct?
On 64-bit systems the size of the pointer is 64 bits, but some systems do not implement full-width pointers and instead use fewer bits for addressing. When upper bits are unused, they are required to have a specific value (ones in x86). Wikipedia gives a good overview about different CPUs, including x86. You will see that different architectures have different limitations, and some even implement full-width 64-bit addressing. So, while it is possible to store additional data in the upper bits, it is not possible on all architectures, and the clearing logic may be different on different CPU architectures.
Why are they using an integer to store the color information of rb-trees? Would it not be more efficient to use chars?
A char
is an integer. The color information is actually one bit (red or black node). The other bits, whether in a char
or an int
, are unused. To answer your question, using a char
could be more efficient in some cases, but not in the case of Boost.Intrusive hooks. Therefore Boost doesn't use neither of these types to store the color, when optimize_size
is enabled. When it's not enabled, an enum (typically of the same size as int
) is used to store the color tag, but it doesn't really matter since due to alignment a pointer worth of space is added to the hook. (Some of the added padding could be used for other useful data in case of a base hook, but only in very specific cases, when the first non-static data members immediately following the hook in the node binary layout have small alignment.)