In bitvec, how does storage size affect loading?


How does the storage size affect how loading works?

As an example, in the snippet below, a 22-bit vector is split into two groups of 11 bits. Each group has the same pattern, storing the decimal number 1027.

    use bitvec::prelude::*; // the prelude brings in the BitField trait needed for chunk.load()

    let vec22 = bitvec![u8 /* usize */, Msb0; 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, /* next */ 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]; // 1027, 1027
    let chunks = vec22.chunks(11);

    let numbers = chunks
        .into_iter()
        .map(|chunk| {
            let number: usize = chunk.load();
            println!("{} {}", chunk, number);
            number
        })
        .collect::<Vec<_>>();

When the storage size is usize, the load method returns the expected 1027. When it is set to u8, I can't find a relationship between the printed numbers and the bit patterns:

    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1] 896  // expected 1027
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1] 112  // expected 1027

How does changing the storage size affect these numbers?


Solution

  • There is some fundamental knowledge about endianness and bitvec that is necessary to answer the question:

    • All computers agree that earlier elements in an array have lower addresses (i.e., "come first"), but the bytes within an element may be arranged in either order.
    • In bitvec, T describes the underlying element type that stores the bits (u8, i16, u32, usize, etc.).
    • When loading a bit pattern that spans multiple T elements into a new variable, bitvec needs to be told explicitly how to interpret the significance of bits across Ts. If earlier Ts are more significant, use load_be. If earlier Ts are less significant, use load_le.
    • The load function applies whichever of the *_be or *_le loaders matches the endianness of the machine it runs on. It is therefore not portable, and it may or may not match how the bit patterns were laid out; a sketch of the difference follows this list.
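
    A minimal sketch of that difference, assuming bitvec 1.x (whose prelude exports the BitField trait providing load, load_be, and load_le): the same 11 bits of 1027, stored Msb0 across two u8 elements, load as 1027 with load_be but as 896 with load_le.

        use bitvec::prelude::*;

        fn main() {
            // 0b100_0000_0011 == 1027, written Msb0 across two u8 elements:
            // byte 0 holds 1000_0000, byte 1 holds 011 in its top three bits.
            let bits = bitvec![u8, Msb0; 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1];

            // Earlier element treated as MORE significant: (0b1000_0000 << 3) | 0b011 = 1027
            let be: u16 = bits.load_be();
            // Earlier element treated as LESS significant: (0b011 << 8) | 0b1000_0000 = 896
            let le: u16 = bits.load_le();

            println!("load_be = {be}, load_le = {le}"); // load_be = 1027, load_le = 896
        }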

    Finally, to answer the OP's question as to why switching between u8 and usize produced different results: with usize storage, all 22 bits fit in a single element, so each 11-bit chunk lies within one element and there is no cross-element significance to decide. Using u8 instead causes each 11-bit chunk to be spread across two bytes, forcing bitvec to choose how to interpret the relative significance of the bits across those bytes. It just so happens that load, which uses the endianness of the machine it runs on, did not match the intended endianness with which vec22 was defined.
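
    As a sketch of the fix (under the assumption that the Msb0 layout above is meant to be read first-bit-most-significant), switching from load to load_be makes the u8-backed version produce 1027 for both chunks:

        use bitvec::prelude::*;

        fn main() {
            let vec22 = bitvec![u8, Msb0; 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
                                          1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1];

            let numbers: Vec<usize> = vec22
                .chunks(11)
                // Explicitly treat earlier bytes as more significant,
                // matching the Msb0 layout of the literal above.
                .map(|chunk| chunk.load_be::<usize>())
                .collect();

            assert_eq!(numbers, vec![1027, 1027]); // instead of [896, 112]
        }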