Search code examples
cembeddedcompressionbit-manipulationbyte

how to convert different length of bits into byte array?


I am currently working on a project where I have to implement Elias Gamma coding for an Embedded system (STM32F405 programming using C). The main priority is efficiency and speed of the code thus I implenmented a lookup table for the Elias Gamma coding which contains bits values and number of bits for each "bit array" (I am using uint32_t for symbols and uint8_t for lengths). However I am having trouble converting these bits to byte array. Below is an example of what I am trying to do.

For three symbols -> 0b0001001 (bits length 7), 0b011 (bits length 3), b000010001 (bits length 9) I have to get 0b0001001011000010001 thus 0x12 0xC2 0x20 as a byte array to transfer the data through a communication channel like USART.

What confuses me is, bit symbols being different length and "append" process that has to be applied to the bits in a platform that uses bytes as smallest data structure.

Is there a simple way to do this?

Edit: For clarity I am trying to append 4096 "bit array" to X number of bytes. X number depends on the total number of bits appended (which is unknown while prossessing).


Solution

  • Solution requires:

    // Returns false if nothing left.
    int get_next_bits( uint8_t *len, uint32_t *bits );
    
    // Sends the first `output_len` bytes pointed by `output`.
    send_buffer( uint16_t output_len, const uint8_t *output );
    
    // Allocated memory or array large enough for output.
    // See below if this can't be guaranteed.
    uint8_t *buf;
    

    Solution:

    uint16_t output_len = 0;
    uint8_t *output = buf;     // Assumed to be large enough. But see below.
    uint8_t available = 0;     // Number of unassigned bits in `*output`.
    uint32_t bits;             // Bits to add to output.
    uint8_t unassigned;        // Number of bits (left) in `bits`.
    
    while ( get_next_byte( &unassigned, &bits ) ) {
       while ( unassigned > 0 ) {
          if ( available == 0 ) {
             // We need another byte of output.
             // `output` already points to this byte.
             ++output_len;
             *output = 0;
             available = 8;
          }
    
          if ( available <= unassigned ) {
             // We get here, for example, when we have
             // 5 bits to insert but only 3 bits available in current output byte.
             // So we shift the unassigned bits by 2 to place into position.
             // And we're left with 0 available bits and 2 unassigned bits.
             // We can't forget to "remove" from `bits` the 3 bits we assigned.
             uint8_t shift = unassigned - available;
             *output |= bits >> shift;
             unassigned = shift;
             bits &= ~( ( (uint32_t)0xFFFFFFFF ) << unassigned );
             available = 0;
             ++output;
          } else {
             // We get here, for example, when we have
             // 3 bits to insert and 5 bits available in current output byte.
             // So we shift the unassigned bits by 2 to place into position.
             // And we're left with 2 available bits and 0 unassigned bits.
             uint8_t shift = available - unassigned;
             *output |= bits << shift;
             unassigned = 0;
             available = shift;
          }
       }
    }
    
    send_buffer( output_len, output );
    

    The available == unassigned case can be handled specially if desired.

    if ( available == unassigned ) {
       // We get here, for example, when we have
       // 5 bits to insert and 5 bits available in current output byte.
       // We're left with 0 available bits and 0 unassigned bits.
       *output |= bits;
       unassigned = 0;
       available = 0;
       ++output;
    }
    

    If you can't guarantee a buffer large enough to hold all the bits, replace

    if ( available == 0 ) {
       // We need another byte of output.
       // `output` already points to this byte.
       *output = 0;
       available = 8;
       ++output_len;
    }
    

    with

    if ( available == 0 ) {
       // We need another byte of output.
       // `output` already points to this byte.
    
       if ( output_len == buf_size ) {
          // Buffer is full, so send and empty it.
          output -= buf_size;
          output_len = 0;
          send_buffer( buf_size, output );
       }
    
       *output = 0;
       available = 8;
       ++output_len;
    }