Search code examples
c#logicbinaryfilesbit

What is the easiest way to read different length bit fields in a byte array?


I'm trying to read a binary file (a image format) that uses different sized bit fields to refer to different colors, doing so with a dictionary for a color palette. E.g.

Using the following palette:

0   -> #FFFFFF
10  -> #FF0000  
110 -> #FF00DC  
111 -> #FF5A0C

The binary data looks like this

0101101110101100010111

The problem is that when a read this file I get a byte[] and I don't know how to handle these variable length fields with bytes. The main issue is (using the exemple above) when reading byte[0] I get 01011011, with this I'm able to convert part of the data to #FFFFFF, #FF0000, #FF00DC but I am left with 00000011.

So, the question is, how could I concatenate what is left of this byte with the next one so that I could be able to read the full code. E.g.

00000011 + 10101100 = 0000001110101100

And read this normally.

Obs: I'm using c#

Edit: this is a format that I'm developing for lossless image compression


Solution

  • Here is a sample bit reader. This is not too efficient because I am returning the read bits in the lowest bit position, and then shifting to accumulate the next field.

    First, a class that tracks the bit and byte position in a byte[] and returns the next bit.

    public class BitPosition {
        int bytePos = 0;
        int bitPos = 0; // 0 - 7 only
        Byte[] data;
    
        public BitPosition(Byte[] src) => data = src;
    
        static byte[] byteBitMasks = new byte[] { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 };
        public int ReadNextBit() {
            if (bytePos >= data.Length)
                throw new IndexOutOfRangeException("ReadNextBit");
    
            int bit = (data[bytePos] & byteBitMasks[bitPos]) == 0 ? 0 : 1;
            if (++bitPos > 7) {
                bitPos = 0;
                ++bytePos;
            }
            return bit;
        }
        public bool HasMoreData => bytePos < data.Length;
    }
    

    Now, a class to describe each entry in the compressed color palette:

    public class ColorEntry {
        public byte marker;
        public int color;
        public int sizeInBits;
    }
    

    Note: If you need larger markers, you can replace byte with int or uint. If you need up to 64-bit markers, you will need to modify the ColorReader to use uint64.

    And finally, a ColorReader class that reads the colors from a byte[] using a compressed palette and the BitPosition class:

    public class ColorReader {
        BitPosition bp;
    
        public ColorReader(byte[] data) => bp = new BitPosition(data);
    
        static ColorEntry[] palette = new[] {
                    new ColorEntry { marker = 0b0, color = 0xFFFFFF, sizeInBits = 1 },
                    new ColorEntry { marker = 0b10, color = 0xFF0000, sizeInBits = 2 },
                    new ColorEntry { marker = 0b110, color = 0xFF00DC, sizeInBits = 3 },
                    new ColorEntry { marker = 0b111, color = 0xFF5A0C, sizeInBits = 3 },
                };
    
        public IEnumerable<ColorEntry> Colors() {
            while (bp.HasMoreData) {
                int bitsSoFar = 0;
                int numBits = 0;
                do {
                    int nextBit = bp.ReadNextBit();
                    ++numBits;
                    bitsSoFar |= nextBit;
    
                    var nextCE = palette.FirstOrDefault(ce => ce.sizeInBits == numBits && ce.marker == bitsSoFar);
                    if (nextCE != null) {
                        yield return nextCE;
                        break;
                    }
                    else
                        bitsSoFar <<= 1;
                } while (true);
            }
        }
    }
    

    You can use the class like so:

    var data = new byte[] { 0b01011011, 0b10101100, 0b01011100 };
    var cr = new ColorReader(data);
    var ans = cr.Colors().Select(c => c.color).ToList();