java binary bit-manipulation byte off-by-one

First Byte's Bit Off-By-One after Steganography

Currently working on a Steganography project where, given a message in bytes and the number of bits to modify per byte, hide a message in an arbitrary byte array.

In the first decoded byte of the resulting message, the value has it's first (leftmost) bit set to '1' instead of '0'. For example, when using message "Foo".getBytes() and maxBits = 1 the result is "Æoo", not "Foo" (0b01000110 gets changed to 0b11000110). With message "Æoo".getBytes() and maxBits = 1 result is "Æoo", meaning the bit is not getting flipped as far as I can tell.

Only certain values of maxBits for certain message bytes cause this error, for example "Foo" encounters this problem at maxBits equal to 1, 5, and 6, whereas "Test" encounters this problem at maxBits equal to 1, 3, and 5. Only the resulting first character ends up with its first bit set, and this problem only occurs at the specified values of this.maxBits related to the initial data.

Why, for certain values of maxBits, is the first bit of the resulting decoded message always 1?
Why do different inputs have different values for maxBits that work fine, and others that do not?
What is the pattern with the value of maxBits and the resulting erroneous results in relation to the original data?

Encode and Decode Methods:

public byte[] encodeMessage(byte[] data, byte[] message) {
    byte[] encoded = data;
    boolean[] messageBits = byteArrToBoolArr(message);
    int index = 0;
    for (int x = 0; x < messageBits.length; x++) {
        encoded[index] = messageBits[x] ? setBit(encoded[index], x % this.maxBits) : unsetBit(encoded[index], x % this.maxBits);
        if (x % this.maxBits == 0 && x != 0)
            index++;
    }
    return encoded;
}

public byte[] decodeMessage(byte[] data) {
    boolean[] messageBits = new boolean[data.length * this.maxBits];
    int index = 0;
    for (int x = 0; x < messageBits.length; x++) {
        messageBits[x] = getBit(data[index], x % this.maxBits);
        if (x % this.maxBits == 0 && x != 0)
            index++;
    }
    return boolArrToByteArr(messageBits);
}

Unset, Set, and Get Methods:

public byte unsetBit(byte data, int pos) {
    return (byte) (data & ~((1 << pos)));
}

public byte setBit(byte data, int pos) {
    return (byte) (data | ((1 << pos)));
}

public boolean getBit(byte data, int pos) {
    return ((data >>> pos) & 0x01) == 1;
}

Conversion Methods:

public boolean[] byteArrToBoolArr(byte[] b) {
    boolean bool[] = new boolean[b.length * 8];
    for (int x = 0; x < bool.length; x++) {
        bool[x] = false;
        if ((b[x / 8] & (1 << (7 - (x % 8)))) > 0)
            bool[x] = true;
    }
    return bool;
}

public byte[] boolArrToByteArr(boolean[] bool) {
    byte[] b = new byte[bool.length / 8];
    for (int x = 0; x < b.length; x++) {
        for (int y = 0; y < 8; y++) {
            if (bool[x * 8 + y]) {
                b[x] |= (128 >>> y);
            }
        }
    }
    return b;
}

Sample Code and Output:

    test("Foo", 1);//Æoo
    test("Foo", 2);//Foo
    test("Foo", 3);//Foo
    test("Foo", 4);//Foo
    test("Foo", 5);//Æoo
    test("Foo", 6);//Æoo
    test("Foo", 7);//Foo
    test("Foo", 8);//Foo

    test("Test", 1);//Ôest
    test("Test", 2);//Test
    test("Test", 3);//Ôest
    test("Test", 4);//Test
    test("Test", 5);//Ôest
    test("Test", 6);//Test
    test("Test", 7);//Test
    test("Test", 8);//Test

    private static void test(String s, int x) {
        BinaryModifier bm = null;
        try {
            bm = new BinaryModifier(x);//Takes maxBits as constructor param
        } catch (BinaryException e) {
            e.printStackTrace();
        }
        System.out.println(new String(bm.decodeMessage(bm.encodeMessage(new byte[1024], s.getBytes()))));
        return;
    }

Solution

Your logic of incrementing index has two flaws, which overwrite the first bit of the first letter. Obviously, the bug is expressed when the overwriting bit is different to the first bit.

if (x % this.maxBits == 0 && x != 0)
    index++;

The first problem has to do with embedding only one bit per byte, i.e. maxBits = 1. After you have embedded the very first bit and reached the above conditional, x is still 0, since it will be incremented at the end of the loop. You should be incrementing index at this point, but x != 0 prevents you from doing so. Therefore, the second bit will also be embedded in the first byte, effectively overwriting the first bit. Since this logic also exists in the decode method, you read the first two bits from the first byte.

More specifically, if you embed a 00 or 11, it will be fine. But a 01 will be read as 11 and a 10 will be read as 00, i.e., whatever value is the second bit. If the first letter has an ascii code less or equal than 63 (00xxxxxx), or greater or equal than 192 (11xxxxxx), it will come out fine. For example:

# -> # : 00100011 (35) -> 00100011 (35)
F -> Æ : 01000110 (70) -> 11000110 (198)

The second problem has to do with the x % this.maxBits == 0 part. Consider the case where we embed 3 bits per byte. After the 3rd bit, when we reach the conditional we still have x = 2, so the modulo operation will return false. After we have embedded a 4th bit, we do have x = 3 and we're allowed to move on to the next byte. However, this extra 4th bit will be written at the 0th position of the first byte, since x % this.maxBits will be 3 % 3. So again, we have a bit overwriting our very first bit. However, after the first cycle the modulo operation will correctly write only 3 bits per byte, so the rest of our message will be unaffected.

Consider the binary for "F", which is 01000110. By embedding N bits per byte, we effectively embed the following groups in the first few bytes.

1 bit  01 0 0 0 1 1 0
2 bits 010 00 11 0x
3 bits 0100 011 0xx
4 bits 01000 110x
5 bits 010001 10xxxx
6 bits 0100011 0xxxxx
7 bits 01000110
8 bits 01000110x

As you can see, for groups of 5 and 6 bits, the last bit of the first group is 1, which will overwrite our initial 0 bit. For all other cases the overwrite doesn't affect anything. Note that for 8 bits, we end up using the first bit of the second letter. If that happened to have an ascii code greater or equal than 128, it would again overwrite the firstmost 0 bit.

To address all problems, use either

for (int x = 0; x < messageBits.length; x++) {
    // code in the between
    if ((x + 1) % this.maxBits == 0)
        index++;
}

for (int x = 0; x < messageBits.length; ) {
    // code in the between
    x++;
    if (x % this.maxBits == 0)
        index++;
}

Your code has another potential problem which hasn't been expressed. If your data array has a size of 1024, but you only embed 3 letters, you will affect only the first few bytes, depending on the value of maxBits. However, for the extraction, you define your array to have a size of data.length * this.maxBits. So you end up reading bits from all of the bytes of the data array. This is currently no problem, because your array is populated by 0s, which are converted to empty strings. However, if your array had actual numbers, you'd end up reading a lot of garbage past the point of your embedded data.

There are two general ways of addressing this. You either

append a unique sequence of bits at the end of your message (marker), such that when you encounter that sequence you terminate the extraction, e.g. eight 0s, or
you add a few bits before embedding your actual data (header), which will tell you how to extract your data, e.g., how many bytes and how many bits per byte to read.