Simple ASCII compression: help minimize system calls

In my last question, nos gave a method of removing the most significant bit from an ASCII character byte, which matches exactly what my professor said when describing the project.

My problem is how to strip the most significant bit and pack the result into a buffer using the read and write calls. Since write takes its length as a number of bytes, how do I work at the bit level of the buffer array?

Solution

Probably the simplest way to do it is in chunks of eight bytes. Read in a chunk, then compress it to seven bytes using bitwise operators.

Let's call the input data input[0..7] and the output data output[0..6].

So, the first byte of the output data, output[0], consists of the lower 7 bits of input[0] plus the second-most upper bit (bit 6) of input[1]. The rest follow the same pattern:

    Index:    [0]      [1]      [2]      [3]      [4]      [5]      [6]      [7]
    Input:  0aaaaaaa 0bbbbbbb 0ccccccc 0ddddddd 0eeeeeee 0fffffff 0ggggggg 0hhhhhhh
            ///////  //////   and     --->
            ||||||| /|||||     so on  --->
    Output: aaaaaaab bbbbbbcc cccccddd ddddeeee eeefffff ffgggggg ghhhhhhh
    Index:    [0]      [1]      [2]      [3]      [4]      [5]      [6]

You can use operations like:

    output[0] = ((input[0] & 0x7f) << 1) | ((input[1] & 0x40) >> 6)
    output[1] = ((input[1] & 0x3f) << 2) | ((input[2] & 0x60) >> 5)
    :
    output[5] = ((input[5] & 0x03) << 6) | ((input[6] & 0x7e) >> 1)
    output[6] = ((input[6] & 0x01) << 7) |  (input[7] & 0x7f)

The others should be calculable from those above. If you want to know more about bitwise operators, see here.
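For reference, here is a minimal sketch of all seven assignments as a C function, with the three elided lines worked out from the diagram. The function name pack8 and the use of uint8_t are assumptions made for illustration, not part of the original question.

```c
#include <stdint.h>

/* Pack eight 7-bit characters in[0..7] into seven bytes out[0..6].
   The top bit of every input byte is masked off and discarded.
   (pack8 is a hypothetical name used only for this sketch.) */
void pack8(const uint8_t in[8], uint8_t out[7])
{
    out[0] = (uint8_t)(((in[0] & 0x7f) << 1) | ((in[1] & 0x40) >> 6));
    out[1] = (uint8_t)(((in[1] & 0x3f) << 2) | ((in[2] & 0x60) >> 5));
    out[2] = (uint8_t)(((in[2] & 0x1f) << 3) | ((in[3] & 0x70) >> 4));
    out[3] = (uint8_t)(((in[3] & 0x0f) << 4) | ((in[4] & 0x78) >> 3));
    out[4] = (uint8_t)(((in[4] & 0x07) << 5) | ((in[5] & 0x7c) >> 2));
    out[5] = (uint8_t)(((in[5] & 0x03) << 6) | ((in[6] & 0x7e) >> 1));
    out[6] = (uint8_t)(((in[6] & 0x01) << 7) |  (in[7] & 0x7f));
}
```

As a check, the eight characters "ABCDEFGH" (0x41 to 0x48) pack to the seven bytes 83 0A 1C 48 B1 A3 C8.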

Once you've compressed an eight-byte chunk, write out the seven-byte compressed chunk and keep going.

The only slightly tricky bit is at the end, where you may not have a full eight bytes. In that case you will output as many bytes as you input: n leftover characters (1 to 7) occupy 7n bits, which need n bytes, and the final byte is padded with n zero bits.
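The chunking and the end-of-input case can be sketched together as a loop around the POSIX read and write calls. This is only a sketch under assumptions: the names pack, write_all and compress_fd and the 512-group buffer size are made up for illustration, it swaps the fixed eight-byte formulas for an equivalent bit accumulator so the short tail needs no special case, and it does not retry on EINTR. Reading many eight-byte groups per call is what keeps the number of system calls down.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

enum { GROUPS = 512, IN_CAP = GROUPS * 8, OUT_CAP = GROUPS * 7 };

/* Pack n 7-bit characters into out and return the number of bytes
   produced. Every 8 characters yield 7 bytes; a tail of 1..7
   characters yields as many bytes, the last one zero-padded. */
static size_t pack(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned int acc = 0;   /* bits waiting to be emitted */
    int bits = 0;           /* how many bits acc currently holds */
    size_t produced = 0;

    for (size_t i = 0; i < n; i++) {
        acc = (acc << 7) | (in[i] & 0x7fu);
        bits += 7;
        if (bits >= 8) {
            bits -= 8;
            out[produced++] = (unsigned char)(acc >> bits);
            acc &= (1u << bits) - 1u;   /* keep only unsent bits */
        }
    }
    if (bits > 0)           /* flush the tail, padded with zeros */
        out[produced++] = (unsigned char)(acc << (8 - bits));
    return produced;
}

/* write() may accept fewer bytes than asked for, so loop until done. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Compress in_fd to out_fd. Returns 0 on success, -1 on I/O error. */
int compress_fd(int in_fd, int out_fd)
{
    unsigned char in[IN_CAP], out[OUT_CAP];
    size_t have = 0;        /* bytes carried over from the last read */

    for (;;) {
        ssize_t got = read(in_fd, in + have, IN_CAP - have);
        if (got < 0)
            return -1;
        have += (size_t)got;

        /* Before EOF pack only whole groups of 8; at EOF pack the rest. */
        size_t use = (got == 0) ? have : have - have % 8;
        size_t n = pack(in, use, out);
        if (write_all(out_fd, out, n) < 0)
            return -1;
        if (got == 0)
            return 0;

        memmove(in, in + use, have - use);  /* keep the partial group */
        have -= use;
    }
}
```

Because read may return fewer bytes than requested, the loop carries any partial group over to the next read, so padding is only ever emitted once, at end of input. Calling compress_fd(0, 1) would filter standard input to standard output.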

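And, on decompression, you do the opposite. Read in chunks of seven bytes, expand using bitwise operators and write out eight bytes. You can mostly tell which bits are padding at the end from the size of the last section read in: a final section of one to six bytes holds exactly that many characters. A final section of seven bytes is the one ambiguous case, since it may hold eight characters or seven characters plus seven zero padding bits, which decode as a trailing NUL. If the text never contains NUL, that trailing NUL can simply be dropped.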
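A minimal sketch of the expansion step, again using a bit accumulator rather than seven hand-written formulas. The name unpack is hypothetical, and the caller is assumed to supply an output buffer with room for n * 8 / 7 characters.

```c
#include <stddef.h>

/* Expand n packed bytes into 7-bit characters and return how many
   characters were written. Seven bytes yield eight characters; a
   shorter tail of 1..6 bytes yields as many characters, and the
   leftover padding bits are discarded. */
size_t unpack(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned int acc = 0;   /* bits not yet turned into characters */
    int bits = 0;           /* how many bits acc currently holds */
    size_t produced = 0;

    for (size_t i = 0; i < n; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 7) {
            bits -= 7;
            out[produced++] = (unsigned char)((acc >> bits) & 0x7fu);
        }
        acc &= (1u << bits) - 1u;   /* keep only unused bits */
    }
    return produced;
}
```

Note that a final chunk of exactly seven bytes always comes back as eight characters here. If the original text cannot contain NUL, the caller can drop a trailing NUL from the last chunk to undo the padding.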