c++, file, readfile, writefile, boost-dynamic-bitset

Reading dynamic_bitset data back from a file returns the wrong bits


I have a vector holding three numbers: 65, 66, and 67. I convert each number from int to binary and append the results to a string, which becomes 100000110000101000011 (65, 66, and 67 respectively). I write this data to a file using the dynamic_bitset library. My BitOperations class handles reading from and writing to the file. When I read the data back from the file, instead of the bits above I get 001100010100001000001.

Here is my BitOperations class:

#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <fstream>
#include <streambuf>
#include "Utility.h"
using namespace std;
using namespace boost;

template <typename T>
class BitOperations {
private:
    T data;
    int size;
    dynamic_bitset<unsigned char> Bits;
    string fName;
    int bitSize;

public:
    BitOperations(dynamic_bitset<unsigned char> b){
        Bits = b;
        size = b.size();
    }

    BitOperations(dynamic_bitset<unsigned char> b, string fName){
        Bits = b;
        this->fName = fName;
        size = b.size();
    }

    BitOperations(T data, string fName, int bitSize){
        this->data = data;
        this->fName = fName;
        this->bitSize = bitSize;
    }

    BitOperations(int bitSize, string fName){
        this->bitSize = bitSize;
        this->fName = fName;
    }

    void writeToFile(){
        if (data != ""){
            vector<int> bitTemp = extractIntegersFromBin(data);
            for (int i = 0; i < bitTemp.size(); i++){
                Bits.push_back(bitTemp[i]);
            }
        }
        ofstream output(fName, ios::binary| ios::app);
        ostream_iterator<char> osit(output);
        to_block_range(Bits, osit);
        cout << "File Successfully modified" << endl;
    }

    dynamic_bitset<unsigned char> readFromFile(){
        ifstream input(fName);
        stringstream strStream;
        strStream << input.rdbuf();
        T str = strStream.str();

        dynamic_bitset<unsigned char> b;
        for (int i = 0; i < str.length(); i++){
            for (int j = 0; j < bitSize; ++j){
                bool isSet = str[i] & (1 << j);
                b.push_back(isSet);
            }
        }
        return b;
    }
};

And here is the code which calls these operations:

#include <iostream>
// #include <string.h>
#include <boost/dynamic_bitset.hpp>
#include "Utility/BitOps.h"

int main(){
    vector<int> v;
    v.push_back(65);
    v.push_back(66);
    v.push_back(67);

    stringstream ss;
    string st;
    for (int i = 0; i < v.size(); i++){
        ss = toBinary(v[i]);
        st += ss.str().c_str();
        cout << i << " )" << st << endl;
    }
    // reverse(st.begin(), st.end());
    cout << "Original: " << st << endl;

    BitOperations<string> b(st, "bits2.bin", 7);
    b.writeToFile();
    BitOperations<string>c(7, "bits2.bin");
    boost::dynamic_bitset<unsigned char> bits;
    bits = c.readFromFile();
    string s;
    
    // for (int i = 0; i < 16; i++){
        to_string(bits, s);
        // reverse(s.begin(), s.end());
    // }
    cout << "Decompressed: " << s << endl;
}

What am I doing wrong which results in incorrect behaviour?

EDIT: Here is the extractIntegersFromBin(string s) function.

vector<int> extractIntegersFromBin(string s){

    char tmp;
    vector<int> nums;

    for (int i = 0; s[i]; i++ ){
        nums.push_back(s[i] - '0');
    }

    return nums;
}

Edit 2: Here is the code for toBinary:

stringstream toBinary(int n){
    vector<int> bin, bin2;
    int i = 0;
    while (n > 0){
        bin.push_back(n % 2);
        n /= 2;
        i++;
    }

    // for (int j = i-1; j >= 0; j--){
    //     bin2.push_back(bin[j]);
    // }
    reverse(bin.begin(), bin.end());
    stringstream s;
    for (int i = 0; i < bin.size(); i++){
        s << bin[i];
    }

    return s;
}

Solution

    You are facing two different issues:

    1. The Boost function to_block_range pads the output to the internal block size by appending zeros at the end. In your case, the internal block size is sizeof(unsigned char)*8 == 8 bits. So if the length of the bit sequence you write in writeToFile is not a multiple of 8, additional 0s are written to reach the next multiple of 8. When you read the bit sequence back in with readFromFile, you have to find some way to remove these padding bits again.

    2. There is no single standard way to represent a bit sequence (reference). Depending on the scenario, it might be more convenient to represent the bits left-to-right or right-to-left (or in some completely different order). For this reason, when different pieces of code print the same bit sequence and you want them to print the same result, they have to agree on how to represent the sequence. If one piece of code prints left-to-right and the other right-to-left, you will get different results.

    Let's discuss each issue individually:

    Regarding issue 1

    I understand that you want to define your own block size with the bitSize variable, on top of the internal block size of boost::dynamic_bitset. For example, in your main method, you construct BitOperations<string> c(7, "bits2.bin");. I understand that to mean that you expect the bit sequence stored in the file to have a length that is some multiple of 7.

    If this understanding is correct, you can remove the padding bits inserted by to_block_range by reading the file size and rounding the resulting bit count down to the nearest multiple of your block size. Note, though, that you currently do not enforce this contract in the BitOperations constructor or in writeToFile (e.g. by ensuring that the data size is a multiple of 7).
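
The rounding step can be sketched in isolation. This is only an illustration: the helper name usableBitCount is hypothetical (it does not appear in your code or in Boost), and it assumes 8-bit bytes and a positive bitSize.

```cpp
#include <cstddef>

// Hypothetical helper: number of payload bits in a file of fileSizeBytes
// bytes, assuming the payload length is a multiple of bitSize and that
// fewer than bitSize padding bits were appended.
std::size_t usableBitCount(std::size_t fileSizeBytes, std::size_t bitSize) {
    const std::size_t bitsPerByte = 8;
    const std::size_t totalBits = fileSizeBytes * bitsPerByte;
    return totalBits / bitSize * bitSize;  // round down to a multiple of bitSize
}
```

For the 21-bit example, the file holds 3 bytes (24 bits) and usableBitCount(3, 7) yields 21. The rounding cannot tell padding from data once the padding reaches bitSize bits: 49 payload bits occupy 7 bytes, and usableBitCount(7, 7) yields 56. Storing the exact bit count alongside the data would be one way to avoid that ambiguity.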

    In your readFromFile method, note first that the inner loop incorrectly bounds itself by bitSize. With bitSize equal to 7, it inspects only the first 7 bits of each byte, whereas the blocks written by to_block_range use all 8 bits of each 1-byte block, because boost::dynamic_bitset knows nothing about your 7-bit block size. As a result, some bits are missed.

    Here is one example for how to fix your code:

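        size_t bitsPerByte = 8;
        // Round the number of bits in the file down to a multiple of bitSize
        size_t bitCount = (str.length() * bitsPerByte) / bitSize * bitSize;
    
        for (size_t i = 0; i < bitCount; i++) {
          size_t index = (i / bitsPerByte);   // byte that holds bit i
          size_t offset = (i % bitsPerByte);  // position of bit i inside that byte
    
          bool isSet = (str[index] & ( 1 << offset));
          b.push_back(isSet);
        }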
    

    This example first calculates how many bits should be read in total, by rounding down the file size to the nearest multiple of your block size. It then iterates over the full bytes in the input (i.e. the internal blocks that were written by boost::dynamic_bitset), until the targeted number of bits have been read. The remaining padding bits are discarded.
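
To see the byte layout without Boost or a file, here is a minimal standard-library sketch of the same round trip. The names packBits and unpackBits are hypothetical, and the packing order (first bit into the least significant bit of the first byte) is an assumption meant to mirror what push_back followed by to_block_range produces for dynamic_bitset<unsigned char>.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pack a '0'/'1' string into bytes: character i goes into bit (i % 8)
// of byte (i / 8). Unused high bits of the last byte stay 0 (padding).
std::vector<unsigned char> packBits(const std::string& bits) {
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
        }
    }
    return bytes;
}

// Unpack all 8 bits of every byte, then drop trailing padding by
// rounding the bit count down to a multiple of bitSize.
std::string unpackBits(const std::vector<unsigned char>& bytes,
                       std::size_t bitSize) {
    const std::size_t bitCount = bytes.size() * 8 / bitSize * bitSize;
    std::string bits;
    bits.reserve(bitCount);
    for (std::size_t i = 0; i < bitCount; ++i) {
        const bool isSet = (bytes[i / 8] >> (i % 8)) & 1u;
        bits.push_back(isSet ? '1' : '0');
    }
    return bits;
}
```

With the input from the question, packBits("100000110000101000011") gives three bytes, and unpackBits on those bytes with a bitSize of 7 returns the original 21 characters.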

    An alternative method would be to use boost::from_block_range. This allows you to get rid of some boilerplate code (i.e. reading the input into some string buffer):

      dynamic_bitset<unsigned char> readFromFile() {
        // Open in binary mode so no byte is translated on the way in
        ifstream input{fName, ios::binary};
    
        // Get file size
        input.seekg(0, ios_base::end);
        std::streamoff fileSize = input.tellg();
    
        // TODO Handle error: fileSize < 0
    
        // Reset to beginning of file
        input.clear();
        input.seekg(0);
    
        // Create bitset with desired size
        size_t bitsPerByte = 8;
        size_t bitCount = (static_cast<size_t>(fileSize) * bitsPerByte) / bitSize * bitSize;
        dynamic_bitset<unsigned char> b(bitCount);
    
        // TODO Handle error: fileSize != b.num_blocks() * b.bits_per_block / bitsPerByte
    
        // Read file into bitset; istreambuf_iterator reads raw bytes,
        // whereas istream_iterator<char> would skip whitespace bytes
        std::istreambuf_iterator<char> iter{input};
        boost::from_block_range(iter, {}, b);
    
        return b;
      }
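
One detail to watch when feeding blocks from a stream iterator: std::istream_iterator<char> extracts with operator>>, which skips whitespace by default, so bytes such as 0x20 or 0x0A would be dropped silently, while std::istreambuf_iterator<char> reads the raw bytes. The small sketch below shows the difference on an in-memory stream; readFormatted and readRaw are hypothetical names used only for this illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read every byte of `data` through std::istream_iterator<char>,
// which extracts with operator>> and therefore skips whitespace.
std::vector<char> readFormatted(const std::string& data) {
    std::istringstream in(data);
    std::istream_iterator<char> first(in), last;
    return std::vector<char>(first, last);
}

// Read every byte through std::istreambuf_iterator<char>, which pulls
// raw characters from the stream buffer without skipping anything.
std::vector<char> readRaw(const std::string& data) {
    std::istringstream in(data);
    std::istreambuf_iterator<char> first(in), last;
    return std::vector<char>(first, last);
}
```

For the five-byte input "A B\nC", readFormatted returns three characters and readRaw returns all five, which is why raw iteration (or std::noskipws) is the safer choice for binary block data.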
    

    Regarding issue 2

    Once you have solved issue 1, the boost::dynamic_bitset that is written to the file by writeToFile will be the same as the one read by readFromFile. If you print both with the same method, the output will match. However, if you use different methods for printing, and these methods do not agree on the order in which to print the bits, you will get different results.

    For example, in the output of your program you can now see that the "Original:" output is the same as "Decompressed:", except in reverse order:

    Original: 100000110000101000011
    ...
    Decompressed: 110000101000011000001
    

    Again, this does not mean that readFromFile is working incorrectly, only that you are using different ways of printing the bit sequences.

    The output for Original: is obtained by directly printing the 0/1 input string in main from left to right. In writeToFile, this string is then decomposed in the same order with extractIntegersFromBin and each bit is passed to the push_back method of boost::dynamic_bitset. The push_back method appends to the end of the bit sequence, meaning it will interpret each bit you pass as more significant than the previous (reference):

    Effects: Increases the size of the bitset by one, and sets the value of the new most-significant bit to value.

    Therefore, your input string is interpreted such that the first bit in the input string is the least significant bit (i.e. the "first" bit of the sequence), and the last bit of the input string is the most significant bit (i.e. the "last" bit of the sequence).

    In contrast, you construct the output for "Decompressed:" with to_string. From the documentation of this function, we can see that the least-significant bit of the bit sequence will be the last character of the output string (reference):

    Effects: Copies a representation of b into the string s. A character in the string is '1' if the corresponding bit is set, and '0' if it is not. Character position i in the string corresponds to bit position b.size() - 1 - i.

    So the problem is simply that to_string (by design) prints in the opposite order to the one in which you print the input string manually. To fix this, you have to reverse one of the two, e.g. by iterating over the input string in reverse order when printing it, or by reversing the output of to_string.
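
The two orderings can be reproduced with the standard library alone. The sketch below is a stand-in, not Boost code: pushBackAll imitates repeated push_back calls, msbFirst imitates the documented to_string ordering, and lsbFirst reverses that rendering. All three names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for repeated push_back: each new bit becomes the most
// significant one, so bits[0] is the least significant bit.
std::vector<bool> pushBackAll(const std::string& input) {
    std::vector<bool> bits;
    for (char c : input) bits.push_back(c == '1');
    return bits;
}

// Stand-in for to_string: character i shows bit (size - 1 - i),
// i.e. the most significant bit comes first.
std::string msbFirst(const std::vector<bool>& bits) {
    std::string s;
    for (std::size_t i = bits.size(); i-- > 0;) s.push_back(bits[i] ? '1' : '0');
    return s;
}

// Reversing the MSB-first rendering restores the order of the input string.
std::string lsbFirst(const std::vector<bool>& bits) {
    std::string s = msbFirst(bits);
    std::reverse(s.begin(), s.end());
    return s;
}
```

Applied to the input from the question, msbFirst gives 110000101000011000001 (the "Decompressed:" output) and lsbFirst gives back 100000110000101000011 (the "Original:" output).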