c++ file file-io hexdump

Separate data in a text file


I have a large hexdump that contains thousands of small images. The data is structured like this:

20 00 20 00 00 10 00 00 <data> 20 00 20 00 00 10 00 00 <data> ...

Here 20 00 20 00 00 10 00 00 is the separator between sections of data (images).

The file myfile, which contains the whole hexdump, looks something like this:

3C 63 9E FF 38 5F 9E FF
31 59 91 FF 20 00 20 00
00 10 00 00 55 73 A2 FF
38 5D 9C FF 3A 5E 95 FF

I want to split it: take each part delimited by 20 00 20 00 00 10 00 00 and write it to its own text file, named 1.txt, 2.txt, ..., n.txt.

I tried reading line by line, but that causes problems: the 20 00 ... sequence can span two lines, as in the example above, so a per-line comparison does not find every occurrence.

while (getline(myfile,line,'\n')){
    if (line == "20 00 20 00 00 10 00 00")
        ...
}

Solution

  • Definitely save the file as raw binary bytes rather than as a text hexdump. In text form each byte takes three characters (two hex digits plus a separator), so the binary file is about a third of the size, and the code to read it is easier to write.

    That being said, if your file is in binary, this is the solution:

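    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <fstream>

    using std::ifstream;
    using std::ofstream;

    // Turn "N.dat" into "(N+1).dat" in place.
    void incrementFilename(char* filename) {
      int iFile = 0;
      sscanf(filename, "%d.dat", &iFile);
      sprintf(filename, "%d.dat", ++iFile);
    }

    int main() {
      char outputFilename[16] = "1.dat";
      ifstream input("myfile.dat", ifstream::binary);
      if (!input.is_open()) return 1;
      ofstream output(outputFilename, ofstream::binary);

      const char delimiter[8] = { 0x20, 0x00, 0x20, 0x00, 0x00, 0x10, 0x00, 0x00 };
      char window[8];          // the last (up to) 8 bytes not yet written out
      std::size_t filled = 0;
      char readbyte;

      // Test the stream after each read, so the last byte is not handled twice.
      while (input.get(readbyte)) {
        window[filled++] = readbyte;
        if (filled < 8) continue;

        // memcmp, not strncmp: strncmp stops comparing at the first 0x00 byte.
        if (memcmp(window, delimiter, 8) == 0) {
          incrementFilename(outputFilename);
          output.close();
          output.open(outputFilename, ofstream::binary);
          filled = 0;
        } else {
          // No match: write the oldest byte and slide the window by one,
          // so a delimiter starting at any offset is still found.
          output.write(window, 1);
          memmove(window, window + 1, 7);
          filled = 7;
        }
      }

      // Write whatever is still held in the window at end of file.
      output.write(window, filled);
      return 0;
    }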