c++ compression poco-libraries deflate inflate

Deflation compression algorithm for huge data streams


I have a C++ program that receives a data buffer from time to time and should append it to an existing compressed file.

As a proof of concept, I read 1 KB chunks from a file, pass them to a compressing stream, and decompress the result once the input is exhausted.

I use Poco::DeflatingOutputStream to compress each chunk, and Poco::InflatingOutputStream to check that decompressing gives back the original file.

However, the decompressed data is almost identical to the original file, except that between every two consecutive chunks I get a few garbage characters, such as: à¿_ÿ

Here's an example of a line that is split between two chunks. The original line looks like this:

elevated=0 path=/System/Library/CoreServices/Dock.app/Contents/MacOS/Dock exist

while the decompressed line is:

elevated=0 path=/System/Libr à¿_ÿary/CoreServices/Dock.app/Contents/MacOS/Dock exist

May 19 19:12:51 PANMMUZNG8WNREM kernel[0]: pid=904 uid=1873876126 sbit=0

Any idea what I am doing wrong? Here's my POC code:

int zip_unzip() {  
   std::ostringstream stream1;
   Poco::DeflatingOutputStream gzipper(stream1, Poco::DeflatingStreamBuf::STREAM_ZLIB);

   std::ifstream bigFile("/tmp/in.log");
   constexpr size_t bufferSize = 1024;
   char buffer[bufferSize];
   while (bigFile) {
       bigFile.read(buffer, bufferSize);
       gzipper << buffer;
   }
   gzipper.close();

   std::string zipped_string = stream1.str();
   ////////////////// 
   std::ofstream stream2("/tmp/out.log", std::ios::binary);
   Poco::InflatingOutputStream gunzipper(stream2, InflatingStreamBuf::STREAM_ZLIB);
   gunzipper << zipped_string;
   gunzipper.close();
   return 0;
}

Solution

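  • OK, I just realized that I was carelessly using the '<<' operator on each chunk read from the huge file (the original uncompressed file). operator<< on a char buffer treats it as a C string and keeps reading until it finds a null terminator ('\0'). There was none at the end of each window I read from the file, so bytes beyond the data actually read ended up in the compressed stream.

    Here is the fixed version: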

    #include <cerrno>
    #include <fstream>
    #include <iostream>
    #include <Poco/DeflatingStream.h>
    #include <Poco/Exception.h>

    int BetterZip()
    {
        try {
            // Create the gzip file.
            std::ofstream output_file("/tmp/out.gz", std::ios::binary);
            Poco::DeflatingOutputStream output_stream(output_file, Poco::DeflatingStreamBuf::STREAM_GZIP);

            // Read the input in fixed-size chunks; binary mode avoids newline translation.
            std::ifstream big_file("/tmp/hugeFile", std::ios::binary);
            constexpr size_t ReadBufferSize = 1024;
            char buffer[ReadBufferSize];
            while (big_file) {
                big_file.read(buffer, ReadBufferSize);
                // Write only the bytes actually read; the last chunk is usually short.
                output_stream.write(buffer, big_file.gcount());
            }

            // Flush the remaining compressed data and finish the gzip stream.
            output_stream.close();
        } catch (const Poco::Exception& ex) {
            std::cout << "Error: code " << ex.code() << " (" << ex.displayText() << ")" << std::endl;
            return EINVAL;
        }

        return 0;
    }
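
To see the difference between the two calls without involving Poco at all, here is a minimal sketch using only standard string streams. The helper names (via_insertion, via_write, stale_buffer_demo, copy_in_chunks) are made up for this illustration and are not part of Poco or the original code. The demo buffer is deliberately null-terminated so that operator<< stays in bounds; in the question's code there was no terminator, so the read ran past the end of the buffer, which is undefined behavior.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: forwards a buffer the way the question did.
// operator<< ignores `count` and stops only at the first '\0'.
std::string via_insertion(const char* buffer, std::size_t /*count*/) {
    std::ostringstream out;
    out << buffer;
    return out.str();
}

// Hypothetical helper: forwards exactly `count` bytes, like the fixed version.
std::string via_write(const char* buffer, std::size_t count) {
    std::ostringstream out;
    out.write(buffer, static_cast<std::streamsize>(count));
    return out.str();
}

// Simulates a read buffer that still holds stale bytes ('X') from an
// earlier read, while the current read delivered only 4 bytes ("abcd").
std::string stale_buffer_demo(bool use_write) {
    char buffer[16];
    std::memset(buffer, 'X', sizeof buffer);
    buffer[sizeof buffer - 1] = '\0';  // terminator keeps operator<< in bounds
    std::memcpy(buffer, "abcd", 4);
    return use_write ? via_write(buffer, 4) : via_insertion(buffer, 4);
}

// Same loop shape as the fixed version, on generic streams.
// Assumes chunk_size > 0.
void copy_in_chunks(std::istream& in, std::ostream& out, std::size_t chunk_size) {
    std::vector<char> buffer(chunk_size);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() is the number of bytes the last read actually produced.
        out.write(buffer.data(), in.gcount());
    }
}
```

Calling stale_buffer_demo(false) returns the 4 fresh bytes followed by the stale ones, while stale_buffer_demo(true) returns just the 4 bytes that were read. The '<<' form also truncates at any '\0' inside the data, so write() with an explicit length is the safer choice for arbitrary file contents.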