I have a lot of text files that I want to compress to save disk space. Later, when I need them, I want to concatenate them and send the result to a client. To save CPU cycles, I don't want to decompress and recompress the data. The client should be able to decompress the data, and the result should be the original files joined together.
Here is what I have done so far and its result (code in PHP):
$str1 = "Hello";
$str2 = "World";
$deflate1 = gzdeflate($str1);
$deflate2 = gzdeflate($str2);
$concat = substr($deflate1, 0, -2) . $deflate2;
$inflate = gzinflate($concat);
print($inflate);
Result:
HellnWorld
As you can see, the last character of the first string has changed from "o" to "n".
How can I fix this algorithm? (Algorithms or C/PHP/Go sample code are fine.)
What you're trying to do with the output of gzdeflate() won't work. Its last block is flagged as final in its own header, and deflate blocks are bit-packed, so the bytes at the end are not a separable terminator: deleting them cuts into the compressed data. You can, however, simply concatenate gzip streams as produced by PHP's gzencode(), without deleting anything; the result is valid gzip data. I am told, though, that gzdecode() has a bug and will not decode such a sequence of gzip members. (Someone should report the bug.)
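A minimal sketch of the gzip-member approach, assuming the system zlib is 1.2.4 or later. It sidesteps gzdecode() by reading the joined data back through PHP's compress.zlib:// stream wrapper, which, as far as I know, hands the bytes to zlib's own gzread(); zlib documents that gzread() keeps decoding across concatenated gzip streams. The temporary file is there only because the wrapper reads from a file.

```php
<?php
// Each file is compressed once, ahead of time, into a complete gzip member.
$members = [gzencode("Hello"), gzencode("World")];

// Joining is plain byte concatenation: nothing is removed or recompressed.
$joined = implode('', $members);

// Write the joined bytes to a temporary file so the wrapper has a file to read.
$tmp = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents($tmp, $joined);

// Read back through compress.zlib://, which decodes every member in turn.
$decoded = file_get_contents('compress.zlib://' . $tmp);
unlink($tmp);

print($decoded . "\n");
```

Each member keeps its own header and CRC trailer, so this costs a few bytes per file compared with the raw deflate approach that follows.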
You can use deflate_init() and deflate_add(), with appropriate flush options, to build up a compatible, complete deflate stream. For each individual stream, use ZLIB_SYNC_FLUSH followed by ZLIB_FINISH. This makes a deflate stream that ends with an empty stored deflate block, which ends on a byte boundary, and then an empty fixed block marked as the last block. That final empty fixed block is two bytes. If it is deleted from the end, you can concatenate another deflate stream after it. The last piece concatenated should not have its last two bytes deleted, so that the whole thing is a properly terminated deflate stream.
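To see that byte layout, the following sketch (assuming PHP 7.0 or later, where deflate_init() and deflate_add() exist) compresses one string with ZLIB_SYNC_FLUSH and then ZLIB_FINISH, and prints the last six bytes in hex. The tail should be 00 00 ff ff, the length fields of the empty stored block, followed by 03 00, the two-byte empty fixed final block.

```php
<?php
// Raw deflate: no zlib or gzip header/trailer around the blocks.
$def = deflate_init(ZLIB_ENCODING_RAW);

// Sync flush on the call that carries the data, then finish the stream.
$piece = deflate_add($def, "Hello", ZLIB_SYNC_FLUSH);
$piece .= deflate_add($def, '', ZLIB_FINISH);

// Empty stored block's LEN/NLEN, then the empty final fixed block.
print(bin2hex(substr($piece, -6)) . "\n"); // prints 0000ffff0300

// Without its last two bytes the stream is byte-aligned and unterminated,
// which is the form in which another piece can be appended to it.
$open = substr($piece, 0, -2);
```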
When trying this, I found another bug: deflate_add() will not honor the ZLIB_SYNC_FLUSH request if the data string is empty. You can call deflate_add($def, '', ZLIB_SYNC_FLUSH); repeatedly all day long, and it will do nothing. You need to use ZLIB_SYNC_FLUSH on your last call of deflate_add() that has some data. For example:
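$def = deflate_init(ZLIB_ENCODING_RAW);
$out = deflate_add($def, 'this ', ZLIB_NO_FLUSH);
$out .= deflate_add($def, 'is ', ZLIB_NO_FLUSH);
// The sync flush goes on the last call that carries data.
$out .= deflate_add($def, 'a test.', ZLIB_SYNC_FLUSH);
// Ends the stream with the two-byte empty final block, "\x03\x00".
$out .= deflate_add($def, '', ZLIB_FINISH);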
(For some reason, I have never answered a question about PHP without discovering more bugs in PHP. I wonder why people use it.)
Edit by OP: Here is exactly what is needed:
// first file
$def1 = deflate_init(ZLIB_ENCODING_RAW);
$file1 = deflate_add($def1, "1111111111", ZLIB_SYNC_FLUSH);
// second file, compressed in two parts
$def2 = deflate_init(ZLIB_ENCODING_RAW);
$file2 = deflate_add($def2, "22222", ZLIB_NO_FLUSH);
$file2 .= deflate_add($def2, "33333", ZLIB_SYNC_FLUSH);
// join on the fly
// $def3 = deflate_init(ZLIB_ENCODING_RAW);
// $joined = $file1 . $file2;
// $joined .= deflate_add($def3, "", ZLIB_FINISH);
// or simply append the two-byte empty final block:
$joined = $file1 . $file2 . "\x03\x00";
// decompress
$decompress = gzinflate($joined);
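The code above joins pieces held in variables. A sketch of the full workflow from the question, storing each compressed piece on disk once and joining the stored bytes on request, might look like this. compress_piece() is a hypothetical helper name, and the temporary files stand in for wherever the pieces would really be kept.

```php
<?php
// Hypothetical helper: compress one file's contents into a joinable piece
// (sync flush with the data, finish, then drop the 2-byte final block).
function compress_piece(string $data): string
{
    $def = deflate_init(ZLIB_ENCODING_RAW);
    $out = deflate_add($def, $data, ZLIB_SYNC_FLUSH);
    $out .= deflate_add($def, '', ZLIB_FINISH);
    return substr($out, 0, -2);
}

// Ahead of time: compress each file once and store the piece on disk.
$paths = [];
foreach (["1111111111", "2222233333"] as $text) {
    $path = tempnam(sys_get_temp_dir(), 'piece');
    file_put_contents($path, compress_piece($text));
    $paths[] = $path;
}

// On request: concatenate the stored bytes and terminate the stream.
// No decompression or recompression happens here.
$joined = '';
foreach ($paths as $path) {
    $joined .= file_get_contents($path);
}
$joined .= "\x03\x00";

// Stand-in for the client: inflate the result as one raw deflate stream.
$result = gzinflate($joined);
print($result . "\n");

// Remove the temporary piece files.
foreach ($paths as $path) {
    unlink($path);
}
```

In a real response, the joining loop could send each piece with readfile() and then echo "\x03\x00", so the joined stream never has to be held in memory.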