How do you folks figure out how some data is compressed?
I'm trying to take apart a binary file. I see the structure in it, and have found where some data segments are.
The UNIX 'file' command just says they are data. The "Signsrch signature file" by Luigi Auriemma didn't match any of the blocks.
The file extension is ".dz". The file starts with "Dr*Z" and the data headers start with "zFED". Google searches didn't turn up any information on those. The data blocks have no other structure that I see - no patterns, readable strings, etc.
(There is a DZ file format, but it is proprietary, from 2000-2005, for compressing Quake game files. I haven't yet been able to run "dzip.exe" on this Macbook.)
Here is the format of the data headers:
char[4]  "zFED"     or "DEFz" flipped
int32    full_size  size of uncompressed data, little-endian
int32    cmpr_size  number of bytes of data here, little-endian
byte[]   data       ...
There might be more fields in the header than this. This is how each of the four data blocks starts (hex) ...
EC 7D 79 5C 54 47 ...
EC BD 7B 7C 94 C5 ...
C4 BC 07 78 23 57 ...
EC BD 7B 5F 13 D9 ...
So there could be some flags or format fields still there.
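The header layout described above can be sketched in Python as follows. This is only an illustration of the three fields listed in the post: the names `parse_header` and `find_headers` are made up here, and reading the two sizes as unsigned 32-bit values is an assumption (the post only says int32).

```python
import struct

# tag, full_size, cmpr_size; "<" = little-endian, "I" = unsigned 32-bit (assumed)
HEADER = struct.Struct("<4sII")

def parse_header(buf, offset=0):
    """Decode one 12-byte block header starting at `offset`."""
    tag, full_size, cmpr_size = HEADER.unpack_from(buf, offset)
    if tag != b"zFED":
        raise ValueError(f"no zFED tag at offset {offset:#x}")
    return {
        "offset": offset,
        "full_size": full_size,
        "cmpr_size": cmpr_size,
        "data_offset": offset + HEADER.size,  # first byte after the header
    }

def find_headers(buf):
    """Yield a parsed header for every occurrence of the tag in `buf`."""
    pos = buf.find(b"zFED")
    while pos != -1:
        if pos + HEADER.size <= len(buf):
            yield parse_header(buf, pos)
        pos = buf.find(b"zFED", pos + 1)

# Header bytes copied from the BLOCK 1 dump in this post.
block1 = bytes.fromhex("7a464544 80dc0100 140c0100")
print(parse_header(block1))
```

Running `find_headers` over the whole file would list every tagged block, which is a quick way to check whether the sizes in one header line up with the position of the next.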
Here is the start of each data block:
BLOCK 1
000000d0 7a 46 45 44 80 dc 01 00 14 0c 01 00 ec 7d ..zFED.\......l}
tag--------- full_size--- cmpr_size--- [data ...]
000000e0 79 5c 54 47 b6 70 dd db b7 ef 6d 9a a5 1b 50 41 y\TG6p][7om.%.PA
000000f0 41 68 f7 85 08 a8 d1 68 dc 5a c3 24 0d 0a c4 24 Ahw..(Qh\ZC$..D$
00000100 6f 92 7c f3 32 71 66 92 4c 76 27 33 ef 7d 73 65 o.|s2qf.Lv'3o}se
BLOCK 2
00010ce0 7a 46 45 44 00 1e 03 00 be 5f u_.)..zFED....>_
tag--------- full_size--- cmpr_size-
00010cf0 01 00 ec bd 7b 7c 94 c5 d5 38 fe 3c 7b 7b 72 db ..l={|.EU8~<{{r[
--size [data ...
00010d00 6c 76 37 77 2e 49 08 57 23 09 57 41 f0 12 08 e0 lv7w.I.W#.WAp..`
00010d10 26 84 8b 97 da 56 5a b5 b5 6a d5 b6 78 ab ae 37 &...ZVZ55jU6x+.7
00010d20 b2 88 12 ad d6 2e 77 d4 56 df d6 b6 62 6d 5f 37 2..-V.wTV_V6bm_7
BLOCK 3
00026ca0 7a 46 45 44 46 81 01 00 e4 ec 00 00 h,..zFEDF...dl..
tag-------- full_size-- cmpr_size--
00026cb0 c4 bc 07 78 23 57 76 26 0a 80 24 48 80 48 44 22 D<.x#Wv&..$H.HD"
[data ...
00026cc0 08 80 48 24 41 30 81 39 b3 99 73 0e 60 68 66 b2 ..H$A0.93.s.`hf2
00026cd0 99 9b a9 d9 cc cd 50 e4 a8 31 54 ad 68 ef d8 e3 ..)YLMPd(1T-hoXc
BLOCK 4
00035980 7a 46 45 44 60 f8 13 00 ..<\s6k?zFED`x..
tag-------- full_size--
00035990 7a 9c 00 00 ec bd 7b 5f 13 d9 b6 b6 5d 15 82 78 z...l={_.Y66]..x
cmpr_size-- [data ...
000359a0 06 14 05 3c c6 43 e3 a1 5b 14 c4 33 4a c9 49 51 ...<FCc![.D3JIIQ
000359b0 54 14 5b d1 76 b5 1d 21 2d 59 62 e2 0a a1 5b fb T.[Qv5.!-Ybb.![{
The histograms of the data are fairly flat. One fluctuates much more, however, and it's the one I'm most interested in.
Examining the histograms
Block    Usual Span   Notes
Block 1  0.3  0.45    peaks to 0.5%
Block 2  0.35 0.45    peaks to 0.3% and 0.5%
Block 3  0.34 0.44    peaks to 0.32% and 0.49%
Block 4  0.05 2%      peaks just before 64 128 160 208
                      dips at 32 48 64 96 128 160
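For anyone wanting to reproduce this kind of measurement, here is a minimal Python sketch of a byte histogram expressed in percent, plus Shannon entropy. The function names are hypothetical and this is not necessarily how the figures above were produced. For reference, a perfectly flat distribution puts every byte value at 100/256, about 0.39%, which is close to the usual span of the first three blocks.

```python
import math
from collections import Counter

def byte_histogram(data):
    """Return a 256-entry list: percentage of `data` taken by each byte value.

    Assumes `data` is a non-empty bytes object.
    """
    counts = Counter(data)
    total = len(data)
    return [100.0 * counts.get(v, 0) / total for v in range(256)]

def shannon_entropy(data):
    """Entropy in bits per byte; 8.0 means all 256 values are equally likely."""
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

# Synthetic, perfectly flat sample: every byte value appears 4 times.
sample = bytes(range(256)) * 4
hist = byte_histogram(sample)
print(min(hist), max(hist), shannon_entropy(sample))
```

Entropy close to 8 bits per byte usually points to compressed or encrypted data. A block with a much wider spread, like Block 4 here, is not necessarily stored differently; highly compressible input can also leave visible bias in the compressed bytes.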
The file I'm looking at is "Kurzweil-SP-Updater.dz" inside this file: https://kurzweil.com/wp-content/uploads/2022/08/SP7G_UpdateE1.06L1.1.2.zip
The question is: What should I try next? Thank you!
The majority of the file (99.9%) consists of four complete deflate streams:
offset 222, length 68616, decompressed 121984
offset 68850, length 90034, decompressed 204288
offset 158896, length 60632, decompressed 98630
offset 219540, length 40045, decompressed 1308768
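A minimal Python sketch of how such streams can be inflated, assuming they are raw deflate (no zlib or gzip wrapper), which is what a negative `wbits` value selects in the standard `zlib` module. The helper name `inflate_at` is hypothetical, and the self-check below runs on synthetic data rather than on the actual .dz file.

```python
import zlib

def inflate_at(buf, offset):
    """Inflate one raw deflate stream that starts at `offset` in `buf`.

    Returns (decompressed_bytes, number_of_compressed_bytes_consumed).
    """
    d = zlib.decompressobj(-15)  # negative wbits: raw deflate, no header/trailer
    out = d.decompress(buf[offset:])
    if not d.eof:
        raise ValueError("stream is truncated or not raw deflate")
    # Bytes after the end of the stream are left in unused_data.
    consumed = len(buf) - offset - len(d.unused_data)
    return out, consumed

# Self-check: embed a raw deflate stream between a fake header and a trailer.
payload = b"example payload " * 100
c = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = c.compress(payload) + c.flush()
blob = b"HDR!" + raw + b"trailing"

data, used = inflate_at(blob, 4)
print(len(data), used == len(raw))
```

Applied to the real file, calling `inflate_at` on the file contents at offset 222 should return 121984 bytes if the numbers above hold. For the first three streams, the listed length equals the header's cmpr_size minus 12 (68628, 90046 and 60644 in the dumps), which would be consistent with cmpr_size counting the 12-byte header; that is an inference from the numbers in this thread, not a documented fact about the format.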