Tags: c++, c, compression, lzf

LZF may compress with different algorithms


I am using libLZF for compression in my application. In the documentation, there is a comment that concerns me:

lzf_compress might use different algorithms on different systems and
even different runs, thus might result in different compressed strings
depending on the phase of the moon or similar factors.

I plan to compare compressed data to determine whether the inputs were identical. Obviously, if different algorithms were used, the compressed data would differ. Is there a solution to this problem, perhaps a way to force the same algorithm each time? Or does this comment simply never hold in practice? After all, "phase of the moon or similar factors" sounds a little strange.


Solution

  • The reason for the "moon phase dependency" is that liblzf omits initialization of some internal data structures to squeeze out a little performance (only where it cannot affect decompression correctness, of course). That is not an uncommon trick, as compression libraries go. So if you put your compression code in a separate, one-shot process, and your OS zeroes memory before handing it to a process (all "big" OSes do, but some smaller ones may not), then you will always get the same compressed result.

    Also, take note of the following, from lzfP.h:

    /*
     * You may choose to pre-set the hash table (might be faster on some
     * modern cpus and large (>>64k) blocks, and also makes compression
     * deterministic/repeatable when the configuration otherwise is the same).
     */
    #ifndef INIT_HTAB
    # define INIT_HTAB 0
    #endif
    

    So I think you only need to #define INIT_HTAB 1 when compiling libLZF to make its output deterministic (the hash table is pre-set instead of left uninitialized), though I wouldn't bet on it without further analysis.
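    The mechanism is easy to see in a toy LZ-style compressor. The sketch below is for illustration only, not liblzf's actual code, and every name in it is made up. Matches are found through a small hash table; a stale (uninitialized) entry can only change *which* valid earlier match gets emitted, never break decompression, which is exactly why liblzf can skip the initialization safely, and why pre-setting the table makes the output repeatable:

    ```c
    #include <stdint.h>
    #include <string.h>

    #define HSIZE 256  /* toy hash table size; liblzf's table is larger */

    /* Hash the next 3 input bytes into a table slot. */
    static uint8_t hash3(const uint8_t *p) {
        return (uint8_t)((p[0] * 33 + p[1]) * 33 + p[2]);
    }

    /* Toy compressor. htab[h] holds (position + 1) of an earlier occurrence,
     * 0 meaning "empty". A stale entry that survives the match check below
     * would point at a *different* valid earlier match, changing the output
     * bytes; zeroing htab before each call makes the output deterministic. */
    static size_t toy_compress(const uint8_t *in, size_t n, uint8_t *out,
                               size_t htab[HSIZE]) {
        size_t i = 0, o = 0;
        while (i < n) {
            if (i + 3 <= n) {
                uint8_t h = hash3(in + i);
                size_t cand = htab[h];   /* stale data would enter here */
                htab[h] = i + 1;
                if (cand != 0 && cand - 1 < i &&
                    memcmp(in + cand - 1, in + i, 3) == 0) {
                    size_t start = cand - 1, len = 3;
                    while (i + len < n && len < 255 &&
                           in[start + len] == in[i + len])
                        len++;
                    out[o++] = 1;               /* back-reference token */
                    out[o++] = (uint8_t)start;  /* toy format: input < 256 bytes */
                    out[o++] = (uint8_t)len;
                    i += len;
                    continue;
                }
            }
            out[o++] = 0;        /* literal token */
            out[o++] = in[i++];
        }
        return o;
    }

    /* Decompressor: correct for any valid token stream, regardless of
     * which matches the compressor happened to pick. */
    static size_t toy_decompress(const uint8_t *in, size_t n, uint8_t *out) {
        size_t i = 0, o = 0;
        while (i < n) {
            if (in[i] == 0) {
                out[o++] = in[i + 1];
                i += 2;
            } else {
                size_t start = in[i + 1], len = in[i + 2], k;
                for (k = 0; k < len; k++)  /* byte-wise: overlap is fine */
                    out[o + k] = out[start + k];
                o += len;
                i += 3;
            }
        }
        return o;
    }
    ```

    With htab cleared before every call, compressing the same buffer twice yields byte-identical output; that repeatability is what INIT_HTAB buys you in liblzf, at the cost of the table initialization on each call.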