
How to correctly implement LZ4, Snappy or equivalent compression techniques in Java?


I tried using the Java version of LZ4 in a search-engine-like program that searches data in large text files. I simply compressed the output stream and wrote it to .txt files or to files without an extension. However, the supposedly compressed files did not shrink; they were even larger than the originals.

In the end I had to resort to zip4j, since it works for me.

How should I use the LZ4 or Snappy jars to compress and decompress correctly?

In addition, how can I use such algorithms to compress a single folder containing many files?

Thanks!


Solution

  • I faced a similar problem. I was trying to send a large file (about 709 MB) over a local network in chunks of 8192 bytes, and I used LZ4 compression/decompression to reduce the bandwidth needed.

    Assuming you are trying to do something similar, here is my suggestion.

    The snippet below is close to the standard example you'll find at https://github.com/jpountz/lz4-java:

    private static int decompressedLength;
    private static LZ4Factory factory = LZ4Factory.fastestInstance();
    private static LZ4Compressor compressor = factory.fastCompressor();
    
    public static byte[] compress(byte[] src, int srcLen) {
        decompressedLength = srcLen;
        int maxCompressedLength = compressor.maxCompressedLength(decompressedLength);
        byte[] compressed = new byte[maxCompressedLength];
        compressor.compress(src, 0, decompressedLength, compressed, 0, maxCompressedLength);
        return compressed;
    }
    

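    If you return the compressed byte array as is, its length is maxCompressedLength, the worst-case bound, which is larger than the original uncompressed data. Only the first bytes of that array, as many as compress(...) reports in its return value, hold compressed data; the rest is unused padding.

    So you can modify it as follows: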

    // Fragment of a class; it needs these imports:
    // import java.util.Arrays;
    // import net.jpountz.lz4.LZ4Compressor;
    // import net.jpountz.lz4.LZ4Factory;
    private static final LZ4Factory factory = LZ4Factory.fastestInstance();
    private static final LZ4Compressor compressor = factory.fastCompressor();
    
    public static byte[] compress(byte[] src, int srcLen) {
        // Worst-case output size for srcLen input bytes; always larger than srcLen.
        int maxCompressedLength = compressor.maxCompressedLength(srcLen);
        byte[] compressed = new byte[maxCompressedLength];
        // compress(...) returns the number of bytes actually written.
        int compressLen = compressor.compress(src, 0, srcLen, compressed, 0, maxCompressedLength);
        // Keep only the bytes that hold compressed data.
        byte[] finalCompressedArray = Arrays.copyOf(compressed, compressLen);
        return finalCompressedArray;
    }
    

    compressLen stores the actual compressed length, and finalCompressedArray (of length compressLen) holds only the compressed data. Its length is always less than that of the compressed buffer and, for compressible input, also less than that of the original uncompressed array.
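
The effect is easy to reproduce without the lz4-java jar. The sketch below uses the JDK's java.util.zip.Deflater as a stand-in for LZ4 (a different algorithm, chosen only so the example runs with no dependencies); the class name TrimDemo, its method names, and the generous buffer bound are assumptions made for illustration. The point it shows is the same: the output buffer is sized for the worst case, and only the count returned by the compressor says how much of it is real data.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

class TrimDemo {
    // Deliberately generous output bound (an assumption, not zlib's exact formula).
    static int worstCase(int srcLen) {
        return srcLen * 2 + 64;
    }

    // Compresses src into buffer and returns the number of bytes actually written.
    static int compressInto(byte[] src, byte[] buffer) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(src);
            deflater.finish();
            int written = deflater.deflate(buffer);
            if (!deflater.finished()) {
                throw new IllegalStateException("output buffer too small");
            }
            return written;
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Returns only the bytes that hold compressed data.
    static byte[] compress(byte[] src) {
        byte[] buffer = new byte[worstCase(src.length)];
        int written = compressInto(src, buffer);
        return Arrays.copyOf(buffer, written);
    }
}
```

For an 8192-byte chunk, worstCase gives a 16448-byte buffer. Writing that whole buffer to disk or to a socket makes the output larger than the input, while TrimDemo.compress returns only the bytes that were written.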

    Now you can decompress finalCompressedArray in the usual way:

    private static LZ4FastDecompressor decompressor = factory.fastDecompressor();
    
    public static byte[] decompress(byte[] finalCompressedArray, int decompressedLength) {
        // decompress(src, destLen) allocates and returns a new array of destLen bytes,
        // so there is no need to create a buffer beforehand.
        return decompressor.decompress(finalCompressedArray, decompressedLength);
    }
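
The decompress method above needs the original length of each chunk, so the receiver has to learn it somehow. One possible way, sketched here with JDK classes only, is to prefix every chunk with two ints: the original length and the compressed length. The class ChunkFrame and this frame layout are hypothetical, not part of lz4-java or of the answer's own code.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class ChunkFrame {
    final int originalLength; // size of the chunk before compression
    final byte[] compressed;  // trimmed compressed bytes

    ChunkFrame(int originalLength, byte[] compressed) {
        this.originalLength = originalLength;
        this.compressed = compressed;
    }

    // Layout: [int originalLength][int compressedLength][compressed bytes]
    void writeTo(DataOutputStream out) throws IOException {
        out.writeInt(originalLength);
        out.writeInt(compressed.length);
        out.write(compressed);
    }

    static ChunkFrame readFrom(DataInputStream in) throws IOException {
        int originalLength = in.readInt();
        int compressedLength = in.readInt();
        byte[] compressed = new byte[compressedLength];
        in.readFully(compressed); // blocks until the whole chunk has arrived
        return new ChunkFrame(originalLength, compressed);
    }
}
```

On the receiving side, the two fields of the frame can be passed straight to decompress(frame.compressed, frame.originalLength). The same framing works for files, since a file written as a sequence of such frames can be read back chunk by chunk.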