Tags: java, hadoop, snappy

Unable to use SnappyCodec with hadoop jar: NullPointerException


I'm trying to use Hadoop's compression libraries in a simple Java program. However, I'm unable to use the Snappy codec: execution throws a NullPointerException in SnappyCodec.createCompressor.

Note that I'm not getting the typical java.lang.UnsatisfiedLinkError that results from not setting the LD_LIBRARY_PATH and JAVA_LIBRARY_PATH environment variables. Snappy is properly installed with CDH: hadoop checknative reports it as available, and Snappy decompression works when I run hdfs dfs -text on a Snappy file.
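
hadoop checknative runs in its own JVM, so it only shows that the native libraries load there. A sketch of the same check from inside the JVM that runs the program might look like the following. It assumes Hadoop 2.x client jars (as shipped with CDH 5.5) on the classpath, where NativeCodeLoader.isNativeCodeLoaded() and SnappyCodec.isNativeCodeLoaded() are public static methods; NativeSnappyCheck and report() are hypothetical names, and newer Hadoop releases changed how Snappy is loaded, so the second check may not carry over.

```java
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.NativeCodeLoader;

/** Hypothetical diagnostic: reports what the running JVM (not the hadoop CLI) loaded. */
public class NativeSnappyCheck {

    static String report() {
        // True only if libhadoop.so was found on java.library.path and loaded.
        boolean hadoopNative = NativeCodeLoader.isNativeCodeLoaded();
        // Native Snappy support goes through libhadoop, so skip the check without it.
        boolean snappy = hadoopNative && SnappyCodec.isNativeCodeLoaded();
        return "libhadoop loaded: " + hadoopNative
                + "\nnative snappy loaded: " + snappy
                + "\njava.library.path: " + System.getProperty("java.library.path");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it both via hadoop jar and via plain java -cp with -Djava.library.path helps separate a native-loading problem from a codec-configuration problem.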

$ hadoop jar SnappyTool-0.0.1-SNAPSHOT.jar com.mycorp.SnappyCompressor
Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.io.compress.SnappyCodec.createCompressor(SnappyCodec.java:145)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:152)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
        at com.mycorp.SnappyCompressor.main(SnappyCompressor.java:19)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
$
$ hadoop checknative 2>/dev/null | grep snappy
snappy:  true /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/libsnappy.so.1
$ ls /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/
libhadoop.a       libhadoop.so.1.0.0  libnativetask.a         libsnappy.so
libhadooppipes.a  libhadooputils.a    libnativetask.so        libsnappy.so.1
libhadoop.so      libhdfs.a           libnativetask.so.1.0.0
libsnappy.so.1.1.4
$ export LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/
$ java -Djava.library.path=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/ -cp `hadoop classpath`:SnappyTool-0.0.1-SNAPSHOT.jar com.mycorp.SnappyCompressor
Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.io.compress.SnappyCodec.createCompressor(SnappyCodec.java:145)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:152)
        at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
        at com.mycorp.SnappyCompressor.main(SnappyCompressor.java:19)

The Java code looks like this; the last line is the culprit:

        SnappyCodec.checkNativeCodeLoaded();
        CompressionCodec codec = new SnappyCodec();
        Compressor comp = CodecPool.getCompressor(codec);

What did I miss?


Solution

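  • Okay, so the problem turned out to be that the codec requires a proper Configuration, as pointed out in this answer. A SnappyCodec created with new SnappyCodec() has no Configuration set, and createCompressor reads its buffer size from that Configuration, hence the NullPointerException.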

    A simple way to get a properly configured Snappy compressor:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.*;

    Configuration conf = new Configuration();
    CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
    CompressionCodec codec = ccf.getCodecByClassName(SnappyCodec.class.getName());
    Compressor comp = codec.createCompressor();

    The resulting jar can be run with the command lines used in the original question.
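
    If you would rather keep instantiating SnappyCodec directly, the codec only needs its Configuration injected before use. The sketch below is an illustration under assumptions, not the answer's method: it uses ReflectionUtils.newInstance, which calls setConf on Configurable classes (calling codec.setConf(conf) by hand should be equivalent), and does a full compress/decompress round trip through CodecPool. SnappyRoundTrip and roundTrip are hypothetical names; it assumes Hadoop 2.x client jars on the classpath and native Snappy on java.library.path, for example when launched with hadoop jar as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.Decompressor;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

/** Hypothetical helper: compresses and decompresses bytes with a configured SnappyCodec. */
public class SnappyRoundTrip {

    static byte[] roundTrip(byte[] input) throws IOException {
        Configuration conf = new Configuration();
        // newInstance() calls setConf(conf) because SnappyCodec is Configurable,
        // so the codec no longer dereferences a null Configuration.
        // (By hand: SnappyCodec c = new SnappyCodec(); c.setConf(conf);)
        CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);

        // Compress: borrow a Compressor from the pool and always return it.
        Compressor comp = CodecPool.getCompressor(codec);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (CompressionOutputStream out = codec.createOutputStream(compressed, comp)) {
            out.write(input); // close() finishes the compressed stream
        } finally {
            CodecPool.returnCompressor(comp);
        }

        // Decompress with a pooled Decompressor.
        Decompressor decomp = CodecPool.getDecompressor(codec);
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (InputStream in = codec.createInputStream(
                new ByteArrayInputStream(compressed.toByteArray()), decomp)) {
            IOUtils.copyBytes(in, restored, 4096, false);
        } finally {
            CodecPool.returnDecompressor(decomp);
        }
        return restored.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello snappy, hello snappy, hello snappy".getBytes(StandardCharsets.UTF_8);
        byte[] output = roundTrip(input);
        System.out.println(Arrays.equals(input, output) ? "round trip OK" : "mismatch");
    }
}
```

    Passing the pooled Compressor and Decompressor explicitly keeps ownership clear: the finally blocks hand them back to CodecPool even if the stream throws.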