hadoop, compression, native, hadoop-streaming, hadoop-plugins

How can I compile a custom Hadoop native codec into libhadoop.so?


I have written a native Hadoop compression codec. To make it work with Hadoop, I need to compile its native part (C code) into libhadoop.so.

How can I achieve this?


Solution

  • You don't need to compile this into libhadoop.so:

    • Compile your own .so and distribute it to your cluster nodes (into the same directory as the current libhadoop.so).
    • I assume you've also written your own CompressionCodec (similar to GzipCodec). Add a static block to that class which tries to load the library using System.loadLibrary("mylibrary") (for a library named libmylibrary.so).
    • Amend your cluster configuration to include your new compression codec class in the registered list of codecs (the io.compression.codecs configuration property).
    • Restart your task trackers.
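
The static-block step above could look roughly like the following. This is only a minimal sketch: the class name MyNativeCodecLoader and the isNativeLoaded() helper are hypothetical names chosen for illustration, and "mylibrary" is the placeholder library name from the list above. In a real codec the static block would sit in your CompressionCodec implementation, or in a helper class that it references.

```java
// Hypothetical helper class (not part of Hadoop) showing the static-block
// loading pattern; the same block can live directly in your codec class.
public final class MyNativeCodecLoader {

    // Placeholder name: System.loadLibrary("mylibrary") looks for
    // libmylibrary.so on Linux, searching the JVM's java.library.path.
    private static final String LIBRARY_NAME = "mylibrary";

    private static final boolean NATIVE_LOADED;

    static {
        boolean loaded = false;
        try {
            System.loadLibrary(LIBRARY_NAME);
            loaded = true;
        } catch (UnsatisfiedLinkError e) {
            // Record the failure instead of crashing class initialization,
            // so callers can report a clear error later.
            System.err.println("Could not load native library '"
                    + LIBRARY_NAME + "': " + e.getMessage());
        }
        NATIVE_LOADED = loaded;
    }

    private MyNativeCodecLoader() {
        // Utility class, not meant to be instantiated.
    }

    /** Returns true if the native library was loaded successfully. */
    public static boolean isNativeLoaded() {
        return NATIVE_LOADED;
    }

    public static void main(String[] args) {
        System.out.println("native library loaded: " + isNativeLoaded());
    }
}
```

With this pattern the codec can check isNativeLoaded() before creating a compressor or decompressor and fail with a descriptive message if the .so was not found on a node. Catching UnsatisfiedLinkError in the static block is a design choice for easier diagnosis, not a requirement.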

    As a reference, you can follow the implementation and configuration notes for the Google Snappy codec: