Tags: hadoop, hive, amazon-emr

How to include jars in Hive (Amazon Hadoop env)


I need to include a newer protobuf jar (newer than 2.5.0) in Hive. No matter where I put the jar, it gets pushed to the end of the classpath. How can I make sure the jar is at the beginning of Hive's classpath?


Solution

  • To add your own jar to the Hive classpath so that it appears at the beginning and is not overridden by a jar bundled with Hadoop, set the following environment variable:

    export HADOOP_USER_CLASSPATH_FIRST=true

    This tells Hadoop to give the entries in HADOOP_CLASSPATH priority over the general Hadoop jars.

    On Amazon EMR instances you can add this to /home/hadoop/conf/hadoop-env.sh, and modify the classpath in the same file.

    This is useful when you want to override jars, such as protobuf, that ship with the general Hadoop classpath.
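
As a rough sketch, the lines added to hadoop-env.sh might look like the following. The jar location and version in PROTOBUF_JAR are hypothetical placeholders, not something stated in the answer; point it at wherever your newer protobuf jar actually lives.

```shell
# Hypothetical path to the newer protobuf jar (an assumption; adjust to your setup).
PROTOBUF_JAR="/home/hadoop/lib/protobuf-java-2.6.1.jar"

# Prepend the jar to HADOOP_CLASSPATH, keeping any entries already set.
# The ${VAR:+...} form avoids a trailing ":" when HADOOP_CLASSPATH is empty.
export HADOOP_CLASSPATH="${PROTOBUF_JAR}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"

# Ask Hadoop to place HADOOP_CLASSPATH ahead of its own bundled jars.
export HADOOP_USER_CLASSPATH_FIRST=true
```

After starting a new shell or re-sourcing the file, running `hadoop classpath` should be one way to check where the jar ends up in the resulting order.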