
S3N and S3A distcp not working in Hadoop 2.6.0


Summary

A stock Hadoop 2.6.0 install gives me "No FileSystem for scheme: s3n". Adding hadoop-aws.jar to the classpath now gives me ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem.

Details

I've got a mostly stock install of hadoop-2.6.0. The only changes are the directory settings and the following environment variables:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_HOME=$HADOOP_COMMON_HOME
export HADOOP_HDFS_HOME=$HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
export HADOOP_OPTS=-XX:-PrintWarnings
export PATH=$PATH:$HADOOP_COMMON_HOME/bin

The hadoop classpath is:

/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/hadoop/share/hadoop/tools/lib/*

When I try to run hadoop distcp -update hdfs:///files/to/backup s3n://${S3KEY}:${S3SECRET}@bucket/files/to/backup, I get java.io.IOException: No FileSystem for scheme: s3n. If I use s3a instead, I get the same error complaining about s3a.
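For reference, the same copy can also be attempted with the credentials passed as properties rather than embedded in the URI (just a sketch: fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are the s3n credential properties, and the bucket/paths are the placeholders from above):

# pass s3n credentials as generic options instead of putting them in the URI
hadoop distcp \
  -D fs.s3n.awsAccessKeyId=${S3KEY} \
  -D fs.s3n.awsSecretAccessKey=${S3SECRET} \
  -update hdfs:///files/to/backup s3n://bucket/files/to/backup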

The internet told me that hadoop-aws.jar is not part of the classpath by default. I added the following line to /opt/hadoop/etc/hadoop/hadoop-env.sh:

HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_COMMON_HOME/share/hadoop/tools/lib/*

and now hadoop classpath has the following appended to it:

:/opt/hadoop/share/hadoop/tools/lib/*

which should cover /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar. Now I get:

Caused by: java.lang.ClassNotFoundException:
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

The jar file contains the class that can't be found:

unzip -l /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar |grep S3AFileSystem
28349  2014-11-13 21:20   org/apache/hadoop/fs/s3a/S3AFileSystem.class
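As a further sanity check (assuming the hadoop classpath output above), the wildcard entry and the jar it should pick up can both be inspected directly:

# confirm the tools/lib wildcard made it into the effective classpath
hadoop classpath | tr ':' '\n' | grep tools/lib
# confirm the aws jar is actually in that directory
ls /opt/hadoop/share/hadoop/tools/lib/ | grep hadoop-aws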

Is there an order to adding these jars, or am I missing something else critical?


Solution

  • You can resolve the s3n issue by adding the following property to core-site.xml:

    <property>
      <name>fs.s3n.impl</name>
      <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
      <description>The FileSystem for s3n: (Native S3) uris.</description>
    </property>
    

    It should work after adding that property.
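
    For the s3a variant of the error, the analogous mapping plus credentials can be set as well. This is only a sketch assuming the Hadoop 2.6.0 s3a property names; the key values are placeholders:

    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value> <!-- placeholder -->
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value> <!-- placeholder -->
    </property>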

    Edit: If that doesn't resolve your problem, then you will have to add the jars to the classpath. Check whether mapred-site.xml sets mapreduce.application.classpath to /usr/hdp//hadoop-mapreduce/*; that entry pulls the other related jars onto the classpath :)
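
    A sketch of what such an entry could look like for the /opt/hadoop layout from the question (these paths are my assumption, not the HDP layout above):

    <property>
      <name>mapreduce.application.classpath</name>
      <!-- default MR jars plus tools/lib, where hadoop-aws-2.6.0.jar lives -->
      <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    </property>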