Tags: java, hadoop, configuration, hdfs, cloudera-cdh

Can't access HDFS via Java API (Cloudera-CDH4.4.0)


I'm trying to access HDFS using the Java API, but I can't get it working. After two days of struggling, I think it's time to ask for help.

This is my code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
conf.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));
FileSystem hdfs = FileSystem.get(conf);

boolean success = hdfs.mkdirs(new Path("/user/cloudera/testdirectory"));
System.out.println(success);

I got this code from here and here. Unfortunately, the hdfs object is just a LocalFileSystem object, so something must be wrong. It looks like this is exactly what Rejeev wrote on his website:

[...] If you do not assign the configurations to conf object (using hadoop xml file) your HDFS operation will be performed on the local file system and not on the HDFS. [...]

With absolute paths I get the same result.

conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
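
To see what is actually being resolved, here is a minimal diagnostic (nothing but standard Configuration and FileSystem calls; the class name HdfsDiagnose is just for illustration). If fs.defaultFS prints as file:///, the built-in default, the XML files were never picked up and FileSystem.get falls back to the local file system:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsDiagnose {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // "file:///" here means no core-site.xml was found on the classpath.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));

        FileSystem fs = FileSystem.get(conf);
        // LocalFileSystem means the HDFS config never took effect;
        // DistributedFileSystem means you are talking to the cluster.
        System.out.println("implementation = " + fs.getClass().getName());
        System.out.println("uri = " + fs.getUri());
    }
}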

This is the library I'm currently using:

hadoop-core-2.0.0-mr1-cdh4.4.0.jar

I heard that hadoop-core was split into multiple libraries, so I also tried the following:

hadoop-common-2.0.0-alpha.jar

hadoop-mapreduce-client-core-2.0.2-alpha.jar
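
To see which jar actually supplies the Hadoop classes at runtime, a small check using plain JDK API helps (the class name WhichJar is just for illustration):

import org.apache.hadoop.conf.Configuration;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar (or directory) the Configuration class was loaded from.
        System.out.println(Configuration.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}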

I'm using Cloudera CDH 4.4.0, so Hadoop is already installed. From the console everything works fine. For example:

hadoop fs -mkdir testdirectory

So everything should be set up correctly by default.

I hope you guys can help me... this is driving me nuts! It's extremely frustrating to fail at such a simple task.

Many thanks in advance for any help.


Solution

  • 1) You don't need to call conf.addResource unless you are overriding configuration variables; a plain new Configuration() already picks up core-site.xml (and, once the HDFS classes load, hdfs-site.xml) from the classpath.

    2) I hope you are creating a jar file and running it from the command line, not from Eclipse. If you execute it in Eclipse, it will run against the local file system.

    3) I ran the code below and it worked.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class Hmkdirs {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            boolean success = fs.mkdirs(new Path("/user/cloudera/testdirectory1"));
            System.out.println(success);
        }
    }

    4) To execute it, you need to create a jar file; you can do that either from Eclipse or at the command prompt, and then run the jar.

    Sample jar build at the command prompt:

    javac -classpath /usr/local/hadoop/hadoop-core-1.2.1.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar -d classes WordCount.java && jar -cvf WordCount.jar -C classes/ .

    Executing the jar via hadoop at the command prompt:

    hadoop jar hadoopfile.jar hadoop.sample.fileaccess.Hmkdirs

    hadoop.sample.fileaccess is the package in which my class Hmkdirs exists. If your class is in the default package, you don't have to specify the package; just the class name is fine.


    Update: You can execute from Eclipse and still access HDFS; see the code below.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HmkdirsFromEclipse {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Use the Path overload so the files are read from the local file
            // system; the String overload searches the classpath instead.
            conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020/");
            conf.set("hadoop.job.ugi", "cloudera");
            conf.set("fs.hdfs.impl",
                    org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
            FileSystem fs = FileSystem.get(conf);
            boolean success = fs.mkdirs(new Path("/user/cloudera/testdirectory9"));
            System.out.println(success);
        }
    }
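
    As a quick sanity check (a minimal sketch building on the snippet above; quickstart.cloudera:8020 and the testdirectory9 path are the same values used there, and the class name HmkdirsVerify is just for illustration), you can confirm from Java that the directory really landed on HDFS and not on the local disk:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HmkdirsVerify {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020/");
            FileSystem fs = FileSystem.get(conf);

            Path dir = new Path("/user/cloudera/testdirectory9");
            // exists() and listStatus() talk to the NameNode, so "true" here
            // proves the directory is on HDFS, not the local file system.
            System.out.println("exists: " + fs.exists(dir));
            for (FileStatus status : fs.listStatus(dir.getParent())) {
                System.out.println(status.getPath());
            }
        }
    }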