Tags: hadoop, hdfs, utility

Utility to push data into HDFS


I need to build a common utility for Unix/Windows-based systems to push data into a Hadoop cluster. Users should be able to run the utility from either platform and push data into HDFS.

WebHDFS could be one option, but I'm curious whether anything else is available.

Any suggestions?


Solution

  • I usually make a maven project and I add this dependency to my pom.xml file:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.1</version>
    </dependency>
    

    Then pushing data into HDFS is very easy with the Hadoop Java API. Here is a simple example just to see how it works:

    // Needs: java.net.URI, org.apache.hadoop.conf.Configuration,
    // org.apache.hadoop.fs.{FileSystem, Path, FSDataOutputStream}
    String namenodeLocation = "hdfs://[your-namenode-ip-address]:[your-namenode-port]/";
    
    Configuration configuration = new Configuration();
    FileSystem hdfs = FileSystem.get( new URI( namenodeLocation ), configuration );
    Path file = new Path(namenodeLocation + "/myWonderful.data");
    
    // create() opens an output stream to a new file on the cluster
    FSDataOutputStream outStream = hdfs.create(file);
    
    byte[] coolDataToPushToHDFS = new byte[1500];
    
    outStream.write(coolDataToPushToHDFS);
    outStream.close();
    
    hdfs.close();
    

    It's a really simple program. I think the steps you have to do are:

    1. Let users choose the input/data to push
    2. Use the Hadoop Java API to send the file/data to your cluster
    3. Give some feedback to the user

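The three steps above can be sketched as a small command-line tool. This is only a sketch under my own assumptions: the class name, the placeholder namenode URI, and the argument handling are not from the original answer, and `copyFromLocalFile` is used here as the idiomatic way to push an existing local file.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPushTool {
    public static void main(String[] args) throws Exception {
        // Step 1: the user chooses what to push via the arguments
        if (args.length != 2) {
            System.err.println("usage: HdfsPushTool <local-file> <hdfs-dest-path>");
            System.exit(1);
        }
        String namenodeLocation = "hdfs://[your-namenode-ip-address]:[your-namenode-port]/";
        FileSystem hdfs = FileSystem.get(new URI(namenodeLocation), new Configuration());

        // Step 2: copyFromLocalFile streams the local file to the cluster
        hdfs.copyFromLocalFile(new Path(args[0]), new Path(args[1]));

        // Step 3: feedback
        System.out.println("Pushed " + args[0] + " to " + args[1]);
        hdfs.close();
    }
}
```

Because the same Java API runs on both Unix and Windows, one jar can serve both platforms the question asks about.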
    You can also append data to an existing file, not only create new ones.
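A minimal append sketch, assuming the cluster supports appends (`dfs.support.append`) and reusing the placeholder namenode URI from the example above:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    public static void main(String[] args) throws Exception {
        String namenodeLocation = "hdfs://[your-namenode-ip-address]:[your-namenode-port]/";
        FileSystem hdfs = FileSystem.get(new URI(namenodeLocation), new Configuration());
        Path file = new Path(namenodeLocation + "/myWonderful.data");

        // append() fails if the file does not exist yet, so create it first
        FSDataOutputStream outStream = hdfs.exists(file)
                ? hdfs.append(file)
                : hdfs.create(file);
        outStream.write("more data\n".getBytes("UTF-8"));
        outStream.close();
        hdfs.close();
    }
}
```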

    Take a look at the documentation: https://hadoop.apache.org/docs/current/api/
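Since the question mentions WebHDFS: a file can also be pushed over the WebHDFS REST API with nothing but the JDK, which avoids shipping the Hadoop client jars. The sketch below follows WebHDFS's two-step `CREATE` flow (the namenode answers with a redirect, then the data is PUT to a datanode); the host, port, and user name here are placeholder assumptions.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsPut {
    // Builds the WebHDFS CREATE URL for a given HDFS path
    static String createUrl(String host, int port, String hdfsPath, String user) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&user.name=" + user + "&overwrite=true";
    }

    static void upload(String host, int port, String hdfsPath, String user, byte[] data)
            throws Exception {
        // Step 1: ask the namenode; it replies with a redirect to a datanode
        HttpURLConnection nn = (HttpURLConnection)
                new URL(createUrl(host, port, hdfsPath, user)).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        String datanodeUrl = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: PUT the file body to the datanode URL
        HttpURLConnection dn = (HttpURLConnection) new URL(datanodeUrl).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream()) {
            out.write(data);
        }
        if (dn.getResponseCode() != 201) {
            throw new RuntimeException("WebHDFS upload failed: " + dn.getResponseCode());
        }
    }
}
```

Because this is plain HTTP, the same approach works from any platform, which fits the Unix/Windows requirement in the question.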