hadoop apache-kafka flume

Files transfer to HDFS


I need to bring files (zip, csv, xml, etc.) from a Windows share location into HDFS. What is the best approach? I have Kafka -> Flume -> HDFS in mind. Please suggest an efficient way.

I tried getting the files to a Kafka consumer by sending them through a producer:

producer.send(new ProducerRecord<>(topicName, key, value));

I am looking for an efficient approach.


Solution

  • Kafka is not designed to transfer files. It carries individual messages, which are limited to about 1 MB each by default.

    You can install the NFS Gateway in Hadoop. You should then be able to copy directly from the Windows share to HDFS without any streaming technology, using only a scheduled script that runs on the Windows machine or on an external host.

    Or you can mount the Windows share on a Hadoop node and schedule a cron job if you need continuous file delivery - https://superuser.com/a/1439984/475508

    Other solutions I've seen use tools such as NiFi or StreamSets, which can read and move files:
    https://community.hortonworks.com/articles/26089/windows-share-nifi-hdfs-a-practical-guide.html
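
To make the "scheduled script" idea concrete, here is a minimal sketch in Java of the copy step. It assumes both sides are visible as ordinary directories on the machine that runs it: the Windows share as a mounted path, and HDFS through the NFS Gateway mount. The class name `ShareToHdfsCopy`, the method `copyNewFiles`, and the default paths are hypothetical names chosen for illustration. It is not the only way to do this, and it has not been tuned for very large files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShareToHdfsCopy {

    // Copies every regular file directly under sourceDir into targetDir.
    // A file is skipped when the target already holds a file of the same
    // name and size, so the job can be re-run on a schedule.
    // Returns the sorted names of the files that were copied in this run.
    static List<String> copyNewFiles(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);

        // Snapshot the directory listing first so the stream is closed
        // before any copying starts. Subdirectories are ignored.
        List<Path> candidates;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            candidates = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        List<String> copied = new ArrayList<>();
        for (Path source : candidates) {
            String name = source.getFileName().toString();
            Path target = targetDir.resolve(name);
            if (Files.exists(target) && Files.size(target) == Files.size(source)) {
                continue; // already delivered in an earlier run
            }
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            copied.add(name);
        }
        Collections.sort(copied);
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical mount points; pass the real ones as arguments.
        Path share = Paths.get(args.length > 0 ? args[0] : "/mnt/winshare/incoming");
        Path hdfs = Paths.get(args.length > 1 ? args[1] : "/mnt/hdfs/data/incoming");
        List<String> copied = copyNewFiles(share, hdfs);
        System.out.println("Copied " + copied.size() + " file(s): " + copied);
    }
}
```

Run it from cron or the Windows Task Scheduler at whatever interval suits the delivery requirement. The size comparison is only a cheap way to avoid re-copying; if files on the share can change without changing size, a checksum or a "processed" folder would be a safer check.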