Tags: hadoop, hdfs, flume, flume-ng, bigdata

Flume-ng: source path and type for copying log file from local to HDFS


I am trying to copy some log files from local to HDFS using flume-ng. The source is /home/cloudera/flume/weblogs/ and the sink is hdfs://localhost:8020/flume/dump/. A cron job copies the logs from the Tomcat server to /home/cloudera/flume/weblogs/, and I want the log files to be copied to HDFS by flume-ng as soon as they become available in /home/cloudera/flume/weblogs/. Below is the conf file I created:

agent1.sources= local
agent1.channels= MemChannel
agent1.sinks=HDFS

agent1.sources.local.type = ???
agent1.sources.local.channels=MemChannel

agent1.sinks.HDFS.channel=MemChannel
agent1.sinks.HDFS.type=hdfs
agent1.sinks.HDFS.hdfs.path=hdfs://localhost:8020/flume/dump/
agent1.sinks.HDFS.hdfs.fileType=DataStream
agent1.sinks.HDFS.hdfs.writeFormat=Text
agent1.sinks.HDFS.hdfs.batchSize=1000
agent1.sinks.HDFS.hdfs.rollSize=0
agent1.sinks.HDFS.hdfs.rollCount=10000
agent1.sinks.HDFS.hdfs.rollInterval=600
agent1.channels.MemChannel.type=memory
agent1.channels.MemChannel.capacity=10000
agent1.channels.MemChannel.transactionCapacity=100

I am not able to understand:

1) What should the value of agent1.sources.local.type be?

2) Where do I specify the source path /home/cloudera/flume/weblogs/ in the above conf file?

3) Is there anything I am missing in the above conf file?

Please let me know.


Solution

  • You can use either:

    an Exec Source, which runs a command (e.g. cat or tail -F on GNU/Linux) against your files,

    or a Spooling Directory Source, which reads every file placed into a directory.

    Example configurations for both are sketched below.
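Since the cron job drops finished log files into /home/cloudera/flume/weblogs/, the Spooling Directory Source is usually the better fit. Below is a minimal, untested sketch of the source section, reusing the names (agent1, local, MemChannel) and the path from the question; the channel and HDFS sink lines stay exactly as in the original conf file:

# Sketch: Spooling Directory Source, reads each new file once and
# renames it with a .COMPLETED suffix after it has been fully ingested
agent1.sources.local.type = spooldir
agent1.sources.local.spoolDir = /home/cloudera/flume/weblogs
agent1.sources.local.channels = MemChannel

If you prefer the Exec Source instead, the command property drives it; note that exec does not checkpoint its position, so events can be lost or duplicated if the agent restarts. The file name access.log below is only a placeholder for one of your Tomcat logs:

# Sketch: Exec Source tailing a single file (access.log is a placeholder)
agent1.sources.local.type = exec
agent1.sources.local.command = tail -F /home/cloudera/flume/weblogs/access.log
agent1.sources.local.channels = MemChannel

One caveat with spooldir: files must not be written to or renamed after they are placed in the directory, which matches the cron-copy workflow described in the question.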