I am trying to copy some log files from local
to HDFS
using flume-ng
. The source
is /home/cloudera/flume/weblogs/
and the sink
is hdfs://localhost:8020/flume/dump/
. A cron job will copy the logs from tomcat server to /home/cloudera/flume/weblogs/
and I want to log files to be copied to HDFS
as the files are available in /home/cloudera/flume/weblogs/
using flume-ng
. Below is the conf file I created:
agent1.sources= local
agent1.channels= MemChannel
agent1.sinks=HDFS
agent1.sources.local.type = ???
agent1.sources.local.channels=MemChannel
agent1.sinks.HDFS.channel=MemChannel
agent1.sinks.HDFS.type=hdfs
agent1.sinks.HDFS.hdfs.path=hdfs://localhost:8020/flume/dump/
agent1.sinks.HDFS.hdfs.fileType=DataStream
agent1.sinks.HDFS.hdfs.writeformat=Text
agent1.sinks.HDFS.hdfs.batchSize=1000
agent1.sinks.HDFS.hdfs.rollSize=0
agent1.sinks.HDFS.hdfs.rollCount=10000
agent1.sinks.HDFS.hdfs.rollInterval=600
agent1.channels.MemChannel.type=memory
agent1.channels.MemChannel.capacity=10000
agent1.channels.MemChannel.transactionCapacity=100
I am not able to understand:
1) what will be the value of agent1.sources.local.type = ???
2) where to mention the source
path /home/cloudera/flume/weblogs/
in the above conf file ?
3) Is there anything I am missing in the above conf file?
Please let me know on these.
You can use either :
An Exec Source and use a command (i.e. cat or tail on gnu/linux on you files)
Or a Spooling Directory Source for read all files in a directory