
loading file into hdfs using flume


I want to load a text file from my system into HDFS.

This is my conf file:

agent.sources = seqGenSrc
agent.sinks = loggerSink
agent.channels = memoryChannel

agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F my.system.IP/D:/salespeople.txt

agent.sinks.loggerSink.type = hdfs
agent.sinks.loggerSink.hdfs.path = hdfs://IP.address:port:user/flume
agent.sinks.loggerSink.hdfs.filePrefix = events-
agent.sinks.loggerSink.hdfs.round = true
agent.sinks.loggerSink.hdfs.roundValue = 10
agent.sinks.loggerSink.hdfs.roundUnit = minute

agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100

agent.sources.seqGenSrc.channels = memoryChannel

agent.sinks.loggerSink.channel = memoryChannel

When I run it, I get the following output, and then it gets stuck.

13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel memoryChannel
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Waiting for channel: memoryChannel to start. Sleeping for 500 ms
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink loggerSink
13/07/23 16:30:44 INFO nodemanager.DefaultLogicalNodeManager: Starting Source seqGenSrc
13/07/23 16:30:44 INFO source.ExecSource: Exec source starting with command:tail -F 10.48.226.27/D:/salespeople.txt

Where am I wrong, or what could be the error?


Solution

  • I assume you want to write your file to /user/flume, so your path should be:
    agent.sinks.loggerSink.hdfs.path = hdfs://IP.address:port/user/flume

    Since your agent uses tail -F, there is no message telling you it is finished (because it never is). If you want to know whether your file is being created, you have to look in the /user/flume folder.

    I'm using a configuration like yours and it works perfectly. You could try adding
    -Dflume.root.logger=INFO,console to get more information (see the sketch below).
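
    As a minimal sketch, assuming your configuration is saved as example.conf (a placeholder name) and the agent is named agent (matching the "agent." prefix in your properties), you could start the agent with console logging and then check HDFS for the output files:

    # start the agent with verbose console logging (example.conf is a placeholder file name)
    flume-ng agent --conf conf --conf-file example.conf --name agent -Dflume.root.logger=INFO,console

    # list the sink directory to see whether events-* files are appearing
    hdfs dfs -ls /user/flume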