
Creating file in HDFS but not appending any content


I am using an HTTP source to put JSON files into HDFS (single-node sandbox).

The file is created in the correct directory, but nothing is appended to it. Could you verify my flume.conf before I start debugging the HTTP source?

#################################################################
# Name the components on this agent
#################################################################

hdfs-agent.sources = httpsource
hdfs-agent.sinks = hdfssink
hdfs-agent.channels = channel1

#################################################################
# Describe source
#################################################################

# Source node
hdfs-agent.sources.httpsource.type = http 
hdfs-agent.sources.httpsource.port = 5140
hdfs-agent.sources.httpsource.handler = org.apache.flume.source.http.JSONHandler

#################################################################
# Describe Sink
#################################################################

# Sink hdfs
hdfs-agent.sinks.hdfssink.type = hdfs
hdfs-agent.sinks.hdfssink.hdfs.path = hdfs://sandbox:8020/user/flume/node
hdfs-agent.sinks.hdfssink.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfssink.hdfs.batchSize = 1
hdfs-agent.sinks.hdfssink.hdfs.rollSize = 0
hdfs-agent.sinks.hdfssink.hdfs.rollCount = 0

#################################################################
# Describe channel
#################################################################

# Channel memory
hdfs-agent.channels.channel1.type = memory
hdfs-agent.channels.channel1.capacity = 1000
hdfs-agent.channels.channel1.transactionCapacity = 100


#################################################################
# Bind the source and sink to the channel
#################################################################

hdfs-agent.sources.httpsource.channels = channel1
hdfs-agent.sinks.hdfssink.channel = channel1

I am currently just trying to test it by starting small:

[{"text": "Hi Flume this Node"}]

So I am thinking my batchSize/rollSize/rollCount could be the issue here?


Solution

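  • The batchSize, rollSize, and rollCount values are fine. Setting rollSize and rollCount to 0 disables size-based and count-based file rolling (time-based rolling via hdfs.rollInterval, which defaults to 30 seconds, still applies).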

    hdfs-agent.sources.httpsource.type can be set to the fully qualified class name org.apache.flume.source.http.HTTPSource; the Flume user guide also lists the short alias http for this source.

    The data sent to the HTTP source should have this format:

    [{"headers": {"a": "b", "c": "d"}, "body": "random_body"}, {"headers": {"e": "f"}, "body": "random_body2"}]

    I tested with the data you used ([{"text": "Hi Flume this Node"}]). Nothing was appended to my file because there is no "body" attribute. When I posted the following instead, the data was appended to my file:

    curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[{"headers": {"timestamp": "434324343", "host": "random_host.example.com", "field1": "val1"}, "body": "random_body"}]' http://localhost:5140
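
    If you would rather not hand-write the envelope, here is a minimal Python sketch of the same idea. It wraps plain records such as {"text": "Hi Flume this Node"} in the headers/body structure and builds the POST request. The helper names to_flume_events and build_request are made up for illustration, the URL is the one from the curl command, and I am assuming the handler wants "body" to be a string, which is why each record is serialized first.

```python
import json
import urllib.request


def to_flume_events(records, headers=None):
    """Wrap plain records in the headers/body envelope shown above.

    Hypothetical helper, not part of Flume. Assumes "body" must be a
    string, so non-string records are serialized to JSON text.
    """
    events = []
    for record in records:
        body = record if isinstance(record, str) else json.dumps(record)
        # Copy the headers so each event gets its own dict.
        events.append({"headers": dict(headers or {}), "body": body})
    return events


def build_request(events, url="http://localhost:5140"):
    """Build (but do not send) the POST request for the HTTP source."""
    payload = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


events = to_flume_events([{"text": "Hi Flume this Node"}])
request = build_request(events)
# With the agent running, this would send it:
# urllib.request.urlopen(request)
```

    The original JSON ends up as the event body, so it should be written to the HDFS file as-is while the envelope satisfies the handler.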
    

    Hope this helps