Tags: hadoop, hdfs, bigdata, flume, flume-ng

Increasing the file size in Flume with a memory channel


Below is my Flume config file. Even after changing rollInterval and rollSize, only 10 events are written to each file, and the console shows rollCount=10 and events=10. I also tried increasing rollCount to 1000, but the output did not change. Can anyone suggest how to increase the size of the files written to HDFS? What is wrong with the conf file below?

#naming components 

NetAgent.sources = NetCat_1 NetCat_2
NetAgent.sinks = HDFS
NetAgent.channels = MemChannel


NetAgent.sources.NetCat_1.type = netcat
NetAgent.sources.NetCat_1.bind = localhost
NetAgent.sources.NetCat_1.port = 8671

NetAgent.sources.NetCat_2.type = netcat
NetAgent.sources.NetCat_2.bind = localhost
NetAgent.sources.NetCat_2.port = 8672


NetAgent.sinks.HDFS.type = hdfs
NetAgent.sinks.HDFS.hdfs.path = file path here
NetAgent.sinks.HDFS.hdfs.filePrefix = test
NetAgent.sinks.HDFS.hdfs.rollSize = 67108864
NetAgent.sinks.HDFS.hdfs.rollInterval = 3600
NetAgent.sinks.HDFS.rollCount = 0
NetAgent.sinks.HDFS.hdfs.batchSize = 10000
NetAgent.sinks.HDFS.hdfs.writeFormat = Text
NetAgent.sinks.HDFS.hdfs.fileType = DataStream


NetAgent.channels.MemChannel.type = memory
NetAgent.channels.MemChannel.capacity = 20000
NetAgent.channels.MemChannel.transactionCapacity = 20000


NetAgent.sources.NetCat_1.channels = MemChannel
NetAgent.sources.NetCat_2.channels = MemChannel
NetAgent.sinks.HDFS.channel = MemChannel

The console logs:

(SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java)]
rolling: rollCount: 10, events: 10

[Image: files written in HDFS]


Solution

  • Your rollCount property is missing the hdfs. prefix, so Flume does not recognize it and falls back to the default rollCount of 10. Notice that your config for the HDFS sink is:

    NetAgent.sinks.HDFS.type = hdfs
    NetAgent.sinks.HDFS.hdfs.rollSize = 67108864
    NetAgent.sinks.HDFS.hdfs.rollInterval = 3600
    NetAgent.sinks.HDFS.rollCount = 0
    NetAgent.sinks.HDFS.hdfs.batchSize = 10000
    NetAgent.sinks.HDFS.hdfs.writeFormat = Text
    NetAgent.sinks.HDFS.hdfs.fileType = DataStream
    

    The rollCount line needs to be:

    NetAgent.sinks.HDFS.hdfs.rollCount = 0
    

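    This overrides the default rollCount. A value of 0 disables rolling based on event count, so files roll only when rollSize (67108864 bytes, i.e. 64 MB) or rollInterval (3600 seconds) is reached.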
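
    For reference, here is a sketch of the whole sink section with the prefix corrected. The path is still the placeholder from the question, and the comments summarize the HDFS sink's documented roll settings as I understand them, so double-check them against the user guide for your Flume version.

    ```properties
    NetAgent.sinks.HDFS.type = hdfs
    # Placeholder from the question; replace with your real HDFS directory
    NetAgent.sinks.HDFS.hdfs.path = file path here
    NetAgent.sinks.HDFS.hdfs.filePrefix = test
    # Roll the file once it reaches 64 MB (67108864 bytes)
    NetAgent.sinks.HDFS.hdfs.rollSize = 67108864
    # Roll the file after 3600 seconds, whichever trigger fires first
    NetAgent.sinks.HDFS.hdfs.rollInterval = 3600
    # 0 = never roll based on the number of events (the default is 10)
    NetAgent.sinks.HDFS.hdfs.rollCount = 0
    NetAgent.sinks.HDFS.hdfs.batchSize = 10000
    NetAgent.sinks.HDFS.hdfs.writeFormat = Text
    NetAgent.sinks.HDFS.hdfs.fileType = DataStream
    ```

    Every sink-specific setting carries the hdfs. prefix; only type (and channel) sit directly under the sink name. After restarting the agent, the DEBUG line from shouldRotate should no longer report rolling at rollCount: 10.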