Tags: hadoop, flume

Flume: Not able to fix sink output file size


I am trying to fix the sink output file size, i.e. to get 128 MB per output file. I tried several mechanisms (rollInterval, rollCount, rollSize) but did not get the desired output: I am not getting 128 MB files consistently. I get a few 128 MB files initially, but later on some files are generated with different sizes like 30, 40, 45 MB etc. Also, a lot of newly created files open and remain in the .tmp state. Any idea?


Solution

  • I don't think it is possible to always create files of exactly 128 MB. If Flume aggregates data of random size (i.e. not a constant size), or data of a constant size that is not a multiple of your requested size, it will regularly create files smaller than 128 MB.

    I guess you would need a constant flow of very small events; a file then stays in the .tmp state until it is filled (reaches 128 MB). And if you are monitoring directories, the incoming files would have to be multiples of 128 MB, otherwise you will end up with a final part file of a smaller size.

    Hope I correctly understood your problem.
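
    If the goal is to roll on size only, a sketch of an HDFS sink configuration along these lines (agent name `a1`, sink name `k1`, and the path are placeholders) would be:

    ```properties
    # Hypothetical agent/sink names; adapt to your setup.
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/events
    a1.sinks.k1.hdfs.fileType = DataStream

    # Roll only when the file reaches ~128 MB (value is in bytes)...
    a1.sinks.k1.hdfs.rollSize = 134217728
    # ...and disable time-based and event-count-based rolling.
    a1.sinks.k1.hdfs.rollInterval = 0
    a1.sinks.k1.hdfs.rollCount = 0

    # Close (rename away from .tmp) files that receive no events
    # for 600 seconds, so idle files do not stay open indefinitely.
    a1.sinks.k1.hdfs.idleTimeout = 600
    ```

    Note that even with rollSize set, Flume checks the size after writing a batch of events, so files can land slightly over or under the target; and hdfs.idleTimeout (or shutting the agent down cleanly) is what closes the lingering .tmp files you are seeing.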