Tags: java, hadoop, mapreduce, hadoop-partitioning, bigdata

Creating custom key-value pairs for mappers in Hadoop from a file


I have a 50MB file (continuous text data, without spaces). I want to partition this data so that each mapper gets 5MB of it. Each mapper should receive its data in (K,V) format, where the key is the partition number (1, 2, ...) and the value is the 5MB of plain text.

I have read about InputFormat (the getSplits method), FileInputFormat (FileSplit), and RecordReader, but I couldn't work out how to generate and use splits to create the required custom (K,V) pairs for my mappers. I am new to Hadoop MapReduce programming, so please suggest how to proceed in this case.


Solution

  • You can set mapreduce.input.fileinputformat.split.maxsize in your job configuration, in bytes (5MB = 5242880), to cap each input split at 5MB so that each mapper receives roughly 5MB of data; see the sketches below.
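
A minimal driver sketch of this, assuming Hadoop 2.x property names; the class name FiveMbSplitDriver and the job name "five-mb-splits" are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FiveMbSplitDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cap each input split at 5MB so a 50MB file yields ~10 mappers.
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 5L * 1024 * 1024);
        // Raising the minimum to the same value keeps splits from being
        // smaller than 5MB (e.g. due to a small HDFS block size).
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 5L * 1024 * 1024);

        Job job = Job.getInstance(conf, "five-mb-splits");
        job.setJarByClass(FiveMbSplitDriver.class);
        // ... set mapper, reducer, and output key/value classes as needed ...
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```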
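For the "key = partition number" part, one possible sketch: assuming the default TextInputFormat (so the mapper receives LongWritable byte offsets and Text lines) and that split boundaries fall at multiples of 5MB (which holds when minsize equals maxsize as above), the mapper can derive a 1-based partition number from its FileSplit's start offset. PartitionKeyMapper and SPLIT_SIZE are names introduced here for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: re-keys each record with a 1-based partition
// number computed from the split's starting byte offset.
public class PartitionKeyMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    private static final long SPLIT_SIZE = 5L * 1024 * 1024; // must match split.maxsize
    private final IntWritable partition = new IntWritable();

    @Override
    protected void setup(Context context) {
        FileSplit split = (FileSplit) context.getInputSplit();
        // Splits start at multiples of SPLIT_SIZE, so start / SPLIT_SIZE
        // gives 0, 1, 2, ...; add 1 for the 1-based numbering you asked for.
        partition.set((int) (split.getStart() / SPLIT_SIZE) + 1);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(partition, value);
    }
}
```

One caveat: TextInputFormat's records are newline-delimited, so if your 50MB of text contains no line breaks at all, the first mapper will receive the entire file as a single record and the others will receive nothing. In that case you would need a custom InputFormat/RecordReader that emits exactly one record per split, reading the split's 5MB range directly.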