I have a 50MB file (plain text data with no spaces). I want to partition this data so that each mapper gets 5MB of it. Each mapper should receive its data in (K,V) format, where the key is the partition number (1, 2, ...) and the value is the plain text (5MB).
I read about InputFormat (the getSplits method), FileInputFormat (FileSplit) and RecordReader, but I couldn't understand how to generate and use splits to create the required custom (K,V) pairs for my mappers. I am new to Hadoop MapReduce programming, so please suggest how I should proceed in this case.
You can set mapreduce.input.fileinputformat.split.maxsize in your configuration. The value is in bytes (5242880 for 5MB), and it caps the size of each input split so that each mapper gets at most 5MB of data.
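To see why a 5MB cap should yield ten mappers for a 50MB file, here is a minimal plain-Java sketch of the split arithmetic, as I understand FileInputFormat.getSplits to perform it. It does not use Hadoop itself. The class and method names (SplitSizeSketch, computeSplitSize, planSplits) are hypothetical, and the 128MB block size and minimum split size of 1 byte are assumed defaults, not values taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of how a max split size divides a file.
public class SplitSizeSketch {
    // The last split may run up to 10% over the target size
    // rather than leaving a tiny trailing split.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped by the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns one {partitionNumber, offset, length} triple per split.
    static List<long[]> planSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        long partition = 1;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {partition++, fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            // Whatever is left becomes the final split.
            splits.add(new long[] {partition, fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed: 128MB block size, min split size 1, max split size 5MB.
        long splitSize = computeSplitSize(128 * mb, 1L, 5 * mb);
        for (long[] s : planSplits(50 * mb, splitSize)) {
            System.out.println("partition " + s[0] + ": offset=" + s[1] + " length=" + s[2]);
        }
    }
}
```

In a real job the cap would be set with conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 5L * 1024 * 1024) or FileInputFormat.setMaxInputSplitSize(job, 5L * 1024 * 1024). Note that this only controls the split size: with the default TextInputFormat the key is the byte offset of each line, not a partition number, so getting (partition number, 5MB text) pairs would most likely still need a custom InputFormat and RecordReader.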