I'm new to Hadoop and am currently learning MapReduce design patterns from Donald Miner and Adam Shook's book MapReduce Design Patterns. The book describes a Cartesian Product pattern. My question is:
From what I can see, the next function in the CartesianRecordReader class reads both splits without sending the data anywhere.
Here is the source code: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java
That's all, thanks in advance :)
When does the record reader send data to the mapper?
Let me answer by giving you an idea of how the mapper and the RecordReader are related. This is the Hadoop code that sends data to the mapper:
    RecordReader<K1, V1> input;
    K1 key = input.createKey();
    V1 value = input.createValue();
    while (input.next(key, value)) {
        // map pair to output
        mapper.map(key, value, output, reporter);
        if (incrProcCount) {
            reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
                SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS, 1);
        }
    }
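Basically, Hadoop calls next until it returns false, and on every call key and value are filled with new values. So the RecordReader never sends anything itself: next only loads the next pair, and the loop above passes that pair to mapper.map. For plain text input, key is normally the byte offset at which the line starts (the bytes read so far) and value is the next line in the file.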
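To make the loop concrete without needing a Hadoop installation, here is a minimal sketch of the same contract. SimpleRecordReader, SimpleMapper, and LineReader are hypothetical stand-ins I made up for illustration, not Hadoop classes, and the key here is a character offset rather than a byte offset.

```java
import java.util.ArrayList;
import java.util.List;

public class MapLoopSketch {
    /** Hypothetical stand-in for the old-API RecordReader; not Hadoop's interface. */
    interface SimpleRecordReader {
        boolean next(long[] key, StringBuilder value);
    }

    /** Hypothetical stand-in for a Mapper. */
    interface SimpleMapper {
        void map(long key, String value, List<String> output);
    }

    /** Reads one line per call; key is the offset at which the line starts. */
    static class LineReader implements SimpleRecordReader {
        private final String data;
        private int pos = 0;

        LineReader(String data) {
            this.data = data;
        }

        @Override
        public boolean next(long[] key, StringBuilder value) {
            if (pos >= data.length()) {
                return false; // no more records: the driver loop stops
            }
            int end = data.indexOf('\n', pos);
            if (end < 0) {
                end = data.length();
            }
            key[0] = pos;                 // overwrite the reusable key
            value.setLength(0);           // overwrite the reusable value
            value.append(data, pos, end);
            pos = end + 1;
            return true;
        }
    }

    /** Same shape as the loop above: the driver, not the reader, calls map. */
    static List<String> run(String input) {
        SimpleRecordReader reader = new LineReader(input);
        SimpleMapper mapper = (k, v, out) -> out.add(k + ":" + v);
        List<String> output = new ArrayList<>();
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            mapper.map(key[0], value.toString(), output);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run("first\nsecond\n")); // prints [0:first, 6:second]
    }
}
```

Note that the reader only fills key and value and reports whether it found a record; the call to map happens in the loop that owns the reader.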
Where is the code that sends the data to the mapper?
That code is in the Hadoop source (probably in the MapContextImpl class), but it resembles what I have written in the code snippet.
EDIT: The source code is in MapRunner.
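Applied to the reader in your question: a next method that reads from two sources still follows the same contract. The sketch below is a simplified stand-in, not the book's CartesianRecordReader; PairingReader and its in-memory lists are assumptions I made for illustration. Each call to next loads one (left, right) pair and returns true, and the surrounding loop is where the pair would be handed to the mapper.

```java
import java.util.ArrayList;
import java.util.List;

public class PairingReaderSketch {
    /** Hypothetical reader that pairs every left record with every right record. */
    static class PairingReader {
        private final List<String> left;
        private final List<String> right;
        private int i = 0; // index into left
        private int j = 0; // index into right

        PairingReader(List<String> left, List<String> right) {
            this.left = left;
            this.right = right;
        }

        /** Fills key/value with the next pair; returns false once all pairs are used. */
        boolean next(StringBuilder key, StringBuilder value) {
            if (right.isEmpty() || i >= left.size()) {
                return false;
            }
            key.setLength(0);
            key.append(left.get(i));
            value.setLength(0);
            value.append(right.get(j));
            j++;
            if (j == right.size()) { // right side exhausted: advance left, restart right
                j = 0;
                i++;
            }
            return true;
        }
    }

    static List<String> run(List<String> left, List<String> right) {
        PairingReader reader = new PairingReader(left, right);
        List<String> seen = new ArrayList<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            // In Hadoop, this is the point where mapper.map(key, value, ...) is called.
            seen.add(key + "," + value);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("A", "B"), List.of("1", "2")));
        // prints [A,1, A,2, B,1, B,2]
    }
}
```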