I've implemented a simple WordCount application in Hadoop. My cluster has one namenode and 4 datanodes, and the replication factor is set to 4. I have put many lorem-ipsum files into the filesystem. While running the WordCount application, I see the reducers working even though the mappers haven't finished yet.
2021-10-29 14:53:31,044 INFO mapreduce.Job: map 70% reduce 23%
How does this work? Many tutorial pages state (one page for example): "A reducer cannot start while a mapper is still in progress" https://www.talend.com/resources/what-is-mapreduce/
How can the reducers do any work if the map output isn't complete yet?
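Once data is emitted by a mapper, it undergoes two steps before the reduce function sees it: shuffle (the mapper's output is copied to the node running the reducer) and sort (the copied output is merged and sorted by key).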
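So even though data is still being emitted by the mappers, reducer tasks are already being created, and they fetch and sort data as it arrives. You're correct that they won't actually start processing values until all mapping has finished. The reduce percentage in the job log covers this preparatory work too: Hadoop reports a reduce task's progress in three equal parts (copy, sort, reduce), so a value such as "reduce 23%" alongside "map 70%" means the reducers are still copying map output and have not called the reduce function yet.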
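To make the numbers in the log line concrete, here is a minimal sketch of how the reported reduce percentage can be read. It is not Hadoop code: the class `ReduceProgressSketch` and its methods are hypothetical names made up for illustration, and it only assumes that the copy, sort, and reduce phases each count for one third of a reduce task's progress.

```java
public class ReduceProgressSketch {
    // The three phases a reduce task passes through, in order.
    public enum Phase { COPY, SORT, REDUCE }

    // Assumption: each phase contributes one third of the reported percentage.
    // fractionOfPhaseDone is how far the current phase has advanced (0.0 to 1.0).
    public static double overallProgress(Phase phase, double fractionOfPhaseDone) {
        if (fractionOfPhaseDone < 0.0 || fractionOfPhaseDone > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        return (phase.ordinal() + fractionOfPhaseDone) / 3.0;
    }

    // The reduce function itself can only run once every mapper has finished,
    // because any unfinished mapper might still emit values for a given key.
    public static boolean reduceFunctionCanRun(int completedMaps, int totalMaps) {
        return completedMaps == totalMaps;
    }

    public static void main(String[] args) {
        // 70% of the map output has been copied; sort and reduce have not begun.
        double p = overallProgress(Phase.COPY, 0.70);
        System.out.printf("reduce %d%%%n", Math.round(100 * p));

        // With 7 of 10 mappers done, the reduce function must still wait.
        System.out.println(reduceFunctionCanRun(7, 10));
    }
}
```

With 70% of the map output copied, the sketch reports roughly a quarter of the reduce work as done, which matches the "map 70% reduce 23%" line in the question even though no values have been reduced.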