Better way of sampling in Hadoop MapReduce

I want a 20% sample of the input dataset.

I thought of two approaches:

1. Emit 20% of the data from each mapper (every mapper emits 20% of its input). Then, after shuffle and sort, the reducer takes 20% of the mapper output. (The same sampling procedure is applied in both the map and reduce phases.)
2. Emit every line from the mapper unchanged, and take the 20% sample of the full data set in the reducer. (Sampling is done only in the reducer.)

Which is the better approach?

Solution

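I would definitely go with your first option, although I'm not sure why you need a reducer. Just keep 20% of the records in the map phase and call it a day. Sampling again in the reducer would actually shrink the result to 20% of 20%, i.e. 4% of the input (0.2 × 0.2 = 0.04). The second option is worse because every line goes through the shuffle and sort just to be thrown away in the reducer.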
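A minimal sketch of the map-only approach, assuming the job runs through Hadoop Streaming with a Python mapper (the original answer names no language). Each record is kept independently with probability 0.2 (Bernoulli sampling). The function name `sample_lines` and its parameters are illustrative, not part of any Hadoop API.

```python
import random
import sys
from typing import Iterable, Iterator, Optional


def sample_lines(lines: Iterable[str], fraction: float = 0.2,
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield each line independently with probability `fraction` (Bernoulli sampling)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Hypothetical: a seed is only for reproducible tests; leave it None in a real job
    rng = random.Random(seed)
    for line in lines:
        # rng.random() is uniform in [0.0, 1.0), so a line is kept with probability `fraction`
        if rng.random() < fraction:
            yield line


def main() -> None:
    # Hadoop Streaming passes the mapper's input split on stdin, one record per line.
    # With zero reduce tasks, whatever the mapper writes to stdout is the job output.
    for line in sample_lines(sys.stdin, fraction=0.2):
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

To make the job map-only, set the number of reduce tasks to zero (for example `-D mapreduce.job.reduces=0` on the streaming command line, or `job.setNumReduceTasks(0)` in the Java API). Because each record is kept independently, the output is about 20% of the input rather than exactly 20%.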