I am trying to analyze social network data that contains follower/followee pairs.
I want to find the asymmetric pairs (A follows B but B doesn't follow A) using MapReduce (Hadoop).
With this pair data, however, I am not sure how the mappers and a single reducer should handle the data, since splitting the pairs across mappers will affect the results.
Can someone explain how I can use MapReduce to find the asymmetric pairs in a massive data set?
Thank you very much.
p.s. I hope there is a way to use many Mappers for this kind of problem even though I might have to use only one Reducer.
Here is how I solved the problem.
(It works, but it may not be the optimal solution. If someone has a better answer, please tell me.)
Each mapper reorders every follower/followee pair so that the smaller ID comes first and the larger ID comes second, then counts how many times each reordered pair occurs.
0 -> 1 (ID 0 follows ID 1)
1 -> 0 (ID 1 follows ID 0)
The pair (0, 1) gets a count of 2.
A single reducer collects the (pair, count) key-value pairs, adds up the counts for each pair, and checks whether the pair has a total count of 1.
A count of 1 means there is only one directed edge between the two nodes, so the pair is asymmetric.
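
To make the idea concrete, here is a minimal sketch of the steps above, simulated in plain Python in memory rather than on Hadoop. The function names (map_edge, reduce_pair, find_asymmetric) are hypothetical and chosen only for illustration. It differs from the description in one respect: instead of emitting a count, the mapper emits the original edge as the value, so the reducer can tell who follows whom and a duplicated edge is not mistaken for a mutual follow. When every edge appears only once in the input, the number of distinct directions is the same as the count described above.

```python
from collections import defaultdict


def map_edge(follower, followee):
    """Mapper: the key is the pair with the smaller ID first; the value is the original edge."""
    key = (min(follower, followee), max(follower, followee))
    yield key, (follower, followee)


def reduce_pair(key, edges):
    """Reducer: exactly one distinct direction means the pair is asymmetric."""
    directions = set(edges)  # duplicates of the same edge collapse here
    if len(directions) == 1:
        yield directions.pop()


def find_asymmetric(edges):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for follower, followee in edges:
        for key, value in map_edge(follower, followee):
            groups[key].append(value)
    result = []
    for key in sorted(groups):  # sorted only to make the output order stable
        result.extend(reduce_pair(key, groups[key]))
    return result


# Hypothetical sample data: 0 and 1 follow each other, 2 follows 0, 1 follows 3.
edges = [(0, 1), (1, 0), (2, 0), (1, 3)]
print(find_asymmetric(edges))  # [(2, 0), (1, 3)]
```

Because both directions of a pair map to the same key, the shuffle sends them to the same reducer no matter which mapper read them. So splitting the input across many mappers should not change the result, and in principle more than one reducer could be used as well.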