hadoop mapreduce distributed-computing

Finding Asymmetric Pairs using MapReduce


I am trying to analyze social network data that contains follower/followee pairs.

I want to find the asymmetric pairs (A follows B but B doesn't follow A) using MapReduce (Hadoop).

However, I am not sure how the mappers and a single reducer should handle this data, since splitting the pairs across mappers would affect the results.

Can someone explain how I can use MapReduce to find the asymmetric pairs in a massive dataset?

Thank you very much.

P.S. I hope there is a way to use many mappers for this kind of problem, even if I have to use only one reducer.


Solution

  • Here is how I solved the problem.
    (It works, but it may not be the optimal solution. If someone has a better answer, please tell me.)

    Each mapper counts the follower/followee pairs, first reordering each pair so that the smaller ID comes first and the larger ID comes second.

    0 -> 1 (ID 0 follows ID 1)
    1 -> 0 (ID 1 follows ID 0)

    The pair (0, 1) gets a count of 2.

    A single reducer collects the (pair, count) key-value pairs and checks whether each pair has a count of 1.

    A count of 1 means there is only one directed edge between the two nodes, so the pair is asymmetric (assuming the input contains no duplicate edges).
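
The steps above can be sketched in plain Python as a rough, single-process simulation. This is not Hadoop code: the function names (`map_edge`, `shuffle`, `reduce_pair`, `find_asymmetric_pairs`) and the sample edges are made up for illustration, and `shuffle` only stands in for the grouping by key that the framework would do between the map and reduce phases. The sketch also assumes that duplicate edges should be dropped before counting.

```python
from collections import defaultdict


def map_edge(follower, followee):
    """Map step: emit the pair with the smaller ID first, with a count of 1."""
    key = (min(follower, followee), max(follower, followee))
    return key, 1


def shuffle(mapped):
    """Group values by key (a stand-in for the framework's shuffle phase)."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_pair(key, values):
    """Reduce step: keep the pair only if exactly one directed edge was seen."""
    return key if sum(values) == 1 else None


def find_asymmetric_pairs(edges):
    # Assumption: duplicate edges are removed so a repeated line is not counted twice.
    mapped = [map_edge(a, b) for a, b in set(edges)]
    results = (reduce_pair(k, v) for k, v in shuffle(mapped).items())
    return sorted(pair for pair in results if pair is not None)


# Hypothetical sample: 0 and 1 follow each other, 2 -> 3 and 4 -> 2 are one-way.
edges = [(0, 1), (1, 0), (2, 3), (4, 2)]
print(find_asymmetric_pairs(edges))  # [(2, 3), (2, 4)]
```

Because both directions of a pair map to the same key, they always arrive at the same reducer, so this approach should not depend on having only one reducer. Note that the reordered key no longer says who follows whom; if the direction matters, the mapper could emit it as the value instead of a bare count.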