
Adjacency List Creation from a Blog Post


I need to build an adjacency list showing which pairs of users have posted in the same topic thread. My dataset has two columns: User ID and Topic ID. A topic is like a blog post, so many users can post in it. The dataset looks like this:

User ID   Topic ID
1         55
2         55
1         55
6         55

From this I need an adjacency list that contains only the users and their relationships, like this:

User User
1 2
1 6
2 6

Any ideas on how to do this in Excel or Python?


Solution

  • We'll get by with a little help from our friends, collections.defaultdict and itertools.combinations:

    from collections import defaultdict
    from itertools import combinations
    
    by_post_id = defaultdict(set)
    
    data = [
        (1, 55),
        (2, 55),
        (1, 55),
        (6, 55),
        (1, 42),
        (11, 42),
        (8, 42),
    ]
    
    # Group up people by post ID
    for user_id, post_id in data:
        by_post_id[post_id].add(user_id)
    
    # (`by_post_id` will look like {55: {1, 2, 6}, 42: {8, 1, 11}})
    
    # Walk over each post...
    for post_id, user_ids in by_post_id.items():
        # ... and generate all pairs of user IDs. Sorting first puts the
        # smaller ID first in each pair and makes the order deterministic.
        for combo in combinations(sorted(user_ids), 2):
            print(post_id, combo)
    

    This outputs:

    55 (1, 2)
    55 (1, 6)
    55 (2, 6)
    42 (1, 8)
    42 (1, 11)
    42 (8, 11)
    

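    Naturally, if you don't care which post a pair came from, just ignore post_id. A pair of users who share more than one post will then be printed once per shared post, so collect tuple(sorted(combo)) in a set if you need each pair only once.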
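If you want an actual adjacency list rather than printed pairs, here is a minimal sketch along the same lines. It assumes the data is already a list of (user_id, topic_id) tuples; the helper names build_adjacency and edge_list are hypothetical, not part of any library. Pairs are merged across all topics, so each pair appears once, and the edge list is written as CSV to an in-memory buffer. To get a file you can open in Excel, pass a file opened with open("edges.csv", "w", newline="") to csv.writer instead (the filename is just an example).

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def build_adjacency(rows):
    """Map each user ID to the set of users sharing at least one topic with them.

    `rows` is any iterable of (user_id, topic_id) pairs (hypothetical helper).
    """
    # Group users by topic; the set drops repeat posts by the same user.
    users_by_topic = defaultdict(set)
    for user_id, topic_id in rows:
        users_by_topic[topic_id].add(user_id)

    adjacency = defaultdict(set)
    for user_ids in users_by_topic.values():
        # Every pair of users in the same topic is connected, in both directions.
        for a, b in combinations(sorted(user_ids), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def edge_list(adjacency):
    """Return each undirected edge once, as sorted (smaller, larger) tuples."""
    return sorted(
        {(a, b) for a, neighbours in adjacency.items() for b in neighbours if a < b}
    )


data = [(1, 55), (2, 55), (1, 55), (6, 55), (1, 42), (11, 42), (8, 42)]
adjacency = build_adjacency(data)
edges = edge_list(adjacency)

# Write the edge list as CSV (here to an in-memory buffer instead of a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["User", "User"])
writer.writerows(edges)
print(buffer.getvalue())
```

Users who only ever posted in a topic alone have no partners, so they do not appear as keys in adjacency.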