I need to build an adjacency list showing which users are related through a shared topic thread. My dataset has two columns: User ID and Topic ID. A topic is like a blog post, so many users can post on it. The dataset looks like this:
User ID | Topic ID |
---|---|
1 | 55 |
2 | 55 |
1 | 55 |
6 | 55 |
I need to turn this into an adjacency list that contains only the pairs of users who posted on the same topic, like this:
User | User |
---|---|
1 | 2 |
1 | 6 |
2 | 6 |
Any ideas on how to do this in Excel or Python?
We'll get by with a little help from our friends `collections.defaultdict` and `itertools.combinations`:
from collections import defaultdict
from itertools import combinations

by_post_id = defaultdict(set)

data = [
    (1, 55),
    (2, 55),
    (1, 55),
    (6, 55),
    (1, 42),
    (11, 42),
    (8, 42),
]

# Group up people by post ID
for user_id, post_id in data:
    by_post_id[post_id].add(user_id)

# (`by_post_id` will look like {55: {1, 2, 6}, 42: {8, 1, 11}})

# Walk over each post...
for post_id, user_ids in by_post_id.items():
    # ... and generate all pairs of user IDs.
    for combo in combinations(user_ids, 2):
        print(post_id, combo)
This outputs
55 (1, 2)
55 (1, 6)
55 (2, 6)
42 (8, 1)
42 (8, 11)
42 (1, 11)
And naturally, if you don't care about the pairs' `post_id`s, just ignore them.
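If all you want is the plain user-to-user list from the question, one possible way to drop the post IDs is to collect the pairs into a set. This is only a sketch: the helper name `build_edge_list` is made up for illustration, and it assumes the relationship is undirected, so each pair is stored as (smaller ID, larger ID) and two users who share several topics are listed once.

```python
from collections import defaultdict
from itertools import combinations


def build_edge_list(rows):
    """Return sorted, de-duplicated (user, user) pairs for users sharing a topic.

    `rows` is an iterable of (user_id, topic_id) tuples. The function name and
    the undirected-pair assumption are illustrative, not part of the question.
    """
    users_by_topic = defaultdict(set)
    for user_id, topic_id in rows:
        users_by_topic[topic_id].add(user_id)

    edges = set()
    for user_ids in users_by_topic.values():
        # Sorting first makes every pair come out as (smaller, larger),
        # so (8, 1) and (1, 8) are never treated as different edges.
        edges.update(combinations(sorted(user_ids), 2))
    return sorted(edges)


rows = [(1, 55), (2, 55), (1, 55), (6, 55), (1, 42), (11, 42), (8, 42)]
for left, right in build_edge_list(rows):
    print(left, right)
```

With only the topic 55 rows from the question, this gives the pairs 1 2, 1 6 and 2 6. Using a set for the edges is a design choice: if you would rather count how many topics each pair shares, a `collections.Counter` could take its place.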