I have dataframe shaped like the following:
country | institution | department | individual |
---|---|---|---|
USA | Apple | Marketing | John Fowler |
UK | Apple | Marketing | Peter Pan |
China | Apple | Finance | John Fowler |
Argentina | Bosch | Marketing | Messi |
I would like to create a weighted adjacency matrix that looked like the following:
USA | UK | China | Argentina | Apple | Bosch | Marketing | Finance | John Fowler | Peter Pan | Messi | |
---|---|---|---|---|---|---|---|---|---|---|---|
USA | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
UK | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
China | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
Argentina | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
Apple | 1 | 1 | 1 | 0 | 0 | 0 | 2 | 1 | 2 | 1 | 0 |
Bosch | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
Marketing | 1 | 1 | 0 | 1 | 2 | 1 | 0 | 0 | 1 | 1 | 1 |
Finance | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
John Fowler | 1 | 0 | 1 | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 0 |
Peter Pan | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Messi | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
I have seen examples here and here but I could not extend the solutions to more than 2 columns.
You can use a generator with itertools.permutations
and pandas.crosstab
:
from itertools import permutations
out = (pd.crosstab(*zip(*((a, b) for l in df.to_numpy().tolist()
for a,b in permutations(l, r=2))))
).rename_axis(index=None, columns=None)
Output:
Apple Argentina Bosch China Finance John Fowler Marketing Messi Peter Pan UK USA
Apple 0 0 0 1 1 2 2 0 1 1 1
Argentina 0 0 1 0 0 0 1 1 0 0 0
Bosch 0 1 0 0 0 0 1 1 0 0 0
China 1 0 0 0 1 1 0 0 0 0 0
Finance 1 0 0 1 0 1 0 0 0 0 0
John Fowler 2 0 0 1 1 0 1 0 0 0 1
Marketing 2 1 1 0 0 1 0 1 1 1 1
Messi 0 1 1 0 0 0 1 0 0 0 0
Peter Pan 1 0 0 0 0 0 1 0 0 1 0
UK 1 0 0 0 0 0 1 0 1 0 0
USA 1 0 0 0 0 1 1 0 0 0 0
How it works:
[(a,b), (b,a), ...]
[(a,b,...),(b,a,...)]
crosstab
expanding as the first two parameters