Search code examples
pythonpandasgraphnetworkxigraph

create adjacency matrix from unique values from multiple columns


I have dataframe shaped like the following:

country institution department individual
USA Apple Marketing John Fowler
UK Apple Marketing Peter Pan
China Apple Finance John Fowler
Argentina Bosch Marketing Messi

I would like to create a weighted adjacency matrix that looked like the following:

USA UK China Argentina Apple Bosch Marketing Finance John Fowler Peter Pan Messi
USA 0 0 0 0 1 0 1 0 1 0 0
UK 0 0 0 0 1 0 1 0 0 1 0
China 0 0 0 0 1 0 0 1 1 0 0
Argentina 0 0 0 0 0 1 1 0 0 0 1
Apple 1 1 1 0 0 0 2 1 2 1 0
Bosch 0 0 0 1 0 0 1 0 0 0 1
Marketing 1 1 0 1 2 1 0 0 1 1 1
Finance 0 0 1 0 1 0 0 0 1 0 0
John Fowler 1 0 1 0 2 0 1 1 0 0 0
Peter Pan 0 1 0 0 1 0 1 0 0 0 0
Messi 0 0 0 1 0 1 1 0 0 0 0

I have seen examples here and here but I could not extend the solutions to more than 2 columns.


Solution

  • You can use a generator with itertools.permutations and pandas.crosstab:

    from itertools import permutations
    
    out = (pd.crosstab(*zip(*((a, b) for l in df.to_numpy().tolist()
                              for a,b in permutations(l, r=2))))
          ).rename_axis(index=None, columns=None)
    

    Output:

                 Apple  Argentina  Bosch  China  Finance  John Fowler  Marketing  Messi  Peter Pan  UK  USA
    Apple            0          0      0      1        1            2          2      0          1   1    1
    Argentina        0          0      1      0        0            0          1      1          0   0    0
    Bosch            0          1      0      0        0            0          1      1          0   0    0
    China            1          0      0      0        1            1          0      0          0   0    0
    Finance          1          0      0      1        0            1          0      0          0   0    0
    John Fowler      2          0      0      1        1            0          1      0          0   0    1
    Marketing        2          1      1      0        0            1          0      1          1   1    1
    Messi            0          1      1      0        0            0          1      0          0   0    0
    Peter Pan        1          0      0      0        0            0          1      0          0   1    0
    UK               1          0      0      0        0            0          1      0          1   0    0
    USA              1          0      0      0        0            1          1      0          0   0    0
    

    How it works:

    • for each row, generate all combinations of 2 items [(a,b), (b,a), ...]
    • reshape into 2 arrays [(a,b,...),(b,a,...)]
    • pass the arrays to crosstab expanding as the first two parameters
    corresponding graph:

    enter image description here