python, pandas, graph

Pairwise reshaping dataframe


I am trying to build a list of graph edges from a two-column data frame in which each row pairs a node with one of its edges:

df = pd.DataFrame({'node': ['100', '100', '200', '200', '200'],
                   'edge': ['111111', '222222', '123456', '456789', '987654']})

The result should contain, for each node, every ordered pair of its edges, like this:

pd.DataFrame({'node':  ['100', '100', '200', '200', '200', '200', '200', '200'],
              'edge1': ['111111', '222222', '123456', '123456', '456789', '456789', '987654', '987654'],
              'edge2': ['222222', '111111', '456789', '987654', '987654', '123456', '123456', '456789']})

I have been wrestling with pivot_table and stack for a while, but without success.


Solution

  • You can group by node, use itertools.permutations to generate every ordered pair of edges within each group, and then expand those pairs into a new data frame with one pair per row:

    import pandas as pd
    from itertools import permutations
    
    df = pd.DataFrame({'node': ['100',  '100', '200',  '200', '200'],'edge': ['111111',  '222222', '123456', '456789',  '987654']})
    
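    # Per node: collect its edges, build all ordered pairs, one pair per row.
    pairs = (df.groupby('node')['edge'].apply(list)
               .apply(lambda x: list(permutations(x, 2)))
               .reset_index().explode('edge'))
    # Split each (edge1, edge2) tuple into two columns.
    result = pd.DataFrame(pairs['edge'].to_list(), index=pairs['node'],
                          columns=['edge1', 'edge2']).reset_index()
    print(result)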
    

    Result:

      node   edge1   edge2
    0  100  111111  222222
    1  100  222222  111111
    2  200  123456  456789
    3  200  123456  987654
    4  200  456789  123456
    5  200  456789  987654
    6  200  987654  123456
    7  200  987654  456789
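
As a possible alternative to itertools, the same pairs can probably be obtained with a self-merge: joining the frame to itself on node pairs every edge with every edge of the same node, and the self-pairs are then filtered out. This is only a sketch, not part of the original answer. It assumes edge values are unique within a node, and the names `paired` and `merged_result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'node': ['100', '100', '200', '200', '200'],
    'edge': ['111111', '222222', '123456', '456789', '987654'],
})

# Join the frame to itself on 'node': each edge meets every edge of the same node.
# The suffixes rename the two overlapping 'edge' columns to 'edge1' and 'edge2'.
paired = df.merge(df, on='node', suffixes=('1', '2'))

# Drop the rows where an edge was paired with itself.
# Assumption: edge values are unique within a node.
merged_result = paired[paired['edge1'] != paired['edge2']].reset_index(drop=True)
print(merged_result)
```

With this approach a node that has only one edge produces no rows at all, since its only pairing is with itself. The intermediate frame grows with the square of the number of edges per node, so it may be less suitable when individual nodes have very many edges.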