Search code examples
pythonpandasnetworkx

Restructure pandas dataframe for Network Graph


I am working on building a pandas data frame to make a network graph. My data frame currently looks like this:

Group1 Group2 Group3 Asset
A      A      A      MI
A      A      A      JI
A      A      A      MI
A      A      A      JI
A      A      A      MI
A      A      B      MI
A      A      C      MI
A      A      C      PA
A      A      C      MI
A      A      C      PA
A      A      C      MI
A      A      C      PA
A      A      C      MI
A      A      C      PA
A      A      C      MI

and I want to manipulate it to look like this:

Source Target Weight
MI     JI     1
MI     MI     1
MI     PA     1
JI     MI     1
PA     MI     1

Essentially I would like to break up the Asset column into 2 columns of each possible combination for every unique pairing of Group 1, 2, and 3. The first part of the pairing would make up the Source column and the 2nd part of the pairing would make up the Target column. And for this minimal example, it's just group 3 that varies. The weight column represents the sum of total pairings of individual assets.

Can someone help point me in the right direction? Any help would be appreciated!

Here is some sample code to generate the df:

df = pd.DataFrame({'Group1': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
                   'Group2': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
                   'Group3': ['A','A','A','A','A','B','C','C','C','C','C','C','C','C','C'],
                   'Asset': ['MI','JI','MI','JI','MI','MI','MI','PA','MI','PA','MI','PA','MI','PA','MI']
                   
                   })

Solution

  • import itertools as it
    import pandas as pd
    
    df = pd.DataFrame({'Group1': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
                       'Group2': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
                       'Group3': ['A','A','A','A','A','B','C','C','C','C','C','C','C','C','C'],
                       'Asset': ['MI','JI','MI','JI','MI','MI','MI','PA','MI','PA','MI','PA','MI','PA','MI']
                       })
    
    def assets_pairs(assets_group):
        unique_assets = set(assets_group)
        if len(unique_assets) == 1:
            x = assets_group.iat[0]  # get the only unique asset
            pairs = [[x, x]]
        else:
            pairs = it.permutations(unique_assets, r=2)  # get all the unique pairs without repeated elements
        return pd.DataFrame(pairs, columns=['Source', 'Target']) 
       
    df_pairs = (
        df.groupby(['Group1', 'Group2', 'Group3'])['Asset']
          .apply(assets_pairs)   # create asset pairs per group 
          .groupby(['Source', 'Target'], as_index=False)  # compute the weights  by 
          .agg(Weights = ('Source', 'size'))              # counting the unique ('Source', 'Target') pairs
    )
    
    >>> df_pairs
    
      Source Target  Weights
    0     JI     MI        1
    1     MI     JI        1
    2     MI     MI        1
    3     MI     PA        1
    4     PA     MI        1