
Pandas: from adjacency matrix to series of node lists


I have what I think is a pretty general problem: recasting a bipartite adjacency matrix as a list of lists of nodes. In Pandas, that means transforming a specific pd.DataFrame format into a specific pd.Series format.

For non-discrete-math people, this looks like the following transformation:

From

import pandas as pd

df = pd.DataFrame(columns=['item1','item2','item3'],
                  index=['foo','bar','qux'],
                  data=[[1,1,0],[0,1,1],[0,0,0]])

which looks like

    item1   item2   item3
foo     1       1       0
bar     0       1       1
qux     0       0       0

To

srs = pd.Series([['item1','item2'],['item2','item3'],[]],index=['foo','bar','qux'])

that looks like

foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object

I have partially achieved this goal with the following code:

df_1 = df.stack().reset_index()

srs = df_1.loc[df_1[0]==1].groupby('level_0')['level_1'].apply(list)

which, besides being slightly unreadable, drops poor qux along the way: qux has no 1s, so the filter removes all of its rows before the groupby ever sees it.
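One way to keep qux within the same stack/groupby pipeline (a sketch, not the approach from the answer below) is to reindex the grouped result on the original index, which also undoes the key sorting done by groupby, and then replace the NaN placeholders for edgeless nodes with empty lists:

```python
import pandas as pd

df = pd.DataFrame(columns=['item1', 'item2', 'item3'],
                  index=['foo', 'bar', 'qux'],
                  data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]])

# Long format: one row per (node, item) pair, columns 'level_0', 'level_1', 0.
df_1 = df.stack().reset_index()

# Keep only the edges (value 1) and collect the item names per node.
# Nodes without any edge (qux) vanish here, and groupby sorts the keys.
srs = df_1.loc[df_1[0] == 1].groupby('level_0')['level_1'].apply(list)

# Restore every original node in the original order; missing nodes become NaN.
srs = srs.reindex(df.index)

# Swap the NaN placeholders for empty lists.
srs = srs.apply(lambda v: v if isinstance(v, list) else [])
print(srs)
```

This works, but it still builds the full long-format table first, which is the main cost of this route on large matrices.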

Is there any shorter path to the desired result?


Solution

  • If you want to avoid reshaping with stack and groupby, you can use a list comprehension instead: convert the 0/1 values to booleans with DataFrame.astype, use each boolean row to filter the column names, and finally pass the result to the Series constructor:

    print([list(df.columns[x]) for x in df.astype(bool).to_numpy()])
    [['item1', 'item2'], ['item2', 'item3'], []]
    
    s = pd.Series([list(df.columns[x]) for x in df.astype(bool).to_numpy()], index=df.index)
    print(s)
    foo    [item1, item2]
    bar    [item2, item3]
    qux                []
    dtype: object
    

    If performance also matters, convert the column labels to a NumPy array once, outside the loop:

    c = df.columns.to_numpy()
    s = pd.Series([list(c[x]) for x in df.astype(bool).to_numpy()], index=df.index)
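To check the performance claim on your own data, a small timeit sketch like the one below can compare both list-comprehension variants. The random 2000 x 50 matrix and the function names via_index and via_array are hypothetical, chosen only for illustration; actual timings depend on your machine and pandas version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: a random 2000 x 50 0/1 adjacency matrix.
rng = np.random.default_rng(0)
big = pd.DataFrame(
    rng.integers(0, 2, size=(2000, 50)),
    index=[f"row{i}" for i in range(2000)],
    columns=[f"item{j}" for j in range(50)],
)


def via_index(df):
    # Filters the pandas Index with each boolean row.
    return pd.Series([list(df.columns[x]) for x in df.astype(bool).to_numpy()],
                     index=df.index)


def via_array(df):
    # Converts the column labels to a NumPy array once, outside the loop.
    c = df.columns.to_numpy()
    return pd.Series([list(c[x]) for x in df.astype(bool).to_numpy()],
                     index=df.index)


# Both variants must produce identical results before timing them.
assert via_index(big).tolist() == via_array(big).tolist()

for fn in (via_index, via_array):
    t = timeit.timeit(lambda fn=fn: fn(big), number=10)
    print(f"{fn.__name__}: {t / 10 * 1000:.1f} ms per call")
```

Note that astype(bool) treats any nonzero value as True; if the matrix may hold weights or other values and only exact 1s should count, a mask such as df.eq(1).to_numpy() can be used in place of df.astype(bool).to_numpy().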