I have what I think is a pretty general problem: recasting a bipartite adjacency matrix as a list of lists of nodes. In pandas, that means transforming a specific pd.DataFrame format into a specific pd.Series format.
For people outside discrete math, this amounts to the following transformation:
From
import pandas as pd

df = pd.DataFrame(columns=['item1','item2','item3'],
                  index=['foo','bar','qux'],
                  data=[[1,1,0],[0,1,1],[0,0,0]])
which looks like
     item1  item2  item3
foo      1      1      0
bar      0      1      1
qux      0      0      0
To
srs = pd.Series([['item1','item2'],['item2','item3'],[]],index=['foo','bar','qux'])
that looks like
foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object
I have partially achieved this goal with the following code:
df_1 = df.stack().reset_index()
srs = df_1.loc[df_1[0]==1].groupby('level_0')['level_1'].apply(list)
which, besides being slightly unreadable, has the issue of having dropped poor qux along the way.
Is there any shorter path to the desired result?
If you want to avoid reshaping with stack and groupby, you can use a list comprehension: convert the 0/1 values to booleans with DataFrame.astype, use each boolean row to filter the column names, and finally pass the result to the Series constructor:
print([list(df.columns[x]) for x in df.astype(bool).to_numpy()])
[['item1', 'item2'], ['item2', 'item3'], []]
s = pd.Series([list(df.columns[x]) for x in df.astype(bool).to_numpy()], index=df.index)
print(s)
foo    [item1, item2]
bar    [item2, item3]
qux                []
dtype: object
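To see why the list comprehension keeps qux, it may help to look at a single row. The sketch below rebuilds the example frame from the question and indexes df.columns with one boolean row at a time; the names mask, first_row and last_row are only illustrative.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    columns=["item1", "item2", "item3"],
    index=["foo", "bar", "qux"],
    data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]],
)

# 2-D boolean array: True wherever the adjacency matrix holds a 1.
mask = df.astype(bool).to_numpy()

# A boolean row selects the matching column labels.
first_row = list(df.columns[mask[0]])  # row "foo"

# An all-False row selects nothing, so the result is an empty list
# rather than a missing entry.
last_row = list(df.columns[mask[2]])   # row "qux"

print(first_row)
print(last_row)
```

Because every row of the mask yields a list, even an empty one, no index label is lost, unlike in the filter-then-groupby approach.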
If performance is also important, convert the column names to a NumPy array once, before the loop:
c = df.columns.to_numpy()
s = pd.Series([list(c[x]) for x in df.astype(bool).to_numpy()], index=df.index)
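For reuse, the same idea can be wrapped in a small helper. This is a minimal sketch that assumes the frame contains only 0/1 (or boolean) values; the function name adjacency_to_lists is hypothetical and not part of pandas.

```python
import pandas as pd


def adjacency_to_lists(df):
    """Map each row label to the list of column labels with a truthy entry.

    Hypothetical helper; assumes df holds only 0/1 or boolean values.
    """
    cols = df.columns.to_numpy()       # column labels as a NumPy array
    mask = df.astype(bool).to_numpy()  # one boolean row per index label
    return pd.Series([list(cols[row]) for row in mask], index=df.index)


# Example frame from the question.
df = pd.DataFrame(
    columns=["item1", "item2", "item3"],
    index=["foo", "bar", "qux"],
    data=[[1, 1, 0], [0, 1, 1], [0, 0, 0]],
)

result = adjacency_to_lists(df)
print(result)
```

The row with no connections (qux) stays in the result with an empty list, and the index order of the original frame is preserved.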