For my clustering algorithm I need to iterate over a word/document matrix row by row and, for every row, get the submatrix of all columns where that row has a value of 1 (ideally excluding the row being iterated). Say I have a df:
import pandas as pd

df = pd.DataFrame({'w1': [0, 1, 0, 1],
                   'w2': [1, 1, 0, 1],
                   'w3': [0, 0, 0, 1],
                   'w4': [0, 0, 1, 0]})
   w1  w2  w3  w4
0   0   1   0   0
1   1   1   0   0
2   0   0   0   1
3   1   1   1   0
I need the code to return, for the first row:
   w2
1   1
2   0
3   1
For the second:
   w1  w2
0   0   1
2   0   0
3   1   1
and so on.
How do I do that? I can't wrap my mind around doing it with .iloc.
IIUC, this loop does what you need. I print every intermediate step in case it helps you follow the process:
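import numpy as np

# For each row: the column name where the value is 1, the string 'nan' elsewhere
l = np.where(df.eq(1), df.columns, 'nan')
df_list = []
for y, x in enumerate(l):
    print(x)                # column names (or 'nan') for row y
    print(y)                # row position; equals the label with the default index
    print(x[x != 'nan'])    # columns where row y is 1
    print(df.drop(y)[x[x != 'nan']])  # submatrix without row y
    df_list.append(df.drop(y)[x[x != 'nan']])  # you can store those df in a list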
Output:
['nan' 'w2' 'nan' 'nan']
0
['w2']
   w2
1   1
2   0
3   1
['w1' 'w2' 'nan' 'nan']
1
['w1' 'w2']
   w1  w2
0   0   1
2   0   0
3   1   1
['nan' 'nan' 'nan' 'w4']
2
['w4']
   w4
0   0
1   0
3   0
['w1' 'w2' 'w3' 'nan']
3
['w1' 'w2' 'w3']
   w1  w2  w3
0   0   1   0
1   1   1   0
2   0   0   0
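If you don't need the intermediate prints, the same idea can be written with a boolean mask per row. This is only a sketch, assuming the matrix holds integer 0/1 values like the table in the question; the names `submatrices` and `result` are made up for illustration. It selects by index label rather than position, so it should also work when the index is not the default 0..n-1 (with duplicated labels it would drop every row sharing the label).

```python
import pandas as pd

# Sample matrix from the question: rows are documents, columns are words.
df = pd.DataFrame({'w1': [0, 1, 0, 1],
                   'w2': [1, 1, 0, 1],
                   'w3': [0, 0, 0, 1],
                   'w4': [0, 0, 1, 0]})


def submatrices(frame):
    """Yield (row_label, submatrix) for each row of a 0/1 DataFrame.

    Each submatrix keeps only the columns where the row equals 1
    and leaves out the row itself. (Hypothetical helper, not a pandas API.)
    """
    for label, row in frame.iterrows():
        cols = row[row == 1].index                 # columns where this row is 1
        others = frame.index != label              # boolean mask excluding the row
        yield label, frame.loc[others, cols]


# Collect everything in a dict keyed by row label.
result = dict(submatrices(df))
print(result[1])
```

`result[0]` then holds the `w2` column for rows 1 to 3, `result[1]` holds `w1` and `w2` for rows 0, 2 and 3, and so on. For a large matrix, `iterrows` can be slow, so the `np.where` version above may be the better starting point.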