
Python Pandas: get columns by values in row


For my clustering algorithm I need to iterate over a word/document matrix row by row and, for every row, get the submatrix of all columns where that row has a value of 1 (ideally excluding the row being iterated over). Say I have a df:

import pandas as pd

# Integer 0/1 values, so that comparisons such as df.eq(1) work
df = pd.DataFrame({'w1': [0, 1, 0, 1],
                   'w2': [1, 1, 0, 1],
                   'w3': [0, 0, 0, 1],
                   'w4': [0, 0, 1, 0]})

   w1 w2 w3 w4
0  0  1  0  0
1  1  1  0  0
2  0  0  0  1
3  1  1  1  0

For the first row, I need the code to return:

   w2
1  1
2  0
3  1

For the second:

   w1 w2
0  0  1
2  0  0
3  1  1

and so on. How do I do that? I can't wrap my head around it using .iloc.


Solution

  • If I understand correctly, this does what you need. I print every intermediate step in case it helps you follow the process:

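    import numpy as np

    # Per row: the column name where the value is 1, the placeholder 'nan' elsewhere
    l = np.where(df.eq(1), df.columns, 'nan')
    df_list = []

    # enumerate() yields positions; they match the labels of the default 0..n-1 index
    for y, x in enumerate(l):
        print(x)                                    # column names with 'nan' placeholders
        print(y)                                    # current row
        print(x[x != 'nan'])                        # columns where row y equals 1
        print(df.drop(y)[x[x != 'nan']])            # those columns, without row y
        df_list.append(df.drop(y)[x[x != 'nan']])   # you can store those frames in a list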
    
    
    ['nan' 'w2' 'nan' 'nan']
    0
    ['w2']
       w2
    1   1
    2   0
    3   1
    ['w1' 'w2' 'nan' 'nan']
    1
    ['w1' 'w2']
       w1  w2
    0   0   1
    2   0   0
    3   1   1
    ['nan' 'nan' 'nan' 'w4']
    2
    ['w4']
       w4
    0   0
    1   0
    3   0
    ['w1' 'w2' 'w3' 'nan']
    3
    ['w1' 'w2' 'w3']
       w1  w2  w3
    0   0   1   0
    1   1   1   0
    2   0   0   0
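
  • The same selection can also be written with boolean masks in .loc, without going through column-name strings. This is only a minimal sketch: it assumes the frame holds integer 0/1 values and has unique index labels, and the helper name submatrix_for_row is made up for illustration.

```python
import pandas as pd

# Same word/document matrix as in the question (integer 0/1 values assumed)
df = pd.DataFrame({'w1': [0, 1, 0, 1],
                   'w2': [1, 1, 0, 1],
                   'w3': [0, 0, 0, 1],
                   'w4': [0, 0, 1, 0]})


def submatrix_for_row(frame, idx):
    """Hypothetical helper: columns where row `idx` equals 1, without row `idx`."""
    cols = frame.loc[idx].eq(1)       # boolean Series indexed by column name
    rows = frame.index != idx         # boolean array that masks out the current row
    return frame.loc[rows, cols]


# One submatrix per row label, keyed by that label
submatrices = {idx: submatrix_for_row(df, idx) for idx in df.index}

print(submatrices[1])
```

    Because the lookup goes through index labels rather than positions, this version should also work when the index is not the default 0..n-1 range, as long as the labels are unique.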