
Pandas return index and column name based on the item value


I am trying to return a column name and index based on the item value. I have something like this:

[Image: the example DataFrame]

So let's say I am trying to return the index and column names of all values greater than 0.75.

for date, row in df.iterrows():
    for item in row:
        if item > .75:
            print(date, row)  # prints the whole row, not the matching column

I want this to return "traffic" and "robbery". However, it returns all the values. I did not find an answer to this in the documentation, online, or here. Thank you in advance.
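For comparison, the loop approach itself can be made to work. `iterrows` yields each row as a Series indexed by the column labels, so iterating over `row.items()` gives (column, value) pairs instead of bare values, and the column name is then available. A minimal sketch, assuming a DataFrame rebuilt from the numbers and the 0.77 threshold used in the answer below (the sample data and the `matches` name are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data: rebuilt from the numbers shown in the answer below,
# since the question's own DataFrame is only available as an image.
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

matches = []
for idx, row in df.iterrows():
    # row is a Series indexed by the column labels, so items() yields (column, value)
    for col, item in row.items():
        # skip the diagonal, where every label trivially matches itself
        if idx != col and item > 0.77:
            matches.append((idx, col))

print(matches)  # [('robbery', 'traffic'), ('traffic', 'robbery')]
```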


Solution

  • Using slightly different numbers (for no particular reason), you can stack to form a Series and then use boolean indexing:

    In [11]: df.stack()
    Out[11]:
    assault  assault    1.00
             robbery    0.76
             traffic    0.60
    robbery  assault    0.76
             robbery    1.00
             traffic    0.78
    traffic  assault    0.68
             robbery    0.78
             traffic    1.00
    dtype: float64
    
    In [12]: s = df.stack()
    
    In [13]: s[(s!=1) & (s>0.77)]
    Out[13]:
    robbery  traffic    0.78
    traffic  robbery    0.78
    dtype: float64
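A self-contained sketch of the same stack-and-filter idea, assuming `df` holds the numbers from the `Out[11]` listing (the DataFrame construction is an assumption rebuilt from that listing). Instead of `s != 1`, it drops the diagonal by comparing the two index levels, so an off-diagonal value that happens to be exactly 1 is not discarded:

```python
import pandas as pd

# Assumed input: rebuilt from the Out[11] listing (row label, column label, value).
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

s = df.stack()  # Series with a (row, column) MultiIndex

# True wherever the row label differs from the column label (i.e. off the diagonal).
off_diag = s.index.get_level_values(0) != s.index.get_level_values(1)
hits = s[(s > 0.77) & off_diag]

pairs = hits.index.tolist()  # list of (row label, column label) tuples
print(pairs)  # [('robbery', 'traffic'), ('traffic', 'robbery')]
```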
    

    You can use a bit of NumPy to remove the duplicates. One way* is to zero out everything on or below the diagonal with triu (unfortunately this returns a NumPy array, not a DataFrame, so the labels have to be put back):

    In [21]: np.triu(df, 1)
    Out[21]:
    array([[ 0.  ,  0.76,  0.6 ],
           [ 0.  ,  0.  ,  0.78],
           [ 0.  ,  0.  ,  0.  ]])
    
    In [22]: s = pd.DataFrame(np.triu(df, 1), df.index, df.columns).stack() > 0.77
    
    In [23]: s[s]
    Out[23]:
    robbery  traffic    True
    dtype: bool
    
    In [24]: s[s].index.tolist()
    Out[24]: [('robbery', 'traffic')]
    

    *I suspect there are more efficient ways...
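One alternative that keeps the labels, so the DataFrame does not have to be rebuilt, is to use the upper-triangle pattern as a boolean mask with `DataFrame.where` instead of multiplying the values out with `np.triu`. A minimal sketch, assuming the same `df` as in the `Out[11]` listing (rebuilt here for illustration); no claim is made that it is faster:

```python
import numpy as np
import pandas as pd

# Assumed input: rebuilt from the Out[11] listing.
labels = ["assault", "robbery", "traffic"]
df = pd.DataFrame(
    [[1.00, 0.76, 0.60],
     [0.76, 1.00, 0.78],
     [0.68, 0.78, 1.00]],
    index=labels,
    columns=labels,
)

# Boolean mask that is True strictly above the diagonal (k=1 excludes the diagonal).
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() keeps values where the mask is True and puts NaN elsewhere; the row and
# column labels are untouched. Depending on the pandas version, stack() may drop
# or keep the NaN cells, but NaN > 0.77 is False, so they never pass the filter.
s = df.where(upper).stack()

pairs = s[s > 0.77].index.tolist()
print(pairs)  # [('robbery', 'traffic')]
```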