Search code examples
pythonpandasfiltermaskany

Select columns that present a specific condition in pandas


I have a dataframe relating species and sites. Each cell contains a abundance value. I want to select only those species that present an abundance higher than 5 in any site.

Input:

#df
        Species_1 Species_2 Species_3
    Site_1 0 0 1
    Site_2 0 0 0
    Site_3 6 0 7

Desirable output:

    Species_1 Species3
Site_1 0 1
Site_2 0 0
Site_3 6 7

Imagine that I have a dataframe 150 x 150, and I want to select objetively each column withou writing their names. I managed to write a mask for the dataframe (eg. mask = df > 5), but I do not know how to select only those columns that present at least one "True" value.


Solution

  • Try this

    df = pd.DataFrame({'Species_1': {'Site_1': 0, 'Site_2': 0, 'Site_3': 6}, 'Species_2': {'Site_1': 0, 'Site_2': 0, 'Site_3': 0}, 'Species_3': {'Site_1': 1, 'Site_2': 0, 'Site_3': 7}, 'Species_4': {'Site_1': 2, 'Site_2': 2, 'Site_3': 2}})
    #        Species_1  Species_2  Species_3  Species_4
    #Site_1          0          0          1          2
    #Site_2          0          0          0          2
    #Site_3          6          0          7          2
    
    df.loc[:, (df > 5).any()]
    #        Species_1  Species_3
    #Site_1          0          1
    #Site_2          0          0
    #Site_3          6          7