Search code examples
pythonpython-3.xpandasisin

Filter dataframe on second row using list


I have a dataframe that needs to be kept in the structure below.


df
  One       two   three          
  date      apple  banana
2019-12-20   0      4
2020-01-03   10     5

Is there a way to keep only columns where the second row is contained within the list? I'm only familiar with doing this for columns

fruits= ["date", "apple"]
df = df.loc[:, df.columns.isin(fruits)

Expected Output

   One       two          
  date      apple  
2019-12-20   0      
2020-01-03   10    

Solution

  • use .isin and use the level argument to specify the level of the multiindex.

    if you don't have a multiindex set, you can set it on your read.

    df = pd.read_csv('file.csv',sep='\s+',header=[0,1])
    
    df.loc[:,df.columns.isin(fruits,level=1)]
    
    print(df)
             One   two
             date apple
    2  2019-12-20     0
    3  2020-01-03    10
    

    df.columns.isin(fruits,level=1)
    array([ True,  True, False])