Select named index level from pandas DataFrame MultiIndex


I created a DataFrame as follows:

df1 = pandas.read_csv(ifile_name,  header=None,  sep=r"\s+",  usecols=[0,1,2,3,4],
                              index_col=[0,1,2], names=["year", "month", "day", "something1", "something2"])

Now I would like to create another DataFrame containing only the rows where year > 2008, so I tried:

df2 = df1[df1.year>2008]

But I get this error:

AttributeError: 'DataFrame' object has no attribute 'year'

I guess it does not find "year" among the columns because I defined it as part of the index. How can I select the rows where year > 2008 in that case?


Solution

  • You are correct that year is an index level rather than a column, which is why attribute access such as df1.year fails. One solution is to use pd.DataFrame.query, which lets you refer to named index levels directly in the expression:

    import pandas as pd

    df = pd.DataFrame({'year': [2005, 2010, 2015], 'value': [1, 2, 3]})
    df = df.set_index('year')
    
    res = df.query('year > 2008')
    
    print(res)
    
          value
    year       
    2010      2
    2015      3
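
The example above uses a single-level index, while the question has a three-level MultiIndex. The sketch below applies the same idea to that case. The sample rows are made up for illustration and stand in for the contents of the question's file, which is not shown. It also shows an alternative, a boolean mask built with index.get_level_values, which should select the same rows.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated rows standing in for the question's input file
raw = io.StringIO(
    "2007 1 15 0.1 10\n"
    "2008 6 30 0.2 20\n"
    "2009 3 1 0.3 30\n"
    "2010 12 25 0.4 40\n"
)

# Same read_csv call as in the question, reading from the in-memory buffer
df1 = pd.read_csv(
    raw, header=None, sep=r"\s+", usecols=[0, 1, 2, 3, 4],
    index_col=[0, 1, 2],
    names=["year", "month", "day", "something1", "something2"],
)

# Option 1: query can reference a named MultiIndex level like a column
df2 = df1.query("year > 2008")

# Option 2: build a boolean mask from the values of the "year" level
mask = df1.index.get_level_values("year") > 2008
df3 = df1[mask]

print(df2)
```

Both options keep year, month and day in the index. If you need year as an ordinary column instead, df1.reset_index() moves the index levels back into columns, after which df1[df1.year > 2008] works as originally written.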