Search code examples
pythonpandasindexinghierarchical

pandas hierarchical index slicing based on year index


I have a dataset, data1. I am trying to use indexing slicing based on input

where data1 =

                  stats
gender  year    
        
women   2003    cellphone use
        2007    height
        2007    cigarette use
        2008    weight
        2009    cellphone use
        2015    cigarette use
        2018    weight
        2020    height

Here is my attempt at index slicing:

 isvalid_yr = False
 while not isvalid_yr:
     year_input = int(input("Input the year you want to compare data from: "))
     if year_input in data1.index.get_level_values('year') 
         idx = pd.IndexSlice
         isvalid_yr = True
         new_data1 = data1.loc(axis = 0)[idx[year_input:year_input], idx[:]]
     else:
          isvalid_yr = False
     try:
         if isvalid_yr ==True:
             pass
         else:
             raise ValueError("Year not in data!")
         except ValueError as err:
             print("Year not in data!")

It gives me this output which I do not want.

Empty DataFrame
Columns: [stats]
Index: []

my final desired output that I want to achieve would look like this

Input the year you want to compare data from: 2007

result of new_data1 =

                  stats
gender  year    

women   
        2007    height
        2007    cigarette use

Solution

  • Use xs to grab a cross-section of the DataFrame:

    res = df.xs(2007, axis=0, level='year', drop_level=False)
    

    res:

                         stats
    gender year               
    women  2007         height
           2007  cigarette use
    

    With user input:

    while True:
        try:
            year_input = int(
                input("Input the year you want to compare data from: ")
            )
            res = df.xs(year_input, axis=0, level='year', drop_level=False)
            break
        except KeyError:
            print("Year not in data!")
        except ValueError:
            print("Please enter a valid year")
    

    df used:

    df = pd.DataFrame({
        'gender': ['women', 'women', 'women', 'women', 'women', 'women', 'women',
                   'women'],
        'year': [2003, 2007, 2007, 2008, 2009, 2015, 2018, 2020],
        'stats': ['cellphone use', 'height', 'cigarette use', 'weight',
                  'cellphone use', 'cigarette use', 'weight', 'height']
    }).set_index(['gender', 'year'])
    

    df:

                         stats
    gender year               
    women  2003  cellphone use
           2007         height
           2007  cigarette use
           2008         weight
           2009  cellphone use
           2015  cigarette use
           2018         weight
           2020         height