I have a dataset, data1
. I am trying to use indexing slicing based on input
where data1 =
stats
gender year
women 2003 cellphone use
2007 height
2007 cigarette use
2008 weight
2009 cellphone use
2015 cigarette use
2018 weight
2020 height
Here is my attempt at index slicing:
isvalid_yr = False
while not isvalid_yr:
year_input = int(input("Input the year you want to compare data from: "))
if year_input in data1.index.get_level_values('year')
idx = pd.IndexSlice
isvalid_yr = True
new_data1 = data1.loc(axis = 0)[idx[year_input:year_input], idx[:]]
else:
isvalid_yr = False
try:
if isvalid_yr ==True:
pass
else:
raise ValueError("Year not in data!")
except ValueError as err:
print("Year not in data!")
It gives me this output which I do not want.
Empty DataFrame
Columns: [stats]
Index: []
my final desired output that I want to achieve would look like this
Input the year you want to compare data from: 2007
result of new_data1 =
stats
gender year
women
2007 height
2007 cigarette use
Use xs
to grab a cross-section of the DataFrame:
res = df.xs(2007, axis=0, level='year', drop_level=False)
res
:
stats
gender year
women 2007 height
2007 cigarette use
With user input:
while True:
try:
year_input = int(
input("Input the year you want to compare data from: ")
)
res = df.xs(year_input, axis=0, level='year', drop_level=False)
break
except KeyError:
print("Year not in data!")
except ValueError:
print("Please enter a valid year")
df
used:
df = pd.DataFrame({
'gender': ['women', 'women', 'women', 'women', 'women', 'women', 'women',
'women'],
'year': [2003, 2007, 2007, 2008, 2009, 2015, 2018, 2020],
'stats': ['cellphone use', 'height', 'cigarette use', 'weight',
'cellphone use', 'cigarette use', 'weight', 'height']
}).set_index(['gender', 'year'])
df
:
stats
gender year
women 2003 cellphone use
2007 height
2007 cigarette use
2008 weight
2009 cellphone use
2015 cigarette use
2018 weight
2020 height