I have a Pandas DataFrame with a hierarchical index (MultiIndex). I created this DataFrame by grouping values for "cousub" and "year".
annualMed = df.groupby(["cousub", "year"])[["ratio", "sr_val_transfer"]].median().round(2)
print annualMed.head(8)
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
I would like to add an "overall" value in the "year" level that I could then populate with values based on a grouping of "cousub" alone, i.e., excluding "year". I would like the result to look like the following
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Overall 0.50 90000.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
Overall 0.50 135000.0
How can I add this new item to the "years" level of the MultiIndex?
If you want to just add these two columns explicitly, you could just specify all the MultiIndex levels with loc
.
df.loc[('Allen Park city', 'Overall'), :] = (0.50, 90000.)
df.loc[('Belleville city', 'Overall'), :] = (0.50, 135000.)
If you had a whole list of cities that you wanted to add this row for however, this would be a bit tedious. Maybe you could append
another DataFrame with the overall
values with a bit of index manipulation.
(df.reset_index()
.append(pd.DataFrame([['Allen Park city', 'Overall', 0.5, 90000.],
['Belleville city', 'Overall', 0.5, 135000.]],
columns=list(df.index.names) + list(df.columns)))
.set_index(df.index.names)
.sort_index())
Method 1 (smaller case)
>>> df.loc[('Allen Park city', 'Overall'), :] = (0.50, 90000.)
>>> df.loc[('Belleville city', 'Overall'), :] = (0.50, 135000.)
>>> df.sort_index()
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Overall 0.50 90000.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
Overall 0.50 135000.0
Method 2 (larger case)
>>> (df.reset_index()
.append(pd.DataFrame([['Allen Park city', 'Overall', 0.5, 90000.],
['Belleville city', 'Overall', 0.5, 135000.]],
columns=list(df.index.names) + list(df.columns)))
.set_index(df.index.names)
.sort_index())
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Overall 0.50 90000.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
Overall 0.50 135000.0