python python-3.x pandas dataframe summary

Slicing Pandas Columns to Obtain Summary Statistics

I have a dataframe that looks similar to the following:

ColA  ColB  Year  ...
=====================
1     2     2007
2     5     2007
3     4     2007
4     3     2007
5     2     2008
6     1     2008
7     0     2008
8     9     2008
...

I am using dat[['ColA', 'ColB']].describe(). As expected, this displays summary statistics for both columns over all years. I would like summary statistics for each column by year. In the example above, that would give 4 columns of statistics: one for ColA in 2007, one for ColA in 2008, one for ColB in 2007, and one for ColB in 2008. Is there a way to extend DataFrame.describe() to do this?

Solution

You can group by year before calling describe():

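import pandas as pd

# One value column plus the grouping column
df_example = pd.DataFrame({"colA": [1, 2, 3, 4, 5, 6, 7, 8],
                           "Year": [2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008]})
# describe() runs once per Year group, giving one row of statistics per year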
des = df_example.groupby("Year").describe()
print(des)

     colA
     count mean       std  min   25%  50%   75%  max
Year                                                
2007   4.0  2.5  1.290994  1.0  1.75  2.5  3.25  4.0
2008   4.0  6.5  1.290994  5.0  5.75  6.5  7.25  8.0
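The example above uses only colA. Below is a sketch that extends it to both columns from the question, taking the ColB values from the table in the question. Both reshaping steps are additions for illustration and not part of the original answer. Option 1 transposes the grouped result so that the years become the columns. Option 2 uses `pivot` to spread each column across the years and then calls `describe()`. This produces exactly the four columns asked for, (ColA, 2007), (ColA, 2008), (ColB, 2007), and (ColB, 2008), with the statistics as rows.

```python
import pandas as pd

# Sample data mirroring the question's table (ColB values taken from it)
df = pd.DataFrame({
    "ColA": [1, 2, 3, 4, 5, 6, 7, 8],
    "ColB": [2, 5, 4, 3, 2, 1, 0, 9],
    "Year": [2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008],
})

# Option 1: group by year, describe both columns, then transpose so the
# rows are (column, statistic) pairs and the columns are the years.
by_year = df.groupby("Year")[["ColA", "ColB"]].describe().T
print(by_year)

# Option 2: spread each column across the years with pivot. Cells that
# belong to the other year become NaN, and describe() skips NaN by default,
# so each (column, year) pair gets its own column of statistics.
per_col_year = df.pivot(columns="Year", values=["ColA", "ColB"]).describe()
print(per_col_year)
```

Option 2 relies on the DataFrame index being unique, because `pivot` raises an error on duplicate index/column combinations. If that does not hold, call `reset_index(drop=True)` first.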