Suppose I have a big Dataframe DS_df w/ column names year, dealamount and CCS among others. For every year, from 1985 until 2020, I need a separate panda series i.e. sum_2019. I need to sum the dealamount, if CCS does occur multiple times (if it occurs only once, it should just be added to the series) and the year matches:
year dealamount CCS
0 2013 37,522,700 Albania_Azerbaijan
1 2013 37,522,700 Albania_Azerbaijan
2 2016 436,341,300 Albania_Greece
3 2019 763,189,200 Albania_Russia
4 2019 763,189,200 Albania_Russia
5 2019 763,189,200 Albania_Russia
6 2019 763,189,200 Albania_Russia
7 2017 150,931,000 Albania_Turkey
8 2016 275,293,750 Albania_Turkey
9 2009 258,328,000 Albania_Turkey
10 2019 153,452,000 Albania_Venezuela
11 2019 153,452,000 Albania_Venezuela
11 2017 153,452,000 Albania_Venezuela
So in this case, sum_2019 should be a panda series w/ the Index being CCS and the summed dealamount as "observation".
Albania_Russia 3,052,756,800
Albania_Venezuela 306,904
Likewise, for sum_2013:
Albania_Azerbaijan 75,045,400
Any help is greatly appreciated, as I need to this for quite a lot of data points and feel quite lost (really new to python) How would I go about properly automating this?
Thank you!!
# to avoid scientific notation (e notation)
pd.set_option('display.float_format', lambda x: '%.d' % x)
# first filter by 'year' then group by 'CSS' and finally sum by 'dealamount'
sum_2019 = df[df['year']==2019].groupby('CCS')['dealamount'].sum()
print(sum_2019)
CCS
Albania_Russia 3052756800
Albania_Venezuela 306904000
Name: dealamount, dtype: float64