Search code examples
pythonpandasdataframemultiple-columnsseries

Summing observations from column in pandas


Suppose I have a big Dataframe DS_df w/ column names year, dealamount and CCS among others. For every year, from 1985 until 2020, I need a separate panda series i.e. sum_2019. I need to sum the dealamount, if CCS does occur multiple times (if it occurs only once, it should just be added to the series) and the year matches:

    year    dealamount  CCS
0   2013    37,522,700  Albania_Azerbaijan
1   2013    37,522,700  Albania_Azerbaijan
2   2016    436,341,300 Albania_Greece
3   2019    763,189,200 Albania_Russia
4   2019    763,189,200 Albania_Russia
5   2019    763,189,200 Albania_Russia
6   2019    763,189,200 Albania_Russia
7   2017    150,931,000 Albania_Turkey
8   2016    275,293,750 Albania_Turkey
9   2009    258,328,000 Albania_Turkey
10  2019    153,452,000 Albania_Venezuela
11  2019    153,452,000 Albania_Venezuela
11  2017    153,452,000 Albania_Venezuela

So in this case, sum_2019 should be a panda series w/ the Index being CCS and the summed dealamount as "observation".

Albania_Russia 3,052,756,800
Albania_Venezuela 306,904

Likewise, for sum_2013:

Albania_Azerbaijan 75,045,400

Any help is greatly appreciated, as I need to this for quite a lot of data points and feel quite lost (really new to python) How would I go about properly automating this?

Thank you!!


Solution

  • # to avoid scientific notation (e notation)
    pd.set_option('display.float_format', lambda x: '%.d' % x) 
    
    # first filter by 'year' then group by 'CSS' and finally sum by 'dealamount'
    sum_2019 = df[df['year']==2019].groupby('CCS')['dealamount'].sum()
    
    print(sum_2019)
    CCS
    Albania_Russia      3052756800
    Albania_Venezuela    306904000
    Name: dealamount, dtype: float64