Search code examples
pythonpandasdataframeaggregateconcatenation

Summing columns with same string ID in a pandas dataframe


I'm sure there's a nice way to do this using pandas, but I keep running in circles.

I want to sum the values in columns that have identical ID numbers.

Example df:

ID_number  Billed  Paid  Balance
A100       300     20    280
A100       1000    1000  0
B9B6       1200    0     1200
B9B6       200     40    160

What I'm looking for in an output:

ID_number  Billed  Paid  Balance
A100       1300    1020  280
B9B6       1400    40    1360

The caveat that's been giving me trouble is that the ID numbers contain both letters and numbers.

I've tried splitting the dataframe in two and changing the Billed, Paid, and Balance columns into int values so they can be summed, but I need to group them by the index number in order to sum them together. No matter how I've sliced and diced the dataframes, when I concatenate them together again, it comes back as the 'object' type and my object numbers get smashed together like strings instead of added together like int values should.

I also tried changing individual column datatypes, but I couldn't get that to work because some of the numbers are quite long decimal-wise, and I was returned this error: ValueError: invalid literal for int() with base 10:

I've been attempting to use the df.groupby(['ID_number'].agg('sum') function for my aggregation.

Does anyone know how this can be done?


Solution

  • Here you go:

    df.groupby('ID_number').sum()