I'm sure there's a nice way to do this using pandas, but I keep running in circles.
I want to sum the values in columns that have identical ID numbers.
Example df:
ID_number Billed Paid Balance
A100 300 20 280
A100 1000 1000 0
B9B6 1200 0 1200
B9B6 200 40 160
What I'm looking for in an output:
ID_number Billed Paid Balance
A100 1300 1020 280
B9B6 1400 40 1360
The caveat that's been giving me trouble is that the ID numbers contain both letters and numbers.
I've tried splitting the dataframe in two and changing the Billed, Paid, and Balance columns into int values so they can be summed, but I need to group them by the index number in order to sum them together. No matter how I've sliced and diced the dataframes, when I concatenate them together again, it comes back as the 'object' type and my object numbers get smashed together like strings instead of added together like int values should.
I also tried changing individual column datatypes, but I couldn't get that to work because some of the numbers are quite long decimal-wise, and I was returned this error: ValueError: invalid literal for int() with base 10:
I've been attempting to use the df.groupby(['ID_number'].agg('sum')
function for my aggregation.
Does anyone know how this can be done?
Here you go:
df.groupby('ID_number').sum()