I'm trying to sum one column grouped by another column. I believe the code is right, but the output is wildly different from what I expect. So I tried a simple min() on the same groupby, and that output is also completely different from the expected one. Did I do something wrong?
Below is the df. I grouped it by lga_desc, and when I take the minimum value within each group, I get the wrong output.
|Taxable Income |lga_desc|
|300,000,450 |Alpine |
|240,000 |Alpine |
|700,000 |Alpine |
|260,000,450 |Ararat |
|469,000 |Ararat |
|5,200,000 |Ararat |
df = df.groupby('lga_desc')
df = df['Taxable Income'].min()
Output when applying the min function:
lga_desc
Alpine 700,000
Ararat 469,000
These are the wrong outputs for the given dataframe.
Thank you for the help!
Update: After carefully checking my code again, it turns out that when I imported this file, all the numbers became strings. So the lesson is: make sure your numbers are actual numbers, not strings :)
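A quick way to catch this at import time is to check whether the column is numeric right after loading. The sketch below assumes the file is a CSV read with pd.read_csv; the inline csv_text is a hypothetical stand-in for the real file, built from the rows shown above. Passing thousands=',' asks pandas to parse the comma-separated digits as numbers during the import.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical CSV content standing in for the imported file.
csv_text = '''Taxable Income,lga_desc
"300,000,450",Alpine
"240,000",Alpine
"700,000",Alpine
"260,000,450",Ararat
"469,000",Ararat
"5,200,000",Ararat
'''

# Default import: the commas make pandas keep the values as text.
raw = pd.read_csv(io.StringIO(csv_text))
raw_is_numeric = is_numeric_dtype(raw['Taxable Income'])
print(raw_is_numeric)

# Import with thousands=',' so the values are parsed as integers.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=',')
parsed_is_numeric = is_numeric_dtype(parsed['Taxable Income'])
print(parsed_is_numeric)

# min() now compares numbers instead of comparing text character by character.
parsed_min = parsed.groupby('lga_desc')['Taxable Income'].min()
print(parsed_min)
```

If the check on the raw import comes back False, any min(), max(), or sum() on that column is operating on text, which explains results that look unrelated to the numeric values.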
You need to convert the data type to int first:
df['Taxable Income'] = df['Taxable Income'].str.replace(',', '').astype(int)
result = df.groupby('lga_desc')['Taxable Income'].min().reset_index()
OUTPUT:
lga_desc Taxable Income
0 Alpine 240000
1 Ararat 469000
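The same conversion also fixes the sum the question originally asked about. Here is a minimal sketch that rebuilds the sample rows by hand (an assumption, since the real file is not shown) and uses pd.to_numeric with errors='coerce' instead of astype(int), so that any value that still fails to parse becomes NaN rather than raising an error.

```python
import pandas as pd

# Sample rows from the question, with the numbers stored as strings.
df = pd.DataFrame({
    'Taxable Income': ['300,000,450', '240,000', '700,000',
                       '260,000,450', '469,000', '5,200,000'],
    'lga_desc': ['Alpine', 'Alpine', 'Alpine',
                 'Ararat', 'Ararat', 'Ararat'],
})

# Strip the thousands separators, then convert to a numeric dtype.
cleaned = df['Taxable Income'].str.replace(',', '', regex=False)
df['Taxable Income'] = pd.to_numeric(cleaned, errors='coerce')

# Sum the numeric column within each lga_desc group.
totals = df.groupby('lga_desc')['Taxable Income'].sum().reset_index()
print(totals)
```

Keeping the groupby result in a new variable (totals here) rather than assigning it back to df leaves the original frame available for further checks.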