python pandas aggregate

Aggregate function in pandas dataframe not working appropriately


I'm trying to sum one column after grouping by another. I believe the code is right, but the output is wildly different from what I expect. So I tried a simple min() on the same groupby, and that output is also completely different from what I expect. Did I do something wrong?

Below is a sample of the df. I grouped it by lga_desc, and when I take the minimum value within each group, I get the wrong output.

|Taxable Income |lga_desc|
|300,000,450    |Alpine  |
|240,000        |Alpine  |
|700,000        |Alpine  |
|260,000,450    |Ararat  |
|469,000        |Ararat  |
|5,200,000      |Ararat  |


df = df.groupby('lga_desc')
df = df['Taxable Income'].min()

Output when applying the min function:

lga_desc
Alpine           700,000
Ararat           469,000

These outputs are wrong for the given dataframe.

Thank you for the help!

Update: After carefully checking my code again, it turns out that when I imported this file, all the numbers became strings. Lesson learned: make sure your numbers are actual numbers, not strings :)
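To illustrate the point in the update, here is a minimal sketch of how the string problem can be spotted. It rebuilds the sample rows from the table above as strings (an assumption about how the import left them) and shows that min() then compares text character by character rather than by numeric value. The sample rows are presumably a simplified excerpt, so the values printed here need not match the output posted above.

```python
import pandas as pd

# Sample values copied from the question's table, stored as strings
# (assumption: this mirrors what the import produced)
df = pd.DataFrame({
    "Taxable Income": ["300,000,450", "240,000", "700,000",
                       "260,000,450", "469,000", "5,200,000"],
    "lga_desc": ["Alpine"] * 3 + ["Ararat"] * 3,
})

# False means the column is not numeric, so aggregations will not
# behave like arithmetic on numbers
is_numeric = pd.api.types.is_numeric_dtype(df["Taxable Income"])
print(is_numeric)

# On strings, min() is lexicographic: "260,000,450" sorts before
# "469,000" because "2" < "4", even though it is the larger number
string_min = df.groupby("lga_desc")["Taxable Income"].min()
print(string_min)
```

Checking the column dtype right after import is usually the quickest way to catch this before any groupby is involved.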


Solution

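  • The values are strings, so min() compares them character by character. You need to strip the thousands separators and convert the column to int first: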

    df['Taxable Income'] = df['Taxable Income'].str.replace(',', '').astype(int)
    result = df.groupby('lga_desc')['Taxable Income'].min().reset_index()
    

    OUTPUT:

      lga_desc  Taxable Income
    0   Alpine          240000
    1   Ararat          469000
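As a possible alternative, if the data comes from a CSV file, the separators can be handled at import time with the `thousands` parameter of `pd.read_csv`, so the column arrives as integers. This is only a sketch: the question does not show the file or how it was read, so the CSV text below is a hypothetical stand-in built from the sample rows.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file,
# which the question does not show
csv_text = '''Taxable Income,lga_desc
"300,000,450",Alpine
"240,000",Alpine
"700,000",Alpine
"260,000,450",Ararat
"469,000",Ararat
"5,200,000",Ararat
'''

# thousands="," makes read_csv drop the separators and parse numbers
df_csv = pd.read_csv(io.StringIO(csv_text), thousands=",")

# The same groupby now compares numeric values
result_csv = df_csv.groupby("lga_desc")["Taxable Income"].min().reset_index()
print(result_csv)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.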