python-3.x pandas dataframe buffer-overflow

Pandas Dataframe sum result is wrong


I am writing a program that uses pandas and openpyxl to manipulate Excel files. One column contains this series of data:

l=[466629703, NA, 527821349, NA,734823364, NA,1667241489, NA,502673377, NA,491316417, NA,505520276, NA,2840580259, NA,1399526794, NA,468709318, NA,425220764, NA,409771252, NA,643692418, NA,1193809483, NA,353829950, NA,424820400, NA,406999623, NA,389293014, NA,1168972722, NA,420654309, NA,390431735, NA,356588382, NA]


deposit_sum = sep_df[sep_kward][deposit].dropna().astype(int).sum()

The result should be 16188926398, but the code above returns 11200862491. The error occurs with only one of the files. What could be the problem?


Solution

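  • Don't cast the values to plain int after dropping the NaNs; cast them to int64 instead. Where int maps to a 32-bit integer (the default on Windows with NumPy versions before 2.0), the value 2840580259.0 is out of range, because the largest int32 value is 2147483647: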

    deposit_sum = df[0].dropna().astype('int64').sum()
    # With the names from the question:
    # deposit_sum = sep_df[sep_kward][deposit].dropna().astype('int64').sum()
    

    Output of deposit_sum:

    16188926398
    

    Sample dataframe used:

    import pandas as pd

    NA = float('nan')  # stands in for the empty Excel cells
    l = [466629703, NA, 527821349, NA, 734823364, NA, 1667241489, NA, 502673377, NA, 491316417, NA, 505520276, NA, 2840580259, NA, 1399526794, NA, 468709318, NA, 425220764, NA, 409771252, NA, 643692418, NA, 1193809483, NA, 353829950, NA, 424820400, NA, 406999623, NA, 389293014, NA, 1168972722, NA, 420654309, NA, 390431735, NA, 356588382, NA]
    df = pd.DataFrame(l)
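
As a rough check of the out-of-range explanation, the pure-Python sketch below reproduces both numbers from the question. It assumes that the failed cast turned the one out-of-range value into -2147483648 (the int32 minimum), which is a common but not guaranteed outcome of an overflowing float-to-int32 conversion. The names `INT32_MIN`, `INT32_MAX`, `fits_int32` and `corrupted_sum` are illustrative and do not come from the original post.

```python
# Deposit values from the question, with the NaN entries already dropped.
values = [
    466629703, 527821349, 734823364, 1667241489, 502673377, 491316417,
    505520276, 2840580259, 1399526794, 468709318, 425220764, 409771252,
    643692418, 1193809483, 353829950, 424820400, 406999623, 389293014,
    1168972722, 420654309, 390431735, 356588382,
]

INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


def fits_int32(v):
    """Return True if v can be stored in a signed 32-bit integer."""
    return INT32_MIN <= v <= INT32_MAX


# Values that cannot be represented as int32.
out_of_range = [v for v in values if not fits_int32(v)]

# The sum with no 32-bit limit (Python ints are arbitrary precision).
correct_sum = sum(values)

# ASSUMPTION: each out-of-range value became INT32_MIN during the cast.
corrupted_sum = sum(v if fits_int32(v) else INT32_MIN for v in values)

print(out_of_range)    # [2840580259]
print(correct_sum)     # 16188926398
print(corrupted_sum)   # 11200862491
```

The simulated total matches the wrong result from the question exactly, so a 32-bit cast of that single value plausibly accounts for the whole discrepancy. It would also explain why only one file was affected: only that file contained a deposit larger than 2147483647.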