python pandas nan calculation divide

When dividing in pandas, I always get NaN results. How can I solve the problem?


I want to create a column 'ratio' that holds each value of the column 'amount' divided by the last value of 'amount'. The data type of the 'amount' column is int64. After changing the data type to float, I still get the same NaN values.



Solution

  • When you do arithmetic on several DataFrames or Series, pandas aligns them on index (and columns) by default. tail(1) does not return a single value (scalar) but a one-element Series that keeps the last index label of the original data. When you divide the column by that Series, the two are first aligned on index and then divided label by label. Since the tail contains a value only for the last index label, every other row has no matching divisor and is divided by NaN. That's why you get NaN everywhere except at the last position.

    To avoid this behavior, pass the divisor either as a number or as a numpy.array, neither of which carries an index to align on. In this case, it can be

    dt['amount'] / dt['amount'].tail(1).values    # divide by a one-element numpy.array
    dt['amount'] / dt['amount'].iloc[-1]          # divide by a number (scalar)
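
As a minimal sketch of the alignment behavior described above, the example below uses a small made-up 'amount' column (the original data was only shown as an image, so the values here are an assumption) and compares the three divisors:

```python
import pandas as pd

# Hypothetical data; the real frame from the question is not available.
dt = pd.DataFrame({"amount": [10, 20, 30, 40]})

# Series / one-element Series: aligned on index, so only the last
# label (3) has a divisor and every other row becomes NaN.
aligned = dt["amount"] / dt["amount"].tail(1)

# Series / scalar: the last value is applied to every row.
dt["ratio"] = dt["amount"] / dt["amount"].iloc[-1]

# Series / one-element NumPy array: no index to align on,
# so the single value is broadcast across all rows.
ratio_np = dt["amount"] / dt["amount"].tail(1).values

print(aligned)   # NaN, NaN, NaN, 1.0
print(dt)        # ratio column: 0.25, 0.5, 0.75, 1.0
```

The scalar form with iloc[-1] is probably the easiest to read, since it makes explicit that a single number is the divisor.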