python, pandas, data-manipulation

How to round calculations with pandas


I know how to round a column in pandas (link). My problem is how to round and do a calculation at the same time.

df['age_new'] = df['age'].apply(lambda x: round(x['age'] * 0.024319744084, 0.000000000001))

TypeError: 'float' object is not subscriptable

Is there any way to do this?


Solution

    • .apply is not vectorized.
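      • When using .apply on a pandas.Series, like df['age'], the lambda variable x is a single value from the column, not the column itself, so x['age'] raises the TypeError shown above. The correct syntax is round(x * 0.0243, 4).
      • The ndigits parameter of round requires an int (the number of decimal places), not a float such as 0.000000000001.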
    • It is faster to use vectorized methods, like .mul, and then .round.
      • In this case, with 1000 rows, the vectorized method is roughly 4 times faster than using .apply (about 212 µs versus 845 µs in the timings below).
    import pandas as pd
    import numpy as np
    
    # test data
    np.random.seed(365)
    df = pd.DataFrame({'age': np.random.randint(110, size=(1000))})
    
    %%timeit
    df.age.mul(0.024319744084).round(5)
    [out]:
    212 µs ± 3.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %%timeit
    (df['age'] * 0.024319744084).round(5)
    [out]:
    211 µs ± 9.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %%timeit
    df.age.apply(lambda x: round(x * 0.024319744084, 5))
    [out]:
    845 µs ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
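
Applied to the assignment from the question, the two fixes might look like the sketch below. The sample ages are made up for illustration, and rounding to 12 decimal places is an assumption based on the 0.000000000001 in the original call; the column name age_apply is hypothetical and only there to compare the two approaches.

```python
import numpy as np
import pandas as pd

FACTOR = 0.024319744084  # multiplier taken from the question

# Small illustrative frame; these ages are made up for this sketch
df = pd.DataFrame({"age": [21, 35, 48, 62]})

# round() rejects a float for ndigits, which is the second bug in the question
try:
    round(1.2345, 0.01)
except TypeError as exc:
    ndigits_error = str(exc)

# Corrected .apply version: x is a single value, so it is not subscripted
# (age_apply is a hypothetical column name used only for comparison)
df["age_apply"] = df["age"].apply(lambda x: round(x * FACTOR, 12))

# Vectorized version: multiply the whole column, then round
# (12 decimal places is an assumption about what the question intended)
df["age_new"] = df["age"].mul(FACTOR).round(12)

print(ndigits_error)
print(df)
```

Both columns should agree to within floating-point tolerance; the vectorized form is the one to prefer on larger frames.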