Tags: python, pandas, function, dataframe, apply

Apply custom function over multiple columns in pandas


I am having trouble applying a custom function in pandas. When I test the function by passing values directly, it works and returns the correct result, e.g. feez(800, "4 Plan") returns 3200. However, when I pass the column values as shown below, I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

def feez(rides, plan):
    pmt4 = 200
    inc4 = 50  # number rides included
    min_rate4 = 4 

    if plan == "4 Plan":
        if rides > inc4:
            fee = ((rides - inc4) * min_rate4) + pmt4 
        else:
            fee = pmt4
        return fee
    else:
        return 0.1

df['fee'].apply(feez(df.total_rides, df.plan_name))

I am a newbie and suspect my syntax is poorly written.


Solution

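  • The expression feez(df.total_rides, df.plan_name) runs before apply is ever called and hands two whole Series to feez, so the test if plan == "4 Plan" has to evaluate a boolean Series and raises the error. To call the function once per row, use DataFrame.apply with axis=1 and a lambda that pulls the scalar values out of each row: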

    df['fee'] = df.apply(lambda x: feez(x['total_rides'], x['plan_name']), axis=1)
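To make the difference concrete, here is a minimal, self-contained sketch. The sample rows are hypothetical (only the column names total_rides and plan_name come from the question), and feez is the function from the question with early returns. It first shows that passing whole columns raises ValueError, then applies the function row by row.

```python
import pandas as pd

def feez(rides, plan):
    pmt4 = 200      # flat payment for the "4 Plan"
    inc4 = 50       # number of rides included
    min_rate4 = 4   # charge per ride beyond the included ones
    if plan == "4 Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "total_rides": [800, 30, 120],
    "plan_name": ["4 Plan", "4 Plan", "Other"],
})

# Passing whole columns makes `if` test a boolean Series, which raises.
try:
    feez(df.total_rides, df.plan_name)
    raised = False
except ValueError:
    raised = True

# With axis=1 the lambda receives one row at a time, so feez sees scalars.
df["fee"] = df.apply(
    lambda row: feez(row["total_rides"], row["plan_name"]), axis=1
)
print(raised)
print(df)
```

The fee column here works out to 3200, 200 and 0.1 for the three rows, matching what feez returns when called with the same scalars by hand.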
    

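    However, row-wise apply calls a Python function for every row and gets slow on large frames. Two alternatives are np.vectorize and np.where. Note that np.vectorize is mainly a convenience wrapper around a Python loop, so the larger speedup usually comes from np.where, which operates on whole columns at once.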

    Option 1
    np.vectorize

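    import numpy as np
    # otypes=[float] keeps the 0.1 result from being cast to an integer
    v = np.vectorize(feez, otypes=[float])
    df['fee'] = v(df.total_rides, df.plan_name)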
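One detail worth checking with np.vectorize: unless otypes is given, NumPy infers the output dtype from the result of the first call. Since feez returns an integer for "4 Plan" rows and 0.1 otherwise, a frame whose first row is a "4 Plan" row may end up with an integer result array in which 0.1 is truncated. The sketch below, on hypothetical sample data, sidesteps this by passing otypes=[float].

```python
import numpy as np
import pandas as pd

def feez(rides, plan):
    pmt4, inc4, min_rate4 = 200, 50, 4  # constants from the question
    if plan == "4 Plan":
        return (rides - inc4) * min_rate4 + pmt4 if rides > inc4 else pmt4
    return 0.1

# Hypothetical sample data; the first row is deliberately a "4 Plan" row.
df = pd.DataFrame({
    "total_rides": [800, 30, 120],
    "plan_name": ["4 Plan", "4 Plan", "Other"],
})

# Declaring the output type up front avoids dtype inference from row 0.
v = np.vectorize(feez, otypes=[float])
fees = v(df["total_rides"], df["plan_name"])  # returns a NumPy array
df["fee"] = fees
print(df)
```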
    

    Option 2
    Nested np.where

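    # The constants are local to feez, so define them again at module level.
    pmt4, inc4, min_rate4 = 200, 50, 4

    df['fee'] = np.where(
        df.plan_name == "4 Plan",
        np.where(
            df.total_rides > inc4,
            (df.total_rides - inc4) * min_rate4 + pmt4,
            pmt4
        ),
        0.1
    )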
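As a sanity check, the nested np.where version can be compared against the row-wise apply result. The following is a sketch on hypothetical sample data; the column names fee_apply and fee_where are made up for the comparison, and the constants mirror the ones inside feez.

```python
import numpy as np
import pandas as pd

PMT4, INC4, MIN_RATE4 = 200, 50, 4  # same values as inside feez

def feez(rides, plan):
    if plan == "4 Plan":
        return (rides - INC4) * MIN_RATE4 + PMT4 if rides > INC4 else PMT4
    return 0.1

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "total_rides": [800, 30, 120],
    "plan_name": ["4 Plan", "4 Plan", "Other"],
})

# Reference result: one Python call per row.
df["fee_apply"] = df.apply(
    lambda row: feez(row["total_rides"], row["plan_name"]), axis=1
)

# Column-wise result: both conditions are evaluated on whole columns.
is_plan4 = df["plan_name"] == "4 Plan"
over_included = df["total_rides"] > INC4
df["fee_where"] = np.where(
    is_plan4,
    np.where(over_included, (df["total_rides"] - INC4) * MIN_RATE4 + PMT4, PMT4),
    0.1,
)

same = bool((df["fee_apply"] == df["fee_where"]).all())
print(same)
```

Naming the two boolean masks before the np.where call is a readability choice; it also makes it harder to misplace a parenthesis in the nested expression.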