python pandas conditional-statements pandas-groupby pandas-apply

Applying a function based on a condition to a dataframe after groupby


I would like to perform a function on a set of numbers after a groupby, but this function only works when a certain condition is met. Is there a way to perform two different operations?

Say we want to apply the function 1/x after a groupby. This of course cannot be done for x == 0, where we just want to get 0 as the return value. Normally, this would look something like this:

    if x > 0:
        return 1/x
    else:
        return 0

However, doing

df.groupby(by = ["index"]).apply(lambda x: 0 if x == 0 else 1/x)

gives me an error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
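
For context, a hedged sketch of why this fails and of an element-wise alternative: `apply` hands the lambda a whole group rather than a single number, so `x == 0` produces many booleans at once and `if` cannot reduce them to one truth value. The snippet uses the three grouped values listed in this question and assumes `index` is an ordinary column. Masking the zeros before dividing gives 0 wherever the input is 0.

```python
import pandas as pd

# The grouped values from the question; "index" is assumed to be a regular column
df = pd.DataFrame({"index": ["a", "b", "c"], "value1": [0.0, 0.5, 0.2]})

# where() keeps the non-zero entries and turns zeros into NaN, so the division
# never sees a zero; fillna(0) then supplies the desired fallback value
nonzero = df["value1"].where(df["value1"] != 0)
df["inverse"] = (1 / nonzero).fillna(0)

print(df)
```

This works on the whole column at once, so no `if` on a Series is needed.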

The constructed data is as follows. After a groupby, I am left with

df =

index value1
a 0
b 0.5
c 0.2

where the indices are no longer callable.

I also have dataset

dg =

index value2 value3
a 1 5
a 2 8
c 3 7
c 7 7
b 5 6
b 7 13

I join on the indices using

dh = pd.merge(dg, df, how = 'left', on = 'index')

Now I would like to apply the function

dh.groupby(by='index').apply(lambda x: (((x.value2/x.value3) - x.value1)**2).sum() / (x.value1 * (x.n.count())))

which can obviously not be performed when value1 is equal to zero. Putting the condition in as mentioned before gives me the aforementioned error. What do I do?
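
To make the setup above reproducible, here is a minimal sketch that rebuilds both frames from the values listed in the question and performs the left join. It assumes `index` is an ordinary column in both frames, which the question does not state explicitly.

```python
import pandas as pd

# Grouped frame from the question (one value1 per index)
df = pd.DataFrame({"index": ["a", "b", "c"], "value1": [0.0, 0.5, 0.2]})

# Second dataset from the question (two rows per index)
dg = pd.DataFrame({
    "index": ["a", "a", "c", "c", "b", "b"],
    "value2": [1, 2, 3, 7, 5, 7],
    "value3": [5, 8, 7, 7, 6, 13],
})

# A left join keeps every row of dg and attaches the matching value1
dh = pd.merge(dg, df, how="left", on="index")
print(dh)
```

After the merge, every row of a group carries the same `value1`, which is why the zero in group `a` affects the whole group.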


Solution

  • You can create a function that does this for you:

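    def func(x):
        # x is the sub-DataFrame of one group, so reduce the comparison with .all()
        if x['value1'].gt(0).all():
            # Every value1 in the group is positive: element-wise reciprocal
            return 1/x['value1']
        else:
            # Otherwise evaluate the squared-difference expression;
            # note that it still divides by value1
            return ((((x['value2']/x['value3'])-x['value1'])**2).sum()/x['value1']*x['value1'].count())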
    

    Now just use:

    dh.groupby(by = ["index"]).apply(func)
    

    Output:

    index   
    a      0    inf
           1    inf
    b      4    2.0
           5    2.0
    c      2    5.0
           3    5.0
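
Note that the output above still gives inf for group a, where value1 is 0. If the goal is the question's stated fallback of 0 for such groups, one possible variant is sketched below. It is not the answer's original code: `guarded_score` is a hypothetical name, `len(group)` stands in for the question's `x.n.count()` (a column `n` is not shown anywhere), and the frames are rebuilt from the question's tables with `index` assumed to be a regular column.

```python
import pandas as pd

# Data rebuilt from the question's tables (assumption: "index" is a column)
df = pd.DataFrame({"index": ["a", "b", "c"], "value1": [0.0, 0.5, 0.2]})
dg = pd.DataFrame({
    "index": ["a", "a", "c", "c", "b", "b"],
    "value2": [1, 2, 3, 7, 5, 7],
    "value3": [5, 8, 7, 7, 6, 13],
})
dh = pd.merge(dg, df, how="left", on="index")


def guarded_score(group):
    """Return 0 for a group whose value1 is 0, else the question's expression."""
    # value1 is constant within a group after the merge, so read it as a scalar;
    # comparing a scalar avoids the ambiguous-truth-value error
    v1 = group["value1"].iloc[0]
    if v1 == 0:
        return 0.0
    squared = ((group["value2"] / group["value3"] - v1) ** 2).sum()
    # len(group) is an assumed replacement for the question's x.n.count()
    return squared / (v1 * len(group))


# Selecting the value columns explicitly keeps the grouping column out of apply
result = dh.groupby("index")[["value1", "value2", "value3"]].apply(guarded_score)
print(result)
```

Because the condition is checked on a single scalar per group, the function returns one number per index instead of one per row.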