I would like to apply a function to a set of numbers after a groupby, but the function is only defined when a certain condition is met. Is there a way to perform a different operation depending on that condition?
Say we want to apply the function 1/x after the groupby. This of course cannot be done for x == 0, where we just want 0 as the return value. Normally, this would look something like this:
if x > 0:
    return 1/x
else:
    return 0
However, doing
df.groupby(by = ["index"]).apply(lambda x: 0 if x == 0 else 1/x)
gives me an error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The constructed data is as follows. After a groupby, I am left with

df =

| index | value1 |
| --- | --- |
| a | 0 |
| b | 0.5 |
| c | 0.2 |

where the indices are no longer callable.
I also have a dataset
dg =
index | value2 | value3 |
---|---|---|
a | 1 | 5 |
a | 2 | 8 |
c | 3 | 7 |
c | 7 | 7 |
b | 5 | 6 |
b | 7 | 13 |
I join on the indices using
dh = pd.merge(dg, df, how='left', on='index')
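For reproducibility, the two frames and the merge can be built as follows. This is a minimal sketch that assumes the key column is literally named `index` and copies the values from the tables above.

```python
import pandas as pd

# Per-group values left over after the earlier groupby (the df table above).
df = pd.DataFrame({"index": ["a", "b", "c"], "value1": [0.0, 0.5, 0.2]})

# Raw observations (the dg table above).
dg = pd.DataFrame({
    "index": ["a", "a", "c", "c", "b", "b"],
    "value2": [1, 2, 3, 7, 5, 7],
    "value3": [5, 8, 7, 7, 6, 13],
})

# Left join: every row of dg picks up the value1 of its group.
dh = pd.merge(dg, df, how="left", on="index")
print(dh)
```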
Now I would like to apply the function
dh.groupby(by="index").apply(lambda x: (((x.value2/x.value3) - x.value1)**2).sum() / (x.value1 * (x.n.count())))
which obviously cannot be evaluated when value1 is equal to zero. Putting the condition in as mentioned before gives me the aforementioned error. What do I do?
You can create a function that does this for you:
def func(x):
    if x['value1'].gt(0).all():
        return 1/x['value1']
    else:
        return ((((x['value2']/x['value3'])-x['value1'])**2).sum()/x['value1']*x['value1'].count())
Now just use:
dh.groupby(by = ["index"]).apply(func)
Output:
index
a      0    inf
       1    inf
b      4    2.0
       5    2.0
c      2    5.0
       3    5.0
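Note that group a, where value1 is 0, comes back as inf rather than 0. If 0 is wanted there, as the question asks, one option is to read the group's value1 as a plain scalar and test it before dividing, which also avoids the ambiguous truth value error. The following is a minimal sketch under a few assumptions: value1 is constant within each group (true after the left merge), the formula is the one from the question with the group's row count standing in for `x.n.count()`, and `safe_stat` is a hypothetical helper name.

```python
import pandas as pd

# Data rebuilt from the tables in the question.
df = pd.DataFrame({"index": ["a", "b", "c"], "value1": [0.0, 0.5, 0.2]})
dg = pd.DataFrame({
    "index": ["a", "a", "c", "c", "b", "b"],
    "value2": [1, 2, 3, 7, 5, 7],
    "value3": [5, 8, 7, 7, 6, 13],
})
dh = pd.merge(dg, df, how="left", on="index")


def safe_stat(group):
    """Hypothetical helper: one number per group, 0 when value1 is 0."""
    # value1 is identical on every row of a group, so use it as a scalar.
    v1 = group["value1"].iloc[0]
    if v1 == 0:
        return 0.0
    squared = ((group["value2"] / group["value3"] - v1) ** 2).sum()
    # Assumption: divide by value1 times the number of rows in the group.
    return squared / (v1 * len(group))


# Selecting the columns explicitly keeps the grouping key out of `group`.
result = dh.groupby("index")[["value1", "value2", "value3"]].apply(safe_stat)
print(result)
```

Because `safe_stat` returns a scalar, the result is a Series with one entry per group instead of one entry per row.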