I have a Python script that reads in data from a CSV file.
The code runs fine, but every time it runs I get this deprecation message:
DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
The warning stems from this piece of code:
fprice = df.groupby(['StartDate', 'Commodity', 'DealType']).apply(lambda group: -(group['MTMValue'].sum() - (group['FixedPriceStrike'] * group['Quantity']).sum()) / group['Quantity'].sum()).reset_index(name='FloatPrice')
To my understanding, I am performing the apply function on my groupings, but then I disregard the groupings and no longer use them as part of my DataFrame. I am confused about the directions for silencing the warning.
Here is some sample data that this code uses:
TradeID  TradeDate   Commodity    StartDate   ExpiryDate  FixedPrice  Quantity  MTMValue
-------  ----------  -----------  ----------  ----------  ----------  --------  --------
aaa      01/01/2024  (com1,com2)  01/01/2024  01/01/2024  10          10        100.00
bbb      01/01/2024  (com1,com2)  01/01/2024  01/01/2024  10          10        100.00
ccc      01/01/2024  (com1,com2)  01/01/2024  01/01/2024  10          10        100.00
And here is the expected output from this data:
TradeID  TradeDate   Commodity    StartDate   ExpiryDate  FixedPrice  Quantity  MTMValue  FloatPrice
-------  ----------  -----------  ----------  ----------  ----------  --------  --------  ----------
aaa      01/01/2024  (com1,com2)  01/01/2024  01/01/2024  10          10        100.00    0
bbb      01/01/2024  (com1,com2)  01/01/2024  01/01/2024  10          10        100.00    0
ccc      01/01/2024  (com1,com2)  01/01/2024  01/01/2024  10          10        100.00    0
The include_groups parameter

The include_groups parameter of DataFrameGroupBy.apply is new in pandas version 2.2.0. It is essentially a transition-period parameter (2.2.0 -> 3.0), added to communicate a change in behavior (via warnings) and to address pandas Issue 7155. In most cases you should be able to just set it to False to silence the warning (see below).
Let's say you have a pandas DataFrame df and a dummy function myfunc for apply, and you want to group by column 'c' and apply myfunc on each group:
>>> df
a value c
0 foo 10 cat1
1 bar 20 cat2
2 baz 30 cat1
3 quux 40 cat2
>>> def myfunc(x):
...     print(x, '\n')
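Old behavior (the default, i.e. without passing the include_groups parameter): the grouping column 'c' is included in the DataFrameGroupBy, so it shows up in every group passed to myfunc: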
>>> df.groupby('c').apply(myfunc)
a value c
0 foo 10 cat1
2 baz 30 cat1
a value c
1 bar 20 cat2
3 quux 40 cat2
Now, as mentioned in Issue 7155, keeping the grouping column c in the DataFrame passed to apply is unwanted behavior. Most people will not expect c to be present here. The answer by bue has an example of how this can lead to bugs: applying np.mean and expecting fewer columns in the result (this causes a bug if your grouping column is numerical).
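As a rough illustration of that pitfall, here is a minimal sketch. The DataFrame and the column names g and x are made up for this example, and it uses DataFrame.mean instead of np.mean. To make it independent of the pandas version, the grouping column is selected explicitly after groupby to mimic the old behavior, rather than relying on the default of include_groups.

```python
import pandas as pd

# Hypothetical data: 'g' is a *numeric* grouping column, 'x' is the value column.
df = pd.DataFrame({"g": [1, 1, 2], "x": [10.0, 20.0, 30.0]})

# Mimic the old behavior: the grouping column is part of each group,
# so it gets averaged as well and shows up as an extra result column.
with_key = df.groupby("g")[["g", "x"]].apply(lambda d: d.mean())

# Mimic the new behavior: only the value column is passed to the function.
without_key = df.groupby("g")[["x"]].apply(lambda d: d.mean())

print(list(with_key.columns))
print(list(without_key.columns))
```

Code that assumes only the value columns come back would silently pick up the extra g column in the first case.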
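New behavior (include_groups=False): the grouping column 'c' is dropped from the groups passed to myfunc:
>>> df.groupby('c').apply(myfunc, include_groups=False)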
a value
0 foo 10
2 baz 30
a value
1 bar 20
3 quux 40
Skipping include_groups altogether

You may also avoid the include_groups parameter entirely by explicitly listing the columns after groupby (as pointed out by the warning itself: "...or explicitly select the grouping columns after groupby to silence this warning...", and by Cahit in their answer), like this:
>>> df.groupby('c')[['a', 'value', 'c']].apply(myfunc)
a value c
0 foo 10 cat1
2 baz 30 cat1
a value c
1 bar 20 cat2
3 quux 40 cat2
Empty DataFrame
Columns: []
Index: []
You may also set the groupby column as the index, as pointed out by Stefan in the comments:
>>> df.set_index('c').groupby(level='c').apply(myfunc)
a value
c
cat1 foo 10
cat1 baz 30
a value
c
cat2 bar 20
cat2 quux 40
Empty DataFrame
Columns: []
Index: []
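To check that the two workarounds agree, here is a small sketch on the same toy DataFrame. The helper total_value is a hypothetical stand-in for myfunc that returns a number instead of printing, and the column selection here deliberately leaves out 'c', which is a variation on the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["foo", "bar", "baz", "quux"],
    "value": [10, 20, 30, 40],
    "c": ["cat1", "cat2", "cat1", "cat2"],
})

def total_value(group):
    # Hypothetical helper: sums the 'value' column of one group.
    return group["value"].sum()

# Workaround 1: select the columns explicitly after groupby.
by_selection = df.groupby("c")[["a", "value"]].apply(total_value)

# Workaround 2: move the grouping column into the index first.
by_index = df.set_index("c").groupby(level="c").apply(total_value)

print(by_selection.to_dict())
print(by_index.to_dict())
```

Both calls should produce the same per-group totals, and neither needs include_groups.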
Your grouping columns are
['StartDate', 'Commodity', 'DealType']
In the apply function you use the following columns:
['MTMValue', 'FixedPriceStrike', 'Quantity']
That is, you do not need any of the grouping columns in your apply, so you can pass include_groups=False, which also removes the warning:
fprice = df.groupby(['StartDate', 'Commodity', 'DealType']).apply(lambda group: -(group['MTMValue'].sum() - (group['FixedPriceStrike'] * group['Quantity']).sum()) / group['Quantity'].sum(), include_groups=False).reset_index(name='FloatPrice')
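If you need the code to run on pandas versions older than 2.2.0 as well (where include_groups does not exist), the explicit column selection should give the same result. The following is a sketch under assumptions: the sample rows, including the DealType and FixedPriceStrike values, are invented for illustration, and float_price is just the lambda from the question written as a named function.

```python
import pandas as pd

# Hypothetical sample data; DealType and FixedPriceStrike values are made up.
df = pd.DataFrame({
    "StartDate": ["01/01/2024"] * 3 + ["01/02/2024"] * 2,
    "Commodity": ["com1"] * 3 + ["com2"] * 2,
    "DealType": ["Swap"] * 5,
    "FixedPriceStrike": [10.0] * 5,
    "Quantity": [10.0, 10.0, 10.0, 5.0, 5.0],
    "MTMValue": [100.0, 100.0, 100.0, 50.0, 70.0],
})

keys = ["StartDate", "Commodity", "DealType"]
used = ["MTMValue", "FixedPriceStrike", "Quantity"]

def float_price(group):
    # Same formula as the lambda in the question.
    fixed_leg = (group["FixedPriceStrike"] * group["Quantity"]).sum()
    return -(group["MTMValue"].sum() - fixed_leg) / group["Quantity"].sum()

# Selecting only the columns the function needs keeps the grouping
# columns out of each group, so no deprecation warning is triggered.
fprice = df.groupby(keys)[used].apply(float_price).reset_index(name="FloatPrice")
print(fprice)
```

The result has one row per (StartDate, Commodity, DealType) combination, with the grouping columns restored by reset_index.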