I have a bug when calling rolling() on x.field, where x is a pandas.core.groupby.groupby.DataFrameGroupBy.
I tried the solution proposed on this page, so I did this:
x.field.apply(lambda x: x.rolling(window=5,min_periods=1).mean())
Contrary to what that page suggests, I still get the same bug:
| machin | machin | truc | a column of series |
| machin1 | machin1 | truc1 | 1 |
| | | truc2 | 2 |
| | | truc3 | 3 |
| | | truc4 | 4 |
| machin2 | machin2 | truc1 | 100 |
| | | truc2 | 99 |
| | | truc3 | 98 |
As you can see, the index level 'machin' is duplicated, whereas before calling the rolling method it appears only once.
For instance, x.field.apply(lambda x: x+1) returns:
| machin | truc | a column of series |
| machin1 | truc1 | 2 |
| | truc2 | 3 |
| | truc3 | 4 |
| | truc4 | 5 |
| machin2 | truc1 | 101 |
| | truc2 | 100 |
| | truc3 | 99 |
So no duplication, no bug. This shows that the issue really comes from rolling().
Here is some code to help you reproduce my computation:
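import pandas as pd
# creation of records (the names rec and df are assumed; values match the tables above)
rec = [{'machin': 'machin1',
        'truc': ['truc1', 'truc2', 'truc3', 'truc4'],
        'a column': [1, 2, 3, 4]},
       {'machin': 'machin2',
        'truc': ['truc1', 'truc2', 'truc3'],
        'a column': [100, 99, 98]}]
# creation of pandas dataframe
df = pd.concat([pd.DataFrame(r) for r in rec])
# creation of multi-index
df = df.set_index(['machin', 'truc'])
# creation of a groupby object
x = df.groupby('machin')
# rolling computation; x.field and x['field'] are equivalent and give the same bug, as I checked
x['a column'].rolling(window=5, min_periods=1).mean()
# rolling with apply and lambda gives the same bug
x['a column'].apply(lambda x: x.rolling(window=5, min_periods=1).mean())
# apply and lambda alone give no bug
a = x['a column'].apply(lambda x: x + 1)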
Other solutions I tried
I tried to reset the index of the series (doc here).
It raises an exception: ValueError: cannot insert machin, already exists
even though you can see that 'machin' is one of the names in the MultiIndex:
MultiIndex(levels=[['machin1', 'machin2'], ['machin1', 'machin2'], ['truc1', 'truc2', 'truc3', 'truc4']],
labels=[[0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2]],
names=['machin', 'machin', 'truc'])
I also tried drop (doc here).
It raises an exception: KeyError: 'machin'
or KeyError: 0
My versions
Python 3.7.1 (default, Dec 14 2018, 19:28:38) in an anaconda environment, even in terminal: [GCC 7.3.0] :: Anaconda, Inc. on linux
pandas 0.23.4
Use the group_keys argument of groupby:
df.groupby('machin', group_keys=False).rolling(window=5, min_periods=1).mean()
Alternatively, you can drop the 0th level, which rolling inserts, with reset_index:
df.groupby('machin').rolling(window=5, min_periods=1).mean().reset_index(level=0, drop=True)
               a column
machin  truc
machin1 truc1       1.0
        truc2       1.5
        truc3       2.0
        truc4       2.5
machin2 truc1     100.0
        truc2      99.5
        truc3      99.0
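Here is a minimal, self-contained sketch of the second approach. It assumes the frame from the question, rebuilt inline; the names rolled and via_transform are only illustrative. Whether the group key is prepended may depend on the pandas version, so the sketch drops level 0 only when an extra level is actually present. It also shows transform, which is not part of the answer above, as an alternative that keeps the original index.

```python
import pandas as pd

# Frame with the same values as in the question, indexed by (machin, truc).
df = pd.DataFrame(
    {
        "machin": ["machin1"] * 4 + ["machin2"] * 3,
        "truc": ["truc1", "truc2", "truc3", "truc4", "truc1", "truc2", "truc3"],
        "a column": [1, 2, 3, 4, 100, 99, 98],
    }
).set_index(["machin", "truc"])

# Rolling on a groupby can prepend the group key to the original index,
# which is where the duplicated 'machin' level comes from.
rolled = df.groupby("machin")["a column"].rolling(window=5, min_periods=1).mean()
print(rolled.index.names)

# Drop the prepended level by position (not by name, since the name is ambiguous),
# and only if an extra level was really added.
if rolled.index.nlevels > df.index.nlevels:
    rolled = rolled.reset_index(level=0, drop=True)

# Alternative (an assumption, not from the answer above): transform returns
# a result aligned on the original index, so no level is added.
via_transform = df.groupby("machin")["a column"].transform(
    lambda s: s.rolling(window=5, min_periods=1).mean()
)

print(rolled)
print(via_transform)
```

Dropping by position avoids the KeyError and ValueError from the question, which both come from looking the level up by a name that occurs twice.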