My DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'a': list('xxxxxxxxxxyyyyyyyyy'),
'b': list('1111222333112233444')
}
)
Expected output is a list of groups:
a b
0 x 1
1 x 1
2 x 1
3 x 1
4 x 2
5 x 2
6 x 2
a b
4 x 2
5 x 2
6 x 2
7 x 3
8 x 3
9 x 3
a b
10 y 1
11 y 1
12 y 2
13 y 2
a b
12 y 2
13 y 2
14 y 3
15 y 3
a b
14 y 3
15 y 3
16 y 4
17 y 4
18 y 4
Logic:
Grouping starts with df.groupby(['a', 'b']). After that, I want to join each group with the previous one, which gives the expected output.
The initial grouping I mentioned may not be necessary.
Note that in the expected output, the a column of a single group cannot contain both x and y.
Honestly, overlapping rows are not something I am used to producing with groupby, so I don't know how to approach this. I tried df.b.diff(), but it is not even close.
You can combine groupby, itertools.pairwise and concat:
from itertools import pairwise
out = [pd.concat([a[1], b[1]]) for a, b in pairwise(df.groupby(['a', 'b']))]
Functional variant:
from itertools import pairwise
from operator import itemgetter
out = list(map(pd.concat, pairwise(map(itemgetter(1), df.groupby(['a', 'b'])))))
Note that you might need to use sort=False in groupby if you want to keep the original order.
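To illustrate the effect of sort=False, here is a small sketch on a hypothetical frame (demo is not part of the question) whose keys are deliberately not in sorted order. With the question's data the keys already appear sorted, so both calls would give the same order there.

```python
import pandas as pd

# Hypothetical frame whose keys appear in non-sorted order
demo = pd.DataFrame({'a': list('yyxx'), 'b': list('2121')})

# Default behaviour: group keys are sorted
sorted_keys = [key for key, _ in demo.groupby(['a', 'b'])]

# sort=False: group keys follow the order of first appearance
original_keys = [key for key, _ in demo.groupby(['a', 'b'], sort=False)]

print(sorted_keys)
print(original_keys)
```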
Output:
[ a b
0 x 1
1 x 1
2 x 1
3 x 1
4 x 2
5 x 2
6 x 2,
a b
4 x 2
5 x 2
6 x 2
7 x 3
8 x 3
9 x 3,
a b
7 x 3
8 x 3
9 x 3
10 y 1
11 y 1,
a b
10 y 1
11 y 1
12 y 2
13 y 2,
a b
12 y 2
13 y 2
14 y 3
15 y 3,
a b
14 y 3
15 y 3
16 y 4
17 y 4
18 y 4]