How can I use groupby in a way that each group is grouped with the previous overlapping group?


My DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        'a': list('xxxxxxxxxxyyyyyyyyy'),
        'b': list('1111222333112233444')
    }
)

Expected output is a list of groups:

    a  b
0   x  1
1   x  1
2   x  1
3   x  1
4   x  2
5   x  2
6   x  2

    a  b
4   x  2
5   x  2
6   x  2
7   x  3
8   x  3
9   x  3

    a  b
10  y  1
11  y  1
12  y  2
13  y  2

    a  b
12  y  2
13  y  2
14  y  3
15  y  3

    a  b
14  y  3
15  y  3
16  y  4
17  y  4
18  y  4

Logic:

Grouping starts with df.groupby(['a', 'b']); after that, I want to join each group with the one before it, which gives the expected output.

The initial grouping I mentioned may not be necessary.

Note that in the expected output a group cannot contain both x and y in column a.

Honestly, overlapping rows are not something I have dealt with before when using groupby, so I don't know how to approach it. I tried df.b.diff(), but it is not even close.


Solution

  • You can combine groupby, itertools.pairwise (available from Python 3.10) and pd.concat:

    from itertools import pairwise
    
    out = [pd.concat([a[1], b[1]]) for a, b in pairwise(df.groupby(['a', 'b']))]
    

    Functional variant:

    from itertools import pairwise
    from operator import itemgetter
    
    out = list(map(pd.concat, pairwise(map(itemgetter(1), df.groupby(['a', 'b'])))))
    

    Note that groupby sorts the group keys by default; pass sort=False to groupby if you want to keep the groups in their original order of appearance.

    Output:

    [   a  b
     0  x  1
     1  x  1
     2  x  1
     3  x  1
     4  x  2
     5  x  2
     6  x  2,
        a  b
     4  x  2
     5  x  2
     6  x  2
     7  x  3
     8  x  3
     9  x  3,
         a  b
     7   x  3
     8   x  3
     9   x  3
     10  y  1
     11  y  1,
         a  b
     10  y  1
     11  y  1
     12  y  2
     13  y  2,
         a  b
     12  y  2
     13  y  2
     14  y  3
     15  y  3,
         a  b
     14  y  3
     15  y  3
     16  y  4
     17  y  4
     18  y  4]