python pandas time-series grouping categorization

Python: How to group data between occurrences of X


I have a dataframe already sorted by time like this:

| type | time |
| :--- | :--- |
| B    | t1   |
| C    | t2   |
| A    | t3   |
| D    | t4   |
| C    | t5   |
| B    | t6   |
| A    | t7   |
| B    | t8   |
| D    | t9   |
| A    | t10  |

I want to chop the dataframe at each occurrence of A, so that every A starts a new group, and eventually get these results:

group 1: {B:t1, C:t2}
group 2: {A:t3, D:t4, C:t5, B:t6}
group 3: {A:t7, B:t8, D:t9}
group 4: {A:t10} 


Solution

  • Use GroupBy.apply with a lambda function that builds a dictionary for each group. Group by a helper Series, created by comparing type with 'A' (Series.eq) and taking the cumulative sum with Series.cumsum:

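    # Map each group's 'type' values to their 'time' values.
    f = lambda x: dict(zip(x['type'], x['time']))
    # eq('A').cumsum() increases by 1 at every 'A'; add(1) makes the group numbers start at 1.
    df1 = (df.groupby(df['type'].eq('A').cumsum().add(1).rename('group'))
             .apply(f)
             .add_prefix('group ')
             .reset_index(name='new'))
    print(df1)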
    
         group                                           new
    0  group 1                        {'B': 't1', 'C': 't2'}
    1  group 2  {'A': 't3', 'D': 't4', 'C': 't5', 'B': 't6'}
    2  group 3             {'A': 't7', 'B': 't8', 'D': 't9'}
    3  group 4                                  {'A': 't10'}
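
To see why the cumulative sum works as a group key, it may help to look at the helper Series on its own. The snippet below is a minimal sketch, not part of the original answer: it rebuilds the sample data from the question (the time values are assumed to be placeholder strings like 't1'), and the names `starts`, `group_id` and `groups` are made up for illustration. It also collects the result into a plain dict instead of a DataFrame.

```python
import pandas as pd

# Sample data from the question; times are assumed to be placeholder strings.
df = pd.DataFrame({
    "type": list("BCADCBABDA"),
    "time": [f"t{i}" for i in range(1, 11)],
})

# True on every row where a new group should start (each occurrence of 'A').
starts = df["type"].eq("A")

# Running count of starts. Rows before the first 'A' count 0, so add 1
# to make the group numbers start at 1.
group_id = starts.cumsum().add(1).rename("group")
print(group_id.tolist())

# Iterate over the groups and build {group number: {type: time}}.
groups = {
    int(gid): dict(zip(part["type"], part["time"]))
    for gid, part in df.groupby(group_id)
}
print(groups)
```

One thing to keep in mind: because each group becomes a dict keyed by type, a type that appears twice between two A rows would keep only its last time. If that can happen in your data, collecting a list of (type, time) tuples per group is probably safer.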