Search code examples
pythonpandascumsum

how to identify sequence order and cumsum the transactions?


I have the following dataframe:

df = pd.DataFrame({'id':[1,1,1,2,2,3,3,4,5,6,6,6,6,6,8,8,9,11,12,12],'letter':['A','A','Q','Q','Q','F','F','G','D','G','I','I','K','Q','E','S','S','I','I','F']})

My objective is to add another column tx that shows the followings: if it finds Q and there after an I - mark it as 1st transaction. Both Q and I must exists and must have the same comes as last_Q --> first_I.

so the end result should look like this:

enter image description here


Solution

  • I would use boolean arithmetic:

    # map True/False to Q/I
    m1 = df['letter'].map({'Q': True, 'I': False})
    # ffill the values
    m2 = m1.ffill().fillna(False)
    # only keep last Qs
    m3 = m1.shift(-1).ne(True)
    m4 = m2&m3
    
    df['tx'] = (m1&m3).cumsum().where(m4|m4.shift(), 0)
    

    Output:

        id letter  tx
    0    1      A   0
    1    1      A   0
    2    1      Q   0
    3    2      Q   0
    4    2      Q   1
    5    3      F   1
    6    3      F   1
    7    4      G   1
    8    5      D   1
    9    6      G   1
    10   6      I   1
    11   6      I   0
    12   6      K   0
    13   6      Q   2
    14   8      E   2
    15   8      S   2
    16   9      S   2
    17  11      I   2
    18  12      I   0
    19  12      F   0
    

    Intermediates:

        id letter  tx     m1     m2     m3     m4
    0    1      A   0    NaN  False   True  False
    1    1      A   0    NaN  False  False  False
    2    1      Q   0   True   True  False  False
    3    2      Q   0   True   True  False  False
    4    2      Q   1   True   True   True   True
    5    3      F   1    NaN   True   True   True
    6    3      F   1    NaN   True   True   True
    7    4      G   1    NaN   True   True   True
    8    5      D   1    NaN   True   True   True
    9    6      G   1    NaN   True   True   True
    10   6      I   1  False  False   True  False
    11   6      I   0  False  False   True  False
    12   6      K   0    NaN  False  False  False
    13   6      Q   2   True   True   True   True
    14   8      E   2    NaN   True   True   True
    15   8      S   2    NaN   True   True   True
    16   9      S   2    NaN   True   True   True
    17  11      I   2  False  False   True  False
    18  12      I   0  False  False   True  False
    19  12      F   0    NaN  False   True  False