Search code examples
pythonpandaslisttime-series

How to mark values while keep count each time we see a value?


Does anyone know a way to count values while annotating them? For example.

We have a reparative state with 3 different values. A, B, C. We want to count how many times we saw state A but values A are different length of A's in a column. We have Column A and want to get column B. So we have time series A, B, C and we want to count time wise that A is 1 then B is 1 and C is 1 but when we again see A is 2 and B is 2 and C is 2 and the A again is 3 and so on... (The order is always the same first A then B then C and again A...)

Any ideas?

Column A Column B
A 1
A 1
A 1
B 1
B 1
C 1
A 2
B 2
C 2
A 3
A 3
A 3
A 3
A 3
A 3
B 3
B 3
B 3
C 3
C 3

Tried to get a loop but don't know how to count the state.


Solution

  • Try:

    from itertools import count
    from collections import defaultdict
    
    d = defaultdict(count)
    
    df['Column B new'] = df.groupby((df['Column A'] != df['Column A'].shift()).cumsum())['Column A'].transform(lambda x: next(d[x.iat[0]]) + 1)
    print(df)
    

    Prints:

       Column A  Column B  Column B new
    0         A         1             1
    1         A         1             1
    2         A         1             1
    3         B         1             1
    4         B         1             1
    5         C         1             1
    6         A         2             2
    7         B         2             2
    8         C         2             2
    9         A         3             3
    10        A         3             3
    11        A         3             3
    12        A         3             3
    13        A         3             3
    14        A         3             3
    15        B         3             3
    16        B         3             3
    17        B         3             3
    18        C         3             3
    19        C         3             3