Tags: python, pandas, dataframe, rolling-computation

Pandas Rolling value_counts()


I am looking for a way in pandas to get a running value_counts() for each value, i.e. how many times the value in a row has appeared so far. I found the following question (pandas `value_counts` on a rolling time window), but it's not quite what I need.

If I have a DataFrame that looks like this:

     symbol
0     apple
1     apple
2     apple
3     apple
4  cucumber
5  cucumber
6  cucumber

I would like to have this output:

     symbol  counts
0     apple       1
1     apple       2
2     apple       3
3     apple       4
4  cucumber       1
5  cucumber       2
6  cucumber       3

So far I'm using a for loop, which works, but is very time-consuming for bigger DataFrames:

for index in df.index:
    symbol = df.at[index, 'symbol']
    # Count occurrences of this symbol up to and including the current row
    df.at[index, 'counts'] = df.loc[:index, 'symbol'].value_counts()[symbol]

Does somebody have a better and faster solution?


Solution

  • You can group by "symbol" and use cumcount to number the rows within each group (add 1, since cumcount starts at 0):

    df['counts'] = df.groupby('symbol').cumcount() + 1
    

    Output:

         symbol  counts
    0     apple       1
    1     apple       2
    2     apple       3
    3     apple       4
    4  cucumber       1
    5  cucumber       2
    6  cucumber       3
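
For reference, here is a minimal self-contained sketch of the answer that rebuilds the example DataFrame from the question. The second half is an assumption that goes beyond the original question: if a symbol can reappear later in the column, grouping by "symbol" keeps counting from where it left off, so the hypothetical `per_run` column shows one way to restart the count on every change of symbol. The names `mixed`, `run_id`, `per_symbol` and `per_run` are made up for illustration.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"symbol": ["apple"] * 4 + ["cucumber"] * 3})

# Running count per symbol; cumcount is 0-based, hence the + 1
df["counts"] = df.groupby("symbol").cumcount() + 1

# Assumed variant (not in the question): a symbol that reappears later
mixed = pd.DataFrame({"symbol": ["apple", "apple", "cucumber", "apple"]})

# Label each consecutive run: the label increases whenever the symbol changes
run_id = (mixed["symbol"] != mixed["symbol"].shift()).cumsum()

# Counting per symbol continues across the gap ...
mixed["per_symbol"] = mixed.groupby("symbol").cumcount() + 1
# ... while counting per run restarts at 1 on every change
mixed["per_run"] = mixed.groupby(run_id).cumcount() + 1

print(df)
print(mixed)
```

Both versions avoid the Python-level loop, which is likely where most of the time goes on larger DataFrames.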