I am looking for a way in pandas to compute `value_counts()` over a rolling time window. I found a related question (pandas `value_counts` on a rolling time window), but it's not quite what I need.
If I have a DataFrame that looks like this:
     symbol
0     apple
1     apple
2     apple
3     apple
4  cucumber
5  cucumber
6  cucumber
I would like to have this output:
     symbol  counts
0     apple       1
1     apple       2
2     apple       3
3     apple       4
4  cucumber       1
5  cucumber       2
6  cucumber       3
So far I'm using a for loop, which works, but is very time-consuming for bigger DataFrames:
for index in df.index:
    symbol = df.at[index, 'symbol']
    # count this symbol only in the rows up to and including the current one
    df.at[index, 'counts'] = df.loc[:index, 'symbol'].value_counts()[symbol]
Does somebody have a better and faster solution?
You can group by `symbol` and use `cumcount` to number the rows within each group (add 1, since `cumcount` starts from 0):
df['counts'] = df.groupby('symbol').cumcount() + 1
Output:
     symbol  counts
0     apple       1
1     apple       2
2     apple       3
3     apple       4
4  cucumber       1
5  cucumber       2
6  cucumber       3
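For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question. The second frame, `mixed`, is a hypothetical addition (not from the question) that shows the numbering does not rely on equal symbols sitting next to each other.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({"symbol": ["apple"] * 4 + ["cucumber"] * 3})

# cumcount numbers the rows of each group starting at 0, so add 1.
df["counts"] = df.groupby("symbol").cumcount() + 1
print(df)

# Hypothetical extra case: interleaved symbols are still counted per symbol.
mixed = pd.DataFrame({"symbol": ["apple", "cucumber", "apple", "cucumber", "apple"]})
mixed["counts"] = mixed.groupby("symbol").cumcount() + 1
print(mixed)
```

This avoids the Python-level loop entirely, which should scale much better than calling `value_counts()` once per row.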