Suppose I have a DataFrame named `iris` with the columns name, sepallength, sepalwidth, petalwidth and petallength. I want the cumulative count of sepallength within each `name` group, ordered by sepallength.
My code:
```python
iris['name', 'sepallength', iris.groupby('name').sort('sepallength').sepallength.count()].head(5)
```
But it shows the wrong result. What am I missing?
Use `cumcount` instead of `count`. The former is a window function that returns one value per row, while the latter is an aggregation that returns one value per group.
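```python
# cumcount() is a window function: it yields a running count for every row in its group
iris['name', 'sepallength', iris.groupby('name').sort('sepallength').sepallength.cumcount()].head(5)
```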
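To make the aggregation vs. window distinction concrete, here is a minimal plain-Python sketch that does not depend on any DataFrame library. The sample rows and the helper names `group_count` and `cumulative_count` are hypothetical, made up for illustration only. The DataFrame syntax above looks like the PyODPS DataFrame API, but that is an assumption, and this sketch only mimics the semantics rather than reproducing that library's behaviour exactly.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (name, sepallength); not the real iris data.
rows = [
    ("setosa", 5.1), ("setosa", 4.9), ("setosa", 4.7),
    ("versicolor", 7.0), ("versicolor", 6.4),
]

def group_count(rows):
    """Aggregation (like count()): collapses each group to a single value."""
    counts = {}
    for name, _ in rows:
        counts[name] = counts.get(name, 0) + 1
    return counts

def cumulative_count(rows):
    """Window-style (like groupby + sort + cumcount): one running count per row.

    Rows are ordered by name, then sepallength; the count restarts at 1 for
    each new name group.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    result = []
    for name, group in groupby(ordered, key=itemgetter(0)):
        for i, (_, length) in enumerate(group, start=1):
            result.append((name, length, i))
    return result

print(group_count(rows))        # one entry per group
print(cumulative_count(rows))   # one entry per input row
```

The aggregation returns two values for five rows, so it cannot be lined up next to `name` and `sepallength` row by row. The window version returns exactly one value per row, which is what the selection in the answer needs. This sketch numbers rows from 1. Check your library's convention: pandas' `GroupBy.cumcount`, for example, starts at 0.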