I have this toy data set:
import pandas as pd
df = pd.DataFrame({'user': ['John', 'Steve', 'Steve', 'Steve', 'Jane', 'Jane', 'Jane', 'Jane', 'Alice', 'Alice', 'Alice'],
                   'days': [1, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3]})
yielding
user days
John 1
Steve 1
Steve 2
Steve 3
Jane 1
Jane 2
Jane 3
Jane 4
Alice 1
Alice 2
Alice 3
I want to count how many users have exactly 1 day, exactly 2 days, exactly 3 days, and so on.
Desired output
user days_count
1 1
3 2
4 1
I have tried code from this answer and from this answer, but neither yielded the above (or a similar) result.
Looks like a double value_counts:
df['user'].value_counts(sort=False).value_counts(sort=False)
Output:
count
1 1
3 2
4 1
Name: count, dtype: int64
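To see why two passes are enough, here is a minimal sketch that splits the chain into its two steps. The intermediate names `per_user` and `users_per_count` are only illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['John', 'Steve', 'Steve', 'Steve', 'Jane', 'Jane', 'Jane', 'Jane',
             'Alice', 'Alice', 'Alice'],
    'days': [1, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3],
})

# Step 1: number of rows (one row per day) for each user
per_user = df['user'].value_counts(sort=False)

# Step 2: number of users that share each row count
users_per_count = per_user.value_counts(sort=False)

print(per_user.to_dict())
print(users_per_count.to_dict())
```

The first pass maps each user to a number of days; the second pass counts how often each of those numbers occurs.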
For the exact format:
out = (df['user'].value_counts(sort=False)
                 .value_counts(sort=False)
                 .rename_axis('user')
                 .reset_index(name='days_count')
      )
Or:
from collections import Counter
c = Counter(Counter(df['user']).values())
out = pd.DataFrame({'user': c.keys(), 'days_count': c.values()})
Output:
user days_count
0 1 1
1 3 2
2 4 1
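Both approaches count rows per user, which equals the number of days only if each (user, day) pair appears once. If duplicate rows were possible (an assumption, since the toy data has none), a variant that counts distinct days per user might look like this sketch, which swaps the first value_counts for groupby with nunique:

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['John', 'Steve', 'Steve', 'Steve', 'Jane', 'Jane', 'Jane', 'Jane',
             'Alice', 'Alice', 'Alice'],
    'days': [1, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3],
})

out = (df.groupby('user', sort=False)['days']
         .nunique()                       # distinct days per user
         .value_counts(sort=False)        # users per distinct-day count
         .rename_axis('user')             # match the column name requested in the question
         .reset_index(name='days_count'))

print(out)
```

On the toy data this should give the same table as the versions above, because no user has a repeated day.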