python pandas count cumulative-sum

Pandas groupby count unique non cumulative


I have this toy data set

import pandas as pd

df = pd.DataFrame({'user': ['John','Steve','Steve','Steve','Jane','Jane','Jane','Jane','Alice','Alice','Alice'],
                   'days': [1,1,2,3,1,2,3,4,1,2,3]})

yielding

user  days
John    1
Steve   1
Steve   2
Steve   3
Jane    1
Jane    2
Jane    3
Jane    4
Alice   1
Alice   2
Alice   3

I want to count how many users have exactly 1 day, exactly 2 days, exactly 3 days, and so on.

Desired output

user   days_count
1           1
3           2
4           1

I have tried the code from this answer and from this answer, but none yielded the above (or a similar) result.


Solution

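  • This looks like a double value_counts: the first call counts the rows per user, and the second counts how many users share each row count: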

    df['user'].value_counts(sort=False).value_counts(sort=False)
    

    Output:

    count
    1    1
    3    2
    4    1
    Name: count, dtype: int64
    

    For the exact format:

    out = (df['user'].value_counts(sort=False)
           .value_counts(sort=False)
           .rename_axis('user')
           .reset_index(name='days_count')
          )
    

    Or:

    from collections import Counter
    
    c = Counter(Counter(df['user']).values())
    
    out = pd.DataFrame({'user': c.keys(), 'days_count': c.values()})
    

    Output:

       user  days_count
    0     1           1
    1     3           2
    2     4           1
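
For readers who prefer to see the two counting steps spelled out, here is a minimal sketch of the same idea using groupby. It is a variant, not the answer's exact code: it assumes that "days" should mean distinct days per user (hence nunique), and the names days_per_user and users_per_count are only illustrative. If a user never has duplicate days, groupby('user').size() gives the same counts.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({
    'user': ['John', 'Steve', 'Steve', 'Steve', 'Jane', 'Jane', 'Jane', 'Jane',
             'Alice', 'Alice', 'Alice'],
    'days': [1, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3],
})

# Step 1: number of distinct days per user (assumption: duplicates should
# not be counted twice; use .size() instead to count raw rows)
days_per_user = df.groupby('user')['days'].nunique()

# Step 2: how many users share each day count, ordered by day count
users_per_count = days_per_user.value_counts().sort_index()

# Rename to match the column names used in the question
out = (users_per_count
       .rename_axis('user')
       .reset_index(name='days_count'))

print(out)
```

The sort_index call makes the row order deterministic, which value_counts alone does not guarantee when several counts are tied.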