Search code examples
pythonpandasgroup-by

How to Get Proper Count of Values in Python


I tried asking this question earlier and miscommunicated what I am having trouble with. I have a dataset in python that I am using numpy and pandas in and I am trying to get a count of reports by job type. There are are 100+ titles so I will shorten it for an example:

 ID       Job_Title        reports  
 1        Sales Manager       2
 1        Sales Manager       2
 2        Tech Support        0
 3        Tech Support        1
 3        Tech Support        1
 4        Sales Lead          4
 4        Sales Lead          4
 5        Sales Manager       5
 6        Tech Support        2

I would like to get a accurate count of the reports by position. Something Like this:

 Job_Title      reports    
Sales Manager     7
Sales Lead        4
Tech Support      3

So far what I have is this:

 df.groupby('Job_Title')['reports'].count().sort_values(ascending = False) 

And this is what I am getting:

Job_Title         reports
Tech Support         4
Sales Manager        3
Sales Lead           2

Solution

  • To get the expected result, you have to drop duplicates by ID and Job_Title before grouping by Job_Title and sum reports values:

    >>> (df.drop_duplicates(['ID', 'Job_Title'])
           .groupby('Job_Title', as_index=False)['reports'].sum()
           .sort_values(by='reports', ascending=False, ignore_index=True))
    
           Job_Title  reports
    0  Sales Manager        7
    1     Sales Lead        4
    2   Tech Support        3