I tried asking this question earlier and miscommunicated what I am having trouble with. I have a dataset in python that I am using numpy and pandas in and I am trying to get a count of reports by job type. There are are 100+ titles so I will shorten it for an example:
ID Job_Title reports
1 Sales Manager 2
1 Sales Manager 2
2 Tech Support 0
3 Tech Support 1
3 Tech Support 1
4 Sales Lead 4
4 Sales Lead 4
5 Sales Manager 5
6 Tech Support 2
I would like to get a accurate count of the reports by position. Something Like this:
Job_Title reports
Sales Manager 7
Sales Lead 4
Tech Support 3
So far what I have is this:
df.groupby('Job_Title')['reports'].count().sort_values(ascending = False)
And this is what I am getting:
Job_Title reports
Tech Support 4
Sales Manager 3
Sales Lead 2
To get the expected result, you have to drop duplicates by ID
and Job_Title
before grouping by Job_Title
and sum reports
values:
>>> (df.drop_duplicates(['ID', 'Job_Title'])
.groupby('Job_Title', as_index=False)['reports'].sum()
.sort_values(by='reports', ascending=False, ignore_index=True))
Job_Title reports
0 Sales Manager 7
1 Sales Lead 4
2 Tech Support 3