Facing issue in grouping the data

I am trying to solve below problem:

Write a program to access the university graduate data from data.gov.sg. The link to the web page is as follows:

https://data.gov.sg/api/action/datastore_search?resource_id=eb8b932c-503c-41e7-b513-114cffbe2338

Using this data, compute for each of the years, which are the top 3 course type for males and females.

A sample of the expected output is as follows:

1993
Males: Engineering Sciences | Humanities & Social Sciences | Natural, Physical & Mathematical Sciences
Females: Humanities & Social Sciences | Business & Administration | Natural, Physical & Mathematical Sciences

but not sure - how to group the data and get top 3 courses year wise. I have worked till the last step, except the grouping part

My Code:

import requests
import pprint

pp = pprint.PrettyPrinter(indent=4)

res = requests.get("https://data.gov.sg/api/action/datastore_search?resource_id=eb8b932c-503c-41e7-b513-114cffbe2338&limit=100")

obj = res.json()

print(pp.pprint(obj))

for record in obj["result"]["records"]:
    print(record["year"], ' | ', record["sex"] , ' ', record ["type_of_course"])

Solution

Create a Pandas DataFrame from the records key of your result dict from the response JSON and groupby "year" and "sex" after sort_values on "no_of_graduates"

Example code:

import pandas as pd
import requests

res = requests.get("https://data.gov.sg/api/action/datastore_search?resource_id=eb8b932c-503c-41e7-b513-114cffbe2338&limit=100").json()['result']

df = pd.DataFrame(res['records'])

def convert_to_int(s: str) -> int:
    try:
        return int(s)
    except ValueError:
        return 0

df['no_of_graduates'] = df['no_of_graduates'].apply(convert_to_int)

males = df[df['sex'] == 'Males']
females = df.drop(males.index)

males = males.sort_values('no_of_graduates', ascending=False).groupby('year').head(3).sort_values('year')
females = females.sort_values('no_of_graduates', ascending=False).groupby('year').head(3).sort_values('year')

result = pd.concat([males, females])

Result DF

    _id sex no_of_graduates type_of_course  year
13  14  Males   1496    Engineering Sciences    1993
2   3   Males   481 Humanities & Social Sciences    1993
7   8   Males   404 Natural, Physical & Mathematical Sciences   1993
43  44  Males   1666    Engineering Sciences    1994
32  33  Males   512 Humanities & Social Sciences    1994
35  36  Males   413 Business & Administration   1994
73  74  Males   1715    Engineering Sciences    1995
62  63  Males   497 Humanities & Social Sciences    1995
67  68  Males   460 Natural, Physical & Mathematical Sciences   1995
92  93  Males   497 Humanities & Social Sciences    1996
97  98  Males   449 Natural, Physical & Mathematical Sciences   1996
95  96  Males   358 Business & Administration   1996
17  18  Females 1173    Humanities & Social Sciences    1993
20  21  Females 708 Business & Administration   1993
22  23  Females 588 Natural, Physical & Mathematical Sciences   1993
47  48  Females 1133    Humanities & Social Sciences    1994
50  51  Females 733 Business & Administration   1994
52  53  Females 566 Natural, Physical & Mathematical Sciences   1994
77  78  Females 1240    Humanities & Social Sciences    1995
80  81  Females 788 Business & Administration   1995
82  83  Females 572 Natural, Physical & Mathematical Sciences   1995

You can then format output as required from the result dataframe.