Search code examples
pythonsqlpostgresqlsqlalchemychalice

Write a query (preferably in sqlalchemy) to count number of occurrence of unique values in a column grouped by another column in a postgres table


my project uses AWS Chalice framework and sqlalchemy.

Here is a sample table from the project.

Person Table

I am required to get data from table in two formats. I require two separate queries which should give the data in different format as both will have separate apis. Please do not combine the query to give data in both formats together as dataset is huge & there will be performance issues and as already mentioned I need them separately.

First Format:

I want to get count of person for different jobs grouped by city and if the job field is empty/none, then it should be treated as No Job.

{
   "Bangalore":{
      "Engineer": 2,
      "No Job": 2,
      "Paramedic": 1,
   },
   "Delhi":{
      "Architect": 2,
      "Doctor": 1,
   },
   "Mumbai":{
      "Doctor": 2,
      "Engineer": 1,
      "No Job": 1,
   }
}

Second Format:

I want to get count of unique job titles and person grouped by city.

{
   "Bangalore":{
      "job_titles": 3,
      "persons": 5,
   },
   "Delhi":{
      "job_titles": 2,
      "persons": 3,
   },
   "Mumbai":{
      "job_titles": 3,
      "persons": 4,
   }
}

Explanation:

  • Bangalore has 2 Enginners, 1 Paramedic and 2 person with no job, so there are 3 unique job_titles & 5 Persons.
  • Delhi has 2 Architect and 1 Doctor, so there are 2 unique job_titles & 3 person.
  • Mumbai has 2 Doctors, 1 Enginner and 1 person with no job, so there are 3 unique job_titles & 4 Persons.

Currently, I am achieving this by running a simple Select * from table type query, and iterating over it to form nested dictionaries (in Python). And the issue is that this processing takes a long time.

I am not sure if data in this type of formats even achievable. So, any query whose data can be converted easily and quickly(wrt time) is appreciated.

I have been stuck on this issue for over a month now, so any type of help is appreciated. But please try to provide answer in sqlalchemy query.

Thanks


Solution

  • This is like basics of SQL - count and group by. Looks like a job interview question.

    select city, job, count(*) as count from person_jobs group by city, job;
    

    And in Python:

    cities_jobs = collections.defaultdict(dict)
    for city, job, count in query_result:
      cities_jobs[city][job] = count