Search code examples
sqlsqlitegroup-bycountwindow-functions

How to input the output of one filtered query into another groupby query in SQL in Python (subquery)?


Here is the full table: (code at end)

query = """
SELECT * FROM businesses
"""
df = pd.read_sql_query(query, conn)
df

businesses table

I would like a single SQL query which finds the top 2 popular cities where owners have at least 2 shops. With expected output being London and Leicester:

tables

I currently know how to group by city to find top 2 cities:

# Top 2 popular cities
query1 = """
SELECT city, COUNT(*) as frequency
FROM businesses
GROUP BY city
ORDER BY frequency DESC
LIMIT 2
"""
df = pd.read_sql_query(query1, conn)
df

And how to filter to find owners who have at least 2 shops:

# Owners who have at least 2 shops
query2 = """
SELECT owner, COUNT(*) AS count
FROM businesses
GROUP BY owner
HAVING count >= 2
ORDER BY count DESC
"""
df = pd.read_sql_query(query2, conn)
df

But I don't know how to combine query1 and query 2 into one query.

I would really appreciate any help with this, trying to self-learn SQL :)

Here is the code which creates the database if you'd like to follow along:

# import libraries
import pandas as pd
import sqlite3

# create database
conn = sqlite3.connect("my_db.db")

# create table
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS businesses;")
query = """
CREATE TABLE IF NOT EXISTS businesses (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL,
    owner TEXT NOT NULL
    )
"""
cur.execute(query)
conn.commit()

# add rows to table
query = """
INSERT INTO businesses
    (id, name, city, owner)
VALUES
    (1, "Shop A", "London", "Tom"),
    (2, "Shop B", "London", "Tom"),
    (3, "Shop C", "London", "Tom"),
    (4, "Shop D", "Luton", "Alice"),
    (5, "Shop E", "Leeds", "Jenny"),
    (6, "Shop F", "Leicester", "James"),
    (7, "Shop G", "Leicester", "James"),
    (8, "Shop H", "Leicester", "Emma"),
    (9, "Shop I", "Leicester", "Emma"),
    (10, "Shop J", "Liverpool", "James"),
    (11, "Shop K", "Liverpool", "James"),
    (12, "Shop L", "Liverpool", "George"),
    (13, "Shop M", "Shefield", "Mary"),
    (14, "Shop N", "Shefield", "Mary"),
    (15, "Shop O", "Cambridge", "Oliver"),
    (16, "Shop P", "Manchester", "Harry")
"""
cur.execute(query)
conn.commit()

Solution

  • You should group by city and owner, filter the results in the HAVING clause and then use SUM() window function to get the count for each city:

    SELECT DISTINCT city, 
           SUM(COUNT(*)) OVER (PARTITION By city) AS count
    FROM businesses
    GROUP BY city, owner
    HAVING COUNT(*) >= 2
    ORDER BY count DESC LIMIT 2;
    

    See the demo.