Search code examples
sqlpostgresqljoinaggregation

SQL query to find the total number of books sold for each day by merging 2 tables?


I'm trying to find total books sold (sum of quantities) for each product_id and txn_day. I have 2 tables, transactions and catalog as below:

Table 1: transaction

market_id, txn_day, customer_id, product_id, quantity
1,2019-03-01,1,B0002,1
1,2019-03-01,2,B0003,1
1,2019-03-01,1,B0001,1
3,2019-03-01,3,B0001,1
3,2019-03-01,4,B0002,1
4,2019-03-01,1,B0002,1
4,2019-03-01,5,B0001,1
4,2019-03-01,6,B0001,1

Table 2: catalog

market_id, product_id, title_name
1,B0001, Harry Potter 1
1,B0002, Harry Potter 2
1,B0003, Harry Potter 3
3,B0001, Harry Potter 1
3,B0002, Harry Potter 2
3,B0003, Harry Potter 3
4,B0001, Harry Potter 1
4,B0002, Harry Potter 2
4,B0003, Harry Potter 3

I wrote the following query and got the total books sold (sum of quantities) for product_id:

SELECT 
    transaction.txn_day, transaction.product_id,
    SUM(quantity) AS quantity 
FROM 
    transaction
GROUP BY
    transaction.product_id, transaction.txn_day;

I tried following query to get title_name for each product_id, but it seems incorrect.

SELECT 
    transaction.txn_day, transaction.product_id, catalog.title_name,
    SUM(quantity) AS quantity 
FROM 
    catalog
INNER JOIN 
    transaction ON catalog.product_id = transaction.product_id
GROUP BY
    transaction.txn_day, transaction.product_id, catalog.title_name;

I'm getting the following result:

|txn_day    |product_id |title_name     |quantity
|2019-03-01 |B0002      |Harry Potter 2 |9
|2019-03-01 |B0001      |Harry Potter 1 |12
|2019-03-01 |B0003      |Harry Potter 3 |3

I'm expecting the result to be something like:

|txn_day    | product_id | quantity | title_name
|2019-03-01 | B0003      | 1        | Harry Potter 3
|2019-03-01 | B0002      | 3        | Harry Potter 2
|2019-03-01 | B0001      | 4        | Harry Potter 1

Please suggest any changes required to the query.


Solution

  • I loaded your data and got the following results:

    DATE        ID      TITLE           QTY
    3/1/2019    B0002   Harry Potter 2  9
    3/1/2019    B0003   Harry Potter 3  3
    3/1/2019    B0001   Harry Potter 1  12
    

    Your clue here as to what is wrong is that your Qty is 3x too much for each record.

    If you look at your INNER JOIN you are only joining by the product_id.

    Take a look at your transaction rows.

    You can see that each product_id exists along with 3 different market_ids, that is why your qty is tripling.

    Solution: Add the market_id to the JOIN:

    SELECT   t.txn_day
            ,t.product_id
            ,c.title_name
            ,SUM(quantity) AS quantity
    FROM     catalog c
             INNER JOIN transactions t ON c.product_id = t.product_id AND c.market_id = t.market_id
    GROUP BY t.txn_day, t.product_id, c.title_name
    order by c.title_name;
    

    You will get your results:

    3/1/2019    B0001   Harry Potter 1  4
    3/1/2019    B0002   Harry Potter 2  3
    3/1/2019    B0003   Harry Potter 3  1