Search code examples
sqlsql-serverdatabasegroup-bynorthwind

Get most sold product for each country from NORTHWIND database


Good day guys, I've been struggling with this for the past day and I just can't seem to figure it out.

My task is to derive the most sold product for each country from the popular open source database called NORTHWIND: https://northwinddatabase.codeplex.com

I was able to get to this stage, here is my code in SQL Server:

--Get most sold product for each country
WITH TotalProductsSold AS 
(
    SELECT od.ProductID, SUM(od.Quantity) AS TotalSold
        FROM [Order Details] AS od
        GROUP BY od.ProductID
)
SELECT MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity, s.Country --,p.ProductName
    FROM Products AS p
    INNER JOIN TotalProductsSold
    ON TotalProductsSold.ProductID = p.ProductID
    INNER JOIN Suppliers AS s
    ON s.SupplierID = p.SupplierID
    GROUP BY s.Country
    ORDER BY MostSoldQuantity DESC

This gives me the following result:

enter image description here

That's all good but I wish to find out the product name for the MostSoldQuantity.

Thank you very much !

P.S I put a comment --p.ProductName where I thought it would work but it didnt and if someone could explain me why does GROUP BY not automatically allow me to derive the product name for the row that would be great


Solution

  • First, start with the count of products sold, per country, not just per product. Then rank them and pick only anything at RANK = 1. Something like...

    WITH
        ProductQuantityByCountry AS 
    (
        SELECT
           s.CountryID,
           p.ProductID,
           SUM(od.Quantity)   AS Quantity
        FROM
            [Order Details]   AS od
        INNER JOIN
            Products          AS p
                ON  p.ProductID = od.ProductID
        INNER JOIN
            Suppliers         AS s
                ON  s.SupplierID = p.SupplierID
        GROUP BY
           s.CountryID,
           p.ProductID
    ),
        RankedProductQuantityByCountry
    AS
    (
        SELECT
            RANK() OVER (PARTITION BY CountryID ORDER BY Quantity DESC)  AS countryRank,
            *
        FROM
            ProductQuantityByCountry
    )
    SELECT
        *
    FROM
        RankedProductQuantityByCountry
    WHERE
        countryRank = 1
    

    Note, one country may supply identical quantity of different producs, and so two products could both have rank = 1. Look into ROW_NUMER() and/or DENSE_RANK() for other but similar behaviours to RANK().

    EDIT: A simple though exercise to cover why SQL doesn't let you put Product.Name in your final query is to ask a question.

    What should SQL do in this case?

    SELECT
        MAX(TotalProductsSold.TotalSold) AS MostSoldQuantity,
        MIN(TotalProductsSold.TotalSold) AS LeastSoldQuantity,
        s.Country,
        p.ProductName
    FROM
        blahblahblah
    GROUP BY
        s.Country
    ORDER BY
        MostSoldQuantity DESC
    

    The presence of a MIN and a MAX makes things ambiguous.

    You may be clear that you want to perform an operation by country and that operation to be to pick the product with the highest sales volume from that country. But it's not actually explicit, and small changes to the query could have very confusing consequences to any inferred behaviour. Instead SQL's declarative syntax provides a very clear / explicit / deterministic description of the problem to be solved.

    If an expression isn't mentioned in the GROUP BY clause, you can't SELECT it, without aggregating it. This is so that there is no ambiguity as to what is meant or what the SQL engine is supposed to do.

    By requiring you to stipulate get the total sales per country per product at one level of the query, you can then cleanly state and then pick the highest ranked per country at another level of the query.

    This can feel like you end up with queries that are longer than "should" be necessary. But it also results in queries that are completely un-ambiguous, both for compiling the query down to an execution plan, and for other coders who will read your code in the future.