Search code examples
sqlsummaxapache-drill

SQL to find max of sum of data in one table, with extra columns


Apologies if this has been asked elsewhere. I have been looking on Stackoverflow all day and haven't found an answer yet. I am struggling to write the query to find the highest month's sales for each state from this example data.

The data looks like this:

| order_id  | month  | cust_id  | state  | prod_id  | order_total  |
+-----------+--------+----------+--------+----------+--------------+
| 67212     | June   | 10001    | ca     | 909      | 13           |
| 69090     | June   | 10011    | fl     | 44       | 76           |
... etc ...

My query

SELECT `month`, `state`, SUM(order_total) AS sales 
FROM orders GROUP BY `month`, `state` 
ORDER BY sales;

|   month    | state  | sales  |
+------------+--------+--------+
| September  | wy     | 435    |
| January    | wy     | 631    |
... etc ...

returns a few hundred rows: the sum of sales for each month for each state. I want it to only return the month with the highest sum of sales, but for each state. It might be a different month for different states.

This query

SELECT `state`, MAX(order_sum) as topmonth
FROM (SELECT `state`, SUM(order_total) order_sum FROM orders GROUP BY `month`,`state`) 
GROUP BY `state`;

| state  | topmonth  |
+--------+-----------+
| ca     | 119586    |
| ga     | 30140     |

returns the correct number of rows with the correct data. BUT I would also like the query to give me the month column. Whatever I try with GROUP BY, I cannot find a way to limit the results to one record per state. I have tried PartitionBy without success, and have also tried unsuccessfully to do a join.

TL;DR: one query gives me the correct columns but too many rows; the other query gives me the correct number of rows (and the correct data) but insufficient columns.

Any suggestions to make this work would be most gratefully received.

I am using Apache Drill, which is apparently ANSI-SQL compliant. Hopefully that doesn't make much difference - I am assuming that the solution would be similar across all SQL engines.


Solution

  • This one should do the trick

    SELECT  t1.`month`, t1.`state`, t1.`sales`
    FROM    (
                /* this one selects month, state and sales*/
                SELECT  `month`, `state`, SUM(order_total) AS sales 
                FROM    orders
                GROUP BY `month`, `state` 
            ) AS t1
    JOIN    (
                /* this one selects the best value for each state */
                SELECT  `state`, MAX(sales) AS best_month
                FROM    (
                            SELECT  `month`, `state`, SUM(order_total) AS sales 
                            FROM    orders
                            GROUP BY `month`, `state` 
                        )
                GROUP BY `state`
            ) AS t2
    ON      t1.`state` = t2.`state` AND
            t1.`sales` = t2.`best_month`
    

    It's basically the combination of the two queries you wrote.