Search code examples
mysqlgroup

Why use group by 1, 2?


Here is the problem from leetcode.

Table: Sales

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| sale_id      | int     |
| product_name | varchar |
| sale_date    | date    |
+--------------+---------+
sale_id is the primary key for this table.
Each row of this table contains the product name and the date it was sold.
 

Since table Sales was filled manually in the year 2000, product_name may contain leading and/or trailing white spaces, also they are case-insensitive.

Write an SQL query to report

product_name in lowercase without leading or trailing white spaces.
sale_date in the format ('YYYY-MM').
total the number of times the product was sold in this month.
Return the result table ordered by product_name in ascending order. In case of a tie, order it by sale_date in ascending order.

Most people wrote this in mysql:

select lower(trim(product_name)) as product_name, date_format (sale_date, '%Y-%m') as sale_date, count(*) as total
from sales
group by 1, 2
order by 1, 2;

I still don't get why 'group by 1, 2' has to be used instead of 'group by product_name, sale_date'. In mysql, group by is executed after select, right? The alias should be considered from the trim() etc, right?

Can someone help me clarify this? Thanks!


Solution

  • Using the same thing as a select alias and a column name is likely to get you into trouble.

    The sale_date returned by the select is just a month; the sale_date column presumably has multiple potential values per month. If you do:

    select lower(trim(product_name)) as product_name, date_format (sale_date, '%Y-%m') as sale_date, count(*) as total
    from sales
    group by product_name, sale_date
    order by 1, 2;
    

    it is ambiguous as to whether you mean to group by sale_date the column or sale_date the alias. mysql will assume you mean the column, which will not aggregate your results by month.

    Either explicitly repeat the select expression:

    group by lower(trim(product_name)), date_format(sale_date, '%Y-%m')
    

    or use the shorthand that allows you to reference select columns:

    group by 1, 2
    

    Or distinguish between what you select and what is stored:

    select lower(trim(product_name)) as display_product_name, date_format (sale_date, '%Y-%m') as sale_month, count(*) as total
    from sales
    group by display_product_name, sale_month