Search code examples
sqlgoogle-bigquerysql-order-bypartition-by

SQL to get group number


I have an example data as below.

    +---------+------------+--------+-----------+
    | User Id |   Sequence | Action | Object    |
    |---------|------------|--------|-----------|
    | 12345   |    1       | Eat    | Bread     |
    | 12345   |    2       | Eat    | Steak     |
    | 12345   |    3       | Eat    | Bread     |
    | 12345   |    4       | Drink  | Milk tea  |
    | 12345   |    5       | Drink  | Black tea |  
    | 12345   |    6       | Eat    | Cake      |
    | 12345   |    7       | Eat    | Candy     |
    | 12345   |    8       | Drink  | Black tea | 
    | 12345   |    9       | Drink  | Green tea | 
    | 12345   |    10      | Drink  | Water     |
    +---------+------------+--------+-----------+

Now I'd like adding one column called 'Group Id' in the table, the result should be like this:

    +---------+------------+--------+-----------+-----------+
    | User Id |   Sequence | Action | Object    | Group Id. |
    |---------|------------|--------|-----------|-----------|
    | 12345   |    1       | Eat    | Bread     |     1     |
    | 12345   |    2       | Eat    | Steak     |     1     |
    | 12345   |    3       | Eat    | Bread     |     1     |
    | 12345   |    4       | Drink  | Milk tea  |     2     |
    | 12345   |    5       | Drink  | Black tea |     2     |
    | 12345   |    6       | Eat    | Cake      |     3     |
    | 12345   |    7       | Eat    | Candy     |     3     |
    | 12345   |    8       | Drink  | Black tea |     4     |
    | 12345   |    9       | Drink  | Green tea |     4     |
    | 12345   |    10      | Drink  | Water     |     4     |
    +---------+------------+--------+-----------+-----------|

The same action should be divided into a group, but will be separated by a different order. How can I implement SQL(I use Google Bigquery)?

Thanks a million!


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT * EXCEPT(new_group),
      COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
    FROM (
      SELECT *,
        Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
      FROM `project.dataset.table`
    )
    -- ORDER BY User_Id     
    

    If to apply to sample data from your question - the output is

    Row User_Id Sequence    Action  Object      Group_Id     
    1   12345   1           Eat     Bread       1    
    2   12345   2           Eat     Steak       1    
    3   12345   3           Eat     Bread       1    
    4   12345   4           Drink   Milk tea    2    
    5   12345   5           Drink   Black tea   2    
    6   12345   6           Eat     Cake        3    
    7   12345   7           Eat     Candy       3    
    8   12345   8           Drink   Black tea   4    
    9   12345   9           Drink   Green tea   4    
    10  12345   10          Drink   Water       4