Search code examples
sqlhadoophivehiveqlimpala

Min time stamp partition by same value in column


I have a below dataset where I'm trying to get the min timestamp of the same values in a single column.

Here is my dataset.

enter image description here

I am trying to create a column to find user_first_comment for each interaction with dealer.

like below

is there any technique that i can use to add a incremental value for same values in columns. for example column RANK_Incremental,so that i can use Min wit hpartition to get final output.

enter image description here


Solution

  • With common table expressions (or derived tables) and window function support, try the following:

    WITH step1 AS (
            SELECT *, CASE WHEN LAG(note_type) OVER (PARTITION BY incident ORDER BY note_date) = note_type THEN 0 ELSE 1 END AS edge FROM incidents
         )
       , step2 AS (
            SELECT *, SUM(edge) OVER (PARTITION BY incident ORDER BY note_date) AS grp FROM step1
         )
    SELECT *, FIRST_VALUE(note_date) OVER (PARTITION BY incident, grp ORDER BY note_date) AS first_date FROM step2
    ;
    

    The result:

    +----------+---------------------+----------------+------+------+---------------------+
    | incident | note_date           | note_type      | edge | grp  | first_date          |
    +----------+---------------------+----------------+------+------+---------------------+
    |  5498091 | 2021-12-15 17:20:00 | USER_COMMENT   |    1 |    1 | 2021-12-15 17:20:00 |
    |  5498091 | 2021-12-15 17:21:00 | USER_COMMENT   |    0 |    1 | 2021-12-15 17:20:00 |
    |  5498091 | 2021-12-15 17:55:00 | DEALER_COMMENT |    1 |    2 | 2021-12-15 17:55:00 |
    |  5498091 | 2021-12-15 17:59:00 | USER_COMMENT   |    1 |    3 | 2021-12-15 17:59:00 |
    |  5498091 | 2021-12-16 11:02:00 | USER_COMMENT   |    0 |    3 | 2021-12-15 17:59:00 |
    |  5498091 | 2021-12-16 16:46:00 | DEALER_COMMENT |    1 |    4 | 2021-12-16 16:46:00 |
    +----------+---------------------+----------------+------+------+---------------------+
    

    The setup:

    CREATE TABLE incidents (
       incident         int
     , note_date        datetime
     , note_type        varchar(20)
    )
    ;
    
    INSERT INTO incidents VALUES
       (5498091, '2021-12-15 17:20', 'USER_COMMENT')
     , (5498091, '2021-12-15 17:21', 'USER_COMMENT')
     , (5498091, '2021-12-15 17:55', 'DEALER_COMMENT')
     , (5498091, '2021-12-15 17:59', 'USER_COMMENT')
     , (5498091, '2021-12-16 11:02', 'USER_COMMENT')
     , (5498091, '2021-12-16 16:46', 'DEALER_COMMENT')
    ;