Search code examples
sqlsql-servert-sqlrow-number

SQL Server: assign the same row number to rows until a condition is met


I have the following dataset:

    ukey          id          code                    create_date       
     1           1082         9053             2018-03-01 23:18:51.0000000
     2           1082         9035             2018-03-01 23:19:21.0000000
     3           1082         9053             2018-03-01 23:22:55.0000000
     4           1082         9535             2018-03-01 23:23:30.0000000
     5           1196         3145             2018-03-05 07:27:15.0000000
     6           1196         3162             2018-03-05 07:27:50.0000000
     7           1196         3175             2018-03-05 07:28:24.0000000
     8           1196         3235             2018-03-05 07:28:57.0000000
     9           1196         3295             2018-03-05 07:29:31.0000000
     10          1196         3448             2018-03-05 07:30:04.0000000
     11          1196         3465             2018-03-05 07:30:37.0000000
     12          1196         4265             2018-03-05 07:31:09.0000000
     13          1196         3495             2018-03-05 17:13:13.0000000
     14           473          551             2018-03-02 16:43:52.0000000
     15           473          590             2018-03-02 16:44:30.0000000
     16           473          835             2018-03-02 16:45:02.0000000

I want another column in this dataset, say RN. RN column should have the same value until these two conditions are met:

  • If the id is different or,
  • If the datediff between current and previous row is more than 60 seconds

So the final dataset should look like this:

    ukey          id          code                    create_date                RN
     1           1082         9053             2018-03-01 23:18:51.0000000        1
     2           1082         9035             2018-03-01 23:19:21.0000000        1
     3           1082         9053             2018-03-01 23:22:55.0000000        1
     4           1082         9535             2018-03-01 23:23:30.0000000        1
     5           1196         3145             2018-03-05 07:27:15.0000000        2
     6           1196         3162             2018-03-05 07:27:50.0000000        2
     7           1196         3175             2018-03-05 07:28:24.0000000        2
     8           1196         3235             2018-03-05 07:28:57.0000000        2
     9           1196         3295             2018-03-05 07:29:31.0000000        2
     10          1196         3448             2018-03-05 07:30:04.0000000        2
     11          1196         3465             2018-03-05 07:30:37.0000000        2
     12          1196         4265             2018-03-05 07:31:09.0000000        2
     13          1196         3495             2018-03-05 17:13:13.0000000        3
     14           473          551             2018-03-02 16:43:52.0000000        4
     15           473          590             2018-03-02 16:44:30.0000000        4
     16           473          835             2018-03-02 16:45:02.0000000        4

Notice the change in RN from 2 to 3. The id is same but the datediff is more than 60 secs.

So far I have tried this query. My original dataset is in dbo.myTable:

select 
        a.id as A_ID,
        a.code as A_Code,
        a.create_date as A_CreateDate,
        a.RN as A_RN,
        _.id as _ID,
        _.code as _Code,
        _.create_date as _CreateDate,
        _.RN as _RN
    from myTable a
    cross apply
    (
        select * from myTable b
        where a.id=b.id and (a.ukey=b.ukey-1 or a.ukey=1) and datediff(ss,a.create_date,b.create_date)<60
    )_
    order by a.id, a.create_date

This query is NOT giving me the expected result. My thought was to compare each row with the previous row and check for the two conditions.


Solution

  • This sounds like you want lag() and a cumulative sum:

    select t.*,
           sum(case when prev_create_date > dateadd(second, -60, create_date)
                    then 0 else 1
               end) over
               (order by ukey) as rn
    from (select t.*,
                 lag(create_date) over (partition by id order by create_date) as prev_create_date
          from t
         ) t;