Search code examples
sqlsql-serverintervalsrecursive-query

Generate multiple record from existing records based on interval columns [from and to]


I have 2 types of score [M,B] in column 3, if a type is M, then the score is either an S[scored] or SB[bonus scored] in column 6. Every interval [from_hrs - to_hrs] for a type B must have a corresponding SB for type M, thus, an interval for a type B cannot have a score of S for a type M. I have several records that were unfortunately captured as seen in the table below.

CREATE TABLE SCORE_TBL
(  
    ID int IDENTITY(1,1)  PRIMARY KEY,
    PERSONID_FK int NOT NULL,
    S_TYPE varchar(50) NULL,
    FROM_HRS int NULL,
    TO_HRS int NULL,
    SCORE varchar(50) NULL,
);



INSERT INTO SCORE_TBL(PERSONID_FK,S_TYPE,FROM_HRS,TO_HRS,SCORE)
     VALUES
    (1, 'M' , 0,20, 'S'),
    (1, 'B',6, 8, 'B'),
    (2, 'B',0, 2, 'B'),
    (2, 'M',0,20, 'S'),
    (2, 'B', 10,13, 'B'),
    (2, 'B', 18,20, 'B'),
    (2, 'M', 13,18, 'S'); 


| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
|  1 |           1 | M    |        0 |     20 | S     |
|  2 |           1 | B    |        6 |      8 | B     |
|  3 |           2 | B    |        0 |      2 | B     |
|  4 |           2 | M    |        0 |     20 | S     |
|  5 |           2 | B    |       10 |     13 | B     |
|  6 |           2 | B    |       18 |     20 | B     |
|  7 |           2 | M    |       13 |     18 | S     |

I want the data to look like this

| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
|  1 |           1 | M    |        0 |      6 | S     |
|  2 |           1 | M    |        6 |      8 | SB    |
|  3 |           1 | B    |        6 |      8 | B     |
|  4 |           1 | M    |        8 |     20 | S     |
|  5 |           2 | B    |        0 |      2 | B     |
|  6 |           2 | M    |        0 |      2 | SB    |
|  7 |           2 | M    |        2 |     10 | S     |
|  8 |           2 | B    |       10 |     13 | B     |
|  9 |           2 | M    |       10 |     13 | SB    |
| 10 |           2 | M    |       13 |     18 | S     |
| 11 |           2 | B    |       18 |     20 | B     |
| 12 |           2 | S    |       18 |     20 | SB    |

Any ideas on how to generate this data in SQL Server select statement? Visually, this what am trying to get.

enter image description here


Solution

  • Tricky part here is that interval might need to be split in several pieces like 0..20 for person 2.

    Window functions to the rescue. This query illustrates what you need to do:

    WITH
      deltas AS (
        SELECT personid_fk, hrs, sum(delta_s) as delta_s, sum(delta_b) as delta_b
          FROM (SELECT personid_fk, from_hrs as hrs,
                       case when score = 'S' then 1 else 0 end as delta_s,
                       case when score = 'B' then 1 else 0 end as delta_b
                  FROM score_tbl
                 UNION ALL
                SELECT personid_fk, to_hrs as hrs,
                       case when score = 'S' then -1 else 0 end as delta_s,
                       case when score = 'B' then -1 else 0 end as delta_b
                  FROM score_tbl) _
         GROUP BY personid_fk, hrs
      ),
      running AS (
        SELECT personid_fk, hrs as from_hrs,
               lead(hrs) over (partition by personid_fk order by hrs) as to_hrs,
               sum(delta_s) over (partition by personid_fk order by hrs) running_s,
               sum(delta_b) over (partition by personid_fk order by hrs) running_b
          FROM deltas
      )
      SELECT personid_fk, 'M' as s_type, from_hrs, to_hrs,
             case when running_b > 0 then 'SB' else 'S' end as score
        FROM running
       WHERE running_s > 0
       UNION ALL
      SELECT personid_fk, s_type, from_hrs, to_hrs, score
        FROM score_tbl
       WHERE s_type = 'B'
       ORDER BY personid_fk, from_hrs;
    

    Step by step:

    • deltas is union of two passes on score_tbl - one for start and one for end of score/bonus interval, creating a timeline of +1/-1 events
    • running calculates running total of deltas over time, yielding split intervals where score/bonus are active
    • final query just converts score codes and unions bonus intervals (which are passed unchanged)

    SQL Fiddle here.