Search code examples
sqlsql-servert-sqlrankdate-range

Group Rows Over Multiple Date Ranges


I'm trying to solve what I believe to be (at least) two problems at once with this one, so there may already exist answers that partially answer this question, but I do not fundamentally understand the concept of what I'm trying to do with the data to get it into the final form that I desire.

The problem: I have three grains of data (let's just call them Gas, Liquid, Solid) that occur over a larger period of time, let's call that the Observation Period. The ask is for GasPeriod data to be displayed for the observation period if it is present, then Liquid, then Solid. For a given state, there would only ever be 0 or 1 active records.

Gas  X----X       X----X    X---------X 1
Liq  X-------X    X--------XX---------X 2
Sol  X--------------XX----------------X 3

Need 1----12-23--31----12--21---------1

What I need is to reduce those 8 ranges (3 GasPeriod, 3 LiquidPeriod, 2 SolidPeriod) into 6 rows with 6 date ranges and the data from the "winning" row to persist the PeriodTemp and Description for the given overlapping period of time.

Any solutions would be helpful, but I'd greatly appreciate any breakdowns of the actual problem here as well so I can educate myself on what I'm doing. I suspect the problem steps involved are:

  • Return 8 rows for the Observation Period (must involve a union on GasPeriod, LiquidPeriod and SolidPeriod?)
  • Group the rows by date range and rank them (I've seen solutions that involved LAG and DENSE_RANK but I'm not familiar with the latter and I've only used LAG to solve a different problem so it's confusing me a bit how it works.
  • Pick the winning record for the given date range (I assume this is solved using a sub-query)

Edited for clarity

create table ObservationPeriod (
    ObservationPeriodId    BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table GasPeriod (
    GasPeriodId            BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table LiquidPeriod (
    LiquidPeriodId         BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table SolidPeriod (
    SolidPeriodId          BIGINT    IDENTITY (1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

create table ObservationPeriodObserved (
    ObservationPeriodObservedId BIGINT IDENTITY(1,1) NOT NULL,
    BusinessKey            BIGINT    NOT NULL,
    PeriodTemp             DECIMAL (11, 5) NOT NULL,
    Description            NVARCHAR(100) NOT NULL,
    Effective              DATETIME2 NOT NULL,
    Expiry                 DATETIME2 NOT NULL
)

ObservationPeriod Data

ObservationPeriodId BusinessKey Effective Expiry
1 24 2021-01-01 2021-12-31

GasPeriod Data

GasPeriodId BusinessKey PeriodTemp Description Effective Expiry
1 24 101.328 first g 2020-09-30 2021-03-31
2 24 102.456 second g 2021-06-01 2021-07-31
3 24 100.011 third g 2021-09-01 9999-12-31

LiquidPeriod Data

LiquidPeriodId BusinessKey PeriodTemp Description Effective Expiry
1 24 98.99 first l 2021-01-01 2021-04-30
2 24 98.76 second l 2021-06-01 2021-08-31
3 24 99.978 third l 2021-09-01 9999-12-31

SolidPeriod Data

SolidPeriodId BusinessKey PeriodTemp Description Effective Expiry
1 24 -0.145 first s 2021-01-01 2021-06-30
2 24 -0.987 second s 2021-07-01 9999-12-31

ObvservationPeriodObserved Data

ObvservationPeriodObservedIdId BusinessKey PeriodTemp Description Effective Expiry
1 24 101.328 first g 2021-01-01 2021-03-31
2 24 98.99 first l 2021-04-01 2021-04-30
3 24 -0.145 first s 2021-05-01 2021-05-31
4 24 102.456 second g 2021-06-01 2021-07-31
5 24 98.76 second l 2021-08-01 2021-08-31
6 24 100.011 third g 2021-09-01 2021-12-31

The idea is that for a given row in ObservationPeriod, there are many associated periods of time between the three aforementioned grains, but only one should ever be recorded as a subset of an ObservationPeriod for a given period of time.

Please also assume that there must be separate granularity here and that this problem cannot be solved away by getting this data into the same table -- it cannot. I can't use the actual business model here so I'm trying to get as close as I can conceptually.


Solution

  • The following approach starts by creating a union of the three datasets Gas, Liquid and Solid. In this union an addition column is created PeriodPriority which will assist in choosing the winning row. I've interpreted the winning row as a period entry occurring within the observational period, is the most recent and has not expired and will be chosen based on the rank of Gas-1, Liquid-2 and Solid-3. This forms the basis of the DENSE_RANK window function as it is ordered by recent expired date and the PeriodPriority. Since this winning record may have a date exceeding the observational period, I used a case expression to ensure that the value inserted was within the observational period.

    Although there is only one Observational period I have still included the where clause WHERE op.ObservationPeriodId=1 which you may update/remove as desired. I also joined on BusinessKey as I am not sure whether this will change throughout your entire set. If BusinessKey never changes, then this may be omitted from the join expression.

    The resulting query currently looks like this

    SELECT 
        ROW_NUMBER() OVER (ORDER BY c.Effective, c.Expiry ) as ObservationPeriodObservedId,
        c.BusinessKey,
        c.PeriodTemp, 
        c.Description,
        c.Effective,
        CASE 
            WHEN c.Expiry >= op.Expiry THEN op.Expiry
            ELSE c.Expiry
        END as Expiry
    FROM (
        SELECT 
            *,
            DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
        FROM ( 
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
            FROM GasPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
            FROM LiquidPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
            FROM SolidPeriod
        ) t
    ) c
    INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                       op.Effective <= c.Effective AND
                                       (op.Expiry >= c.Expiry OR rk=1)
    WHERE op.ObservationPeriodId=1
    ORDER BY c.Effective, c.Expiry
    

    and the insert statement

    INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)
    SELECT 
        c.BusinessKey,
        c.PeriodTemp, 
        c.Description,
        c.Effective,
        CASE 
            WHEN c.Expiry >= op.Expiry THEN op.Expiry
            ELSE c.Expiry
        END as Expiry
    FROM (
        SELECT 
            *,
            DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
        FROM ( 
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
            FROM GasPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
            FROM LiquidPeriod
            UNION ALL
            SELECT 
                BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
            FROM SolidPeriod
        ) t
    ) c
    INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                       op.Effective <= c.Effective AND
                                       (op.Expiry >= c.Expiry OR rk=1)
    WHERE op.ObservationPeriodId=1
    ORDER BY c.Effective, c.Expiry
    

    generates the desired results.

    View working db fiddle here

    Edit 1 - Consecutive Effective, Expiry Dates

    As per the updated question and comment, I have modified the above to utilize LAG along with DATE_ADD to provide consecutive dates. The Insert query (latter part is a SELECT) is included below along with an updated db fiddle which provides the desired results. The exception here is record 1 of SolidPeriod that had an end date of 2021-06-30. When this date was changed to 2021-05-31 as in your desired results, the query corrected the 1 date that was not as expected. If there are additional considerations here or there was an error in the sample data, please let me know. I made the adjustment to the sample data instead of an adhoc calculation as I could not assume some logic to make a change as such (i.e. on an arbitrary record subtract 1 month). Let me know whether this works for you and advise further.

    INSERT INTO ObservationPeriodObserved (BusinessKey, PeriodTemp, Description, Effective, Expiry)
    
        SELECT 
            c.BusinessKey,
            c.PeriodTemp, 
            c.Description,
            CASE
                WHEN LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry) IS NULL THEN op.Effective
                ELSE DATEADD(DAY,1,LAG(c.Expiry) OVER (ORDER BY c.Effective, c.Expiry))
            END as Effective,
            CASE 
                WHEN c.Expiry >= op.Expiry THEN op.Expiry
                ELSE c.Expiry
            END as Expiry
            
        FROM (
            SELECT 
                *,
                DENSE_RANK() OVER (ORDER BY t.Expiry DESC, t.PeriodPriority ) rk
            FROM ( 
                SELECT 
                    BusinessKey,PeriodTemp, Description,Effective,Expiry, 1 as PeriodPriority 
                FROM GasPeriod
                UNION ALL
                SELECT 
                    BusinessKey,PeriodTemp, Description,Effective,Expiry, 2 
                FROM LiquidPeriod
                UNION ALL
                SELECT 
                    BusinessKey,PeriodTemp, Description,Effective,Expiry, 3 
                FROM SolidPeriod
            ) t
        ) c
        INNER JOIN ObservationPeriod op ON c.BusinessKey=op.BusinessKey AND
                                           op.Effective <= c.Effective AND
                                           (op.Expiry >= c.Expiry OR rk=1)
        WHERE op.ObservationPeriodId=1
        ORDER BY c.Effective, c.Expiry 
    

    View Demo DB Fiddle

    Let me know if this works for you.