Search code examples
sql-servert-sqlintervalsaggregationschedule

Merging/splitting overlapping date ranges with priority


I have three tables. One table tells me when a particular vendor is contracted. A second tells me the base fee schedule that we contract with all vendors. And a third tells me if a specific contract has a different contracted rate for one of the fees. The tables look like this:

CREATE TABLE [dbo].[Facility](
    [FacilityID] [bigint] IDENTITY(1,1) NOT NULL,
    [ProviderID] [varchar](50) NOT NULL,
    [VendorID] [bigint] NOT NULL,
    [FacilityName] [varchar](300) NOT NULL,
    [FacilityAddress1] [varchar](300) NOT NULL,
    [FacilityAddress2] [varchar](300) NOT NULL,
    [FacilityCity] [varchar](300) NOT NULL,
    [FacilityState] [char](2) NOT NULL,
    [FacilityZip] [varchar](10) NOT NULL,
    [ContractEffectiveDate] [date] NOT NULL,
    [ContractTermDate] [date] NOT NULL,
 CONSTRAINT [PK_Facility] PRIMARY KEY CLUSTERED 
(
    [FacilityID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[BaseFeeSchedule](
    [BaseFeeScheduleID] [int] IDENTITY(1,1) NOT NULL,
    [FeeCode] [varchar](10) NOT NULL,
    [Description] [varchar](800) NOT NULL,
    [Rate] [money] NOT NULL,
    [CategoryID] [int] NOT NULL,
    [RateEffectiveDate] [date] NOT NULL,
    [RateTermDate] [date] NOT NULL,
 CONSTRAINT [PK_BaseFeeSchedule] PRIMARY KEY CLUSTERED 
(
    [BaseFeeScheduleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[OverrideFeeSchedule](
    [OverrideFeeScheduleID] [bigint] IDENTITY(1,1) NOT NULL,
    [FacilityID] [bigint] NOT NULL,
    [FeeCode] [varchar](10) NOT NULL,
    [OverrideRate] [money] NOT NULL,
    [RateEffectiveDate] [date] NOT NULL,
    [RateTermDate] [date] NOT NULL,
 CONSTRAINT [PK_OverrideFeeSchedule] PRIMARY KEY CLUSTERED 
(
    [OverrideFeeScheduleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[OverrideFeeSchedule]  WITH CHECK ADD  CONSTRAINT [FK_OverrideFeeSchedule_Facility] FOREIGN KEY([FacilityID])
REFERENCES [dbo].[Facility] ([FacilityID])
GO

ALTER TABLE [dbo].[OverrideFeeSchedule] CHECK CONSTRAINT [FK_OverrideFeeSchedule_Facility]
GO

We have an existing system with one table that looks like this:

CREATE TABLE [dbo].[FeeSchedule](
    [FeeScheduleID] [int] IDENTITY(1,1) NOT NULL,
    [VendorID] [int] NULL,
    [FeeCd] [varchar](10) NOT NULL,
    [StartDate] [date] NOT NULL,
    [EndDate] [date] NOT NULL,
    [ContractedAmount] [money] NOT NULL,
    [ProgramTypeID] [int] NULL,
 CONSTRAINT [PK_FeeSchedule] PRIMARY KEY CLUSTERED 
(
    [FeeScheduleID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

That table is used in code to determine the correct rate to pay each vendor. It is my job to keep that table updated, but it is proving problematic because different facilities are contracted on different dates. Every contract includes the base fee schedule. However, the contracts allow for certain fees to get "overridden" with different fees (frequently lower than the normal contracted fees when there are discounts, occasionally higher when there are surcharges that need to get added in). These three tables are ones that I built that store all of the current data, and I have been using them to build the FeeSchedule table that is needed by the software. It is easy enough to process changes, but I have been tasked with verifying that the data in the FeeSchedule table is accurate.

The FeeSchedule table includes not only new data (which is the only thing I change), but also prior data. So, the plan is to take the data in the three tables, run a query to merge date ranges (where a fee in the OverrideFeeSchedule table takes priority over a fee in the BaseFeeSchedule table)

An example of this:

INSERT INTO Facility(VendorID,ContractEffectiveDate,ContractTermDate,...) 
VALUES(1,'1/1/2017','12/31/9999',...) --Assume FacilityID=1

INSERT INTO BaseFeeSchedule(FeeCode,Rate,RateEffectiveDate,RateTermDate,...) 
VALUES('1',100,'1/1/2015','10/15/2016',...),
('1',120,'10/16/2016','4/5/2018',...),
('1',140,'4/6/2018','12/31/9999',...)

INSERT INTO OverrideFeeSchedule(FacilityID,FeeCode,OverrideRate,RateEffectiveDate,RateTermDate,...) 
VALUES(1,'1',50,'3/1/2017','5/31/2018',...),
(1,'1',70,'7/1/2018','12/31/9999',...)

And from this data, I would want:

INSERT INTO FeeSchedule(VendorID, FeeCd, StartDate,EndDate,ContractedAmount)
VALUES(1,'1','1/1/2017','2/28/2017',120), --From BaseFeeSchedule
(1,'1','3/1/2017','5/31/2018',50), --From OverrideFeeSchedule
(1,'1','6/1/2018','6/30/2018',140), --From BaseFeeSchedule
(1,'1','7/1/2018','12/31/9999',70) --From OverrideFeeSchedule

I have verified that there is no data for a single Facility/FeeCode combination in the OverrideFeeSchedule table that overlaps, and there is no data for a single FeeCode in the BaseFeeSchedule that overlaps. My current solution takes forever. I am doing the following:

Build a table of every day since the first contracted Facility onward. (BigTable is just a table with about a million records, and I only take enough so that I get every day between the first date a vendor was contracted all the way to a year from now. However, since the max recursion is somewhere around 20,000, when the range from the first contracted vendor to a year from today is more than 20,000 days, I may get errors with this for violating the max recursion. So, I am hoping for a different solution.

SELECT DATEADD(DAY,ROW_NUMBER() OVER (ORDER BY A.TableID) - 1,B.MinDate) CheckDate 
INTO #DatesToCheck
FROM BigTable A
CROSS JOIN 
    (SELECT MIN(ContractEffectiveDate) MinDate
    FROM Facility) B
WHERE DATEADD(DAY,ROW_NUMBER() OVER (ORDER BY A.TableID) - 1,B.MinDate) < DATEADD(YEAR,1,GETDATE())

Join this table to each other table to build a massive table of every day, every facility that was contracted on that day, every FeeCode that was chargeable on that day, and the specific rate for that day. I will not bother with the code for that join, but it is not difficult to write.

Next, I use the technique described here to merge the date ranges: StackOverflow

While this technique works, it is extremely slow. Is there a more direct method to generate the result set I am seeking? Basically, I am looking for how I might modify the approach in that link to take into account potential overlaps with differing levels of priority (base vs override) like in the example I offered.


Solution

  • I hope I got this correctly...

    First of all, you should implement a numbers / date table. This is not absolutely necessary but very handsome in many cases. You might follow this example...

    With such a list you can try something along this:

    DECLARE @endDate DATE='20191231';
    
    WITH DailyBaseRate AS
    (
        SELECT CoveredDays.CalendarDate
              ,CONCAT('base ',bfs.RateEffectiveDate) AS RateKey
              ,bfs.FeeCode
              ,bfs.Rate
        FROM BaseFeeSchedule bfs
        CROSS APPLY(SELECT * FROM RunningNumbers rn WHERE rn.CalendarDate<=@endDate AND rn.CalendarDate>=bfs.RateEffectiveDate AND rn.CalendarDate<=bfs.RateTermDate) CoveredDays
    
    )
    ,OverrideRates AS
    (
        SELECT CoveredDays.CalendarDate
              ,o.FacilityID 
              ,CONCAT('override ',o.RateEffectiveDate) AS RateKey
              ,o.FeeCode
              ,o.OverrideRate
        FROM OverrideFeeSchedule o
        CROSS APPLY(SELECT * FROM RunningNumbers rn WHERE  rn.CalendarDate<=@endDate AND rn.CalendarDate>=o.RateEffectiveDate AND rn.CalendarDate<=o.RateTermDate) CoveredDays
    ) 
    ,EffectiveRates AS
    (
        SELECT f.*
              ,dbr.CalendarDate
              ,COALESCE(ovr.RateKey, dbr.RateKey) AS EffectiveRateKey
              ,COALESCE(ovr.FeeCode, dbr.FeeCode) AS EffectiveFeeCode
              ,COALESCE(ovr.OverrideRate, dbr.Rate) AS EffectiveRate
        FROM dbo.Facility f
        CROSS JOIN DailyBaseRate dbr
        LEFT JOIN OverrideRates ovr ON ovr.FacilityID=f.FacilityID AND ovr.CalendarDate=dbr.CalendarDate
        WHERE dbr.CalendarDate<=@endDate 
          AND dbr.CalendarDate>=f.ContractEffectiveDate 
          AND dbr.CalendarDate<=f.ContractTermDate
    )
    SELECT FacilityID,FacilityName
          ,EffectiveRateKey,EffectiveFeeCode,EffectiveRate
          ,MIN(CalendarDate) AS FromDate
          ,MAX(CalendarDate) AS ToDate
    FROM EffectiveRates
    GROUP BY FacilityID,FacilityName,EffectiveRateKey,EffectiveFeeCode,EffectiveRate
    ORDER BY FacilityID,FromDate;
    

    The result (I've added a second facility to your test data...)

    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | FacilityID | FacilityName | EffectiveRateKey    | EffectiveFeeCode | EffectiveRate | FromDate   | ToDate     |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | 1          | Fac1         | base 2016-10-16     | 1                | 120,00        | 2017-01-01 | 2017-02-28 |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | 1          | Fac1         | override 2017-03-01 | 1                | 50,00         | 2017-03-01 | 2018-05-31 |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | 1          | Fac1         | base 2018-04-06     | 1                | 140,00        | 2018-06-01 | 2018-06-30 |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | 1          | Fac1         | override 2018-07-01 | 1                | 50,00         | 2018-07-01 | 2019-12-31 |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | 2          | Fac2         | base 2018-04-06     | 1                | 140,00        | 2019-01-01 | 2019-12-31 |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    | 2          | Fac2         | override 2019-07-01 | 1                | 99,00         | 2019-07-01 | 2019-08-15 |
    +------------+--------------+---------------------+------------------+---------------+------------+------------+
    

    The idea in short

    • The first CTE will transform your base schedule to a list of days (one row per day with the current code and rate per day)
    • The second CTE will do the same, but uses the override schedule
    • The third CTE will CROSS JOIN your facilities with the base schedule (this might get rather large if there are many facilities) and LEFT JOIN the override rates (no additional rows)
    • The set is filtered to the actually used range
    • At the end we can group by some columns and pick the interval borders with MIN and MAX

    Hint: We need the EffectiveRateKey to avoid a groupig together of different intervals with the same rate and code. As a side effect you can see, from which source the rate was taken.

    Hint2: As we never know in which order the engine will work this down, think about indexes and it might help a lot to use (indexed) temp tables instead of the CTEs...