Search code examples
sqlanalyticsteradatadata-warehousewindow-functions

Snapshot of table row instance with time intervals


I am not sure what to call this phenomenon. I have just titles it with "Snapshot of table row instance with time intervals". We have a situation where we need to create a new entity which is kind of JOIN of two tables but these two tables have history and that history we need to merge into this entity. Please have a look at example below:

Problem Scenario

In the image you can see we have two tables ABC and PQR with ID as key column. We need to create one table ABCPQR which will be join of this both table and they history of this table will contains merged history of both attribute from their respective tables.

How to implement this in database? I am using Teradata as database, Is there any algorithm available to achieve this?


Solution

  • You can make quick work of this with Teradata's PERIOD data type. A period is a range of dates or timestamps. It takes two dates or timestamps as arguments the first for the start date and the second for the end date (up to but not including).

    In your case we'll convert your Start_Date and End_Date to Periods to perform the join. We will use the Period specific function P_INTERSECT to find overlapping periods on which we will join.

    SELECT
        attr1,
        attr2,
        /*pull the BEGIN of the intersected period*/
        BEGIN(t1.validperiod P_INTERSECT t2.validperiod) as startdate,
        /*pull the END of the intersected period (subtracting a day since period end
          dates are "up to but not including")*/
         PRIOR(END(t1.validperiod P_INTERSECT t2.validperiod)) as enddate   
    FROM
        (SELECT abc.*, PERIOD(start_date, NEXT(end_date)) as validperiod FROM abc) t1
        INNER JOIN (SELECT pqr.*, PERIOD(start_Date, NEXT(end_date)) as validperiod FROM pqr) t2 ON
            t1.id = t2.id 
            /*
            * Now P_INTERSECT our two periods and look for Non-Null intersections
            * The intersection is the date range where the two periods overlap
            */
            AND t1.validperiod P_INTERSECT t2.validperiod IS NOT NULL;
    

    There's a few bonuses that we pick up here:

    1. The logic is clean and simple. No testing for max(start_date) and min(end_date) with nested CASE statements and all that horribleness
    2. If you have to join in a third table, just convert it's start and end date to validperiod and join with t1.validperiod P_INTERSECT t2.validperiod P_INTERSECT t3.validperiod IS NOT NULL Simple simple.
    3. There are a handful of other really useful functions that are period based that will make work like this a snap. NORMALIZE will merge multiple records together based on a composite key and overlapping and meeting periods, for instance.

    Lastly, as a rule, when I'm creating tables and that table has start date and end date I ALWAYS create a new field called validperiod and load it like we do in those subqueries. Then you don't have to convert to period for your joins to be friendly. Just grab your already stored validperiod column and start P_INTERSECTing. It takes all the work out of otherwise ugly joins and selects.