Search code examples
sqldatetimeteradatawindow-functionsgaps-and-islands

How to aggregate data with slowly changing dimension


I will try to explain my problem with a ridiculous english level and a very hard situation to explain.

On Teradata, I have a first table table1 with SCD2 changing (slowly changing dimension), so each record have a start and end date.

Dataset

I want to create new table (table2) with all columns except one (non-useful) from table1. And obviously without this column I have duplicate lines so I group by all other columns, and calculate min(startdate) and max(enddate) to recreate historical records. So i want result like this :

Expected results

But the only result i can return is like this :

select Key, UsefulData, min(EtartDate), max(EndDate)
from table1
group by Key, UsefulData

Actual results

With this kind of result, if I want to analyse situation at January 30, 2 lines came back (UsefulData = 1 and = 0), but the truth is UsefulData = 1.

How can I create my table 2 from dataset table 1 please ?

Precisions : StartDate of NextRow always EndDate+1 of current row.


Solution

  • Teradata supports syntax to do exactly what you want, but it works on Periods only. Luckily your StartDate & EndDate can be ccombined into a Period using period(StartDate, EndDate+1) (start is included, but end excluded in Standard SQL):

    select NORMALIZE Key, UsefulData, period(StartDate, EndDate+1) as pd
    from table1
    

    If you want to split it back into two separate columns:

    select Key, UsefulData, 
       begin(pd) as StartDate, 
       last(pd) as EndDate -- reverts the +1
    from
     (
       select NORMALIZE Key, UsefulData, period(StartDate, EndDate) as pd
       from table1
     ) as dt
    

    There's also a normalized table, but again, only for Periods.