Search code examples
sqlquery-optimizationteradatadatabase-administration

Index or partition on SCD 2 table commonly rolled up from daily to monthly snapshots


I have a very large table which gets loaded daily. The business user is mostly only interested in monthly snapshots (latest record by key field each month).

For example:

Loan_Number   Balance    Load_Date
1             $1         4/25/2016
1             $.75       2/15/2015
1             $1.50      4/13/2016
2             $2         1/1/2016
2             $2.75      4/30/2016
2             $2.50      5/05/2016
2             $2         4/15/2016
3             $3         5/05/2016
3             $3.50      5/15/2016

The April snapshot would be:

SELECT * 
  FROM Loan_Table
 WHERE Load_Date <= DATE '4/30/2016'
QUALIFY 
       row_number() OVER(
           PARTITION BY Loan_Number 
               ORDER BY Load_Date DESC) 
           = 1

Are there any index or partition structures that could be added to this table to improve the performance? We're running TD 14 if that helps.


Solution

  • The PI on Load_Number is probably the best you can get, partitioning won't help. Explain hopefully shows 'STAT FUNCTION ... built locally`.

    But depending on row count & record size a good old NUSI plus MAX might help:

    CREATE INDEX (Loan_Number, Load_Date) ORDER BY HASH(Loan_Number) ON Loan_Table;
    
    SELECT * 
      FROM Loan_Table AS t1
     WHERE Load_Date = 
      ( SELECT MAX(Load_Date ) 
        FROM Loan_Table AS t2
        WHERE t1.Loan_Number = t2.Loan_Number 
          AND Load_Date  <= DATE '2016-04-30'
      )