I have a very large table which gets loaded daily. The business user is mostly only interested in monthly snapshots (latest record by key field each month).
For example:
Loan_Number Balance Load_Date
1 $1 4/25/2016
1 $.75 2/15/2015
1 $1.50 4/13/2016
2 $2 1/1/2016
2 $2.75 4/30/2016
2 $2.50 5/05/2016
2 $2 4/15/2016
3 $3 5/05/2016
3 $3.50 5/15/2016
The April snapshot would be:
SELECT *
FROM Loan_Table
WHERE Load_Date <= DATE '4/30/2016'
QUALIFY
row_number() OVER(
PARTITION BY Loan_Number
ORDER BY Load_Date DESC)
= 1
Are there any index or partition structures that could be added to this table to improve the performance? We're running TD 14 if that helps.
The PI on Load_Number
is probably the best you can get, partitioning won't help. Explain hopefully shows 'STAT FUNCTION ... built locally`.
But depending on row count & record size a good old NUSI plus MAX might help:
CREATE INDEX (Loan_Number, Load_Date) ORDER BY HASH(Loan_Number) ON Loan_Table;
SELECT *
FROM Loan_Table AS t1
WHERE Load_Date =
( SELECT MAX(Load_Date )
FROM Loan_Table AS t2
WHERE t1.Loan_Number = t2.Loan_Number
AND Load_Date <= DATE '2016-04-30'
)