Search code examples
sqldatejoinvertica

How to generate daily score in join based on most recent date on reference table


SQL Vertica Question (but any SQL query answer is appreciated) I have a table "Base" currently in "day-url" level, and I try to get 'score' for each url in each day by joining a "day-url" level reference table "Score Update". The problem is the reference table is not in per day but specific dates level. I need to join the score from "Score Update" with "Update Date" right before the "Date" in "Base". Example below: (Please see pictures at bottom)

What I have: (Base)

Date          URL   Score                      
2019-08-01    A                        
2019-08-01    B                                      
2019-08-02    A                                 
2019-08-03    A                                  
2019-08-09    B         
2019-08-10    A         
2019-08-11    A          
2019-08-11    B      
2019-08-13    B

(Score Update)

Update_Date      URL    Score
 2019-07-25      A       x
 2019-07-25      B       y
 2019-08-10      A      alpha
 2019-08-10      B      beta

What I want is: What I have: (Base)

Date            URL   Score             
2019-08-01      A      x                    
2019-08-01      B      y                      
2019-08-02      A      x                           
2019-08-03      A      x                             
2019-08-09      B      y   
2019-08-10      A      alpha   
2019-08-11      A      alpha    
2019-0B-11      B      beta 
2019-08-13      B      beta

enter image description here

And both tables will keep updating for future date data. Can anyone help me how to join these 2 tables?

Thanks!


Solution

  • Do you mean this?

    WITH
    -- your input ...
    base(dt,url,score) AS (
              SELECT DATE '2019-08-01','A',NULL
    UNION ALL SELECT DATE '2019-08-01','B',NULL
    UNION ALL SELECT DATE '2019-08-02','A',NULL
    UNION ALL SELECT DATE '2019-08-03','A',NULL
    UNION ALL SELECT DATE '2019-08-04','B',NULL
    UNION ALL SELECT DATE '2019-08-05','A',NULL
    UNION ALL SELECT DATE '2019-08-06','A',NULL
    UNION ALL SELECT DATE '2019-08-10','A',NULL
    UNION ALL SELECT DATE '2019-08-11','A',NULL
    UNION ALL SELECT DATE '2019-08-11','B',NULL
    UNION ALL SELECT DATE '2019-08-12','A',NULL
    UNION ALL SELECT DATE '2019-08-13','A',NULL
    )
    ,
    -- your input ...
    score_upd(upd_dt,url,score) AS (
              SELECT DATE '2019-07-25','A','x'
    UNION ALL SELECT DATE '2019-07-25','B','alpha'
    UNION ALL SELECT DATE '2019-08-10','A','y'
    UNION ALL SELECT DATE '2019-08-10','B','beta'
    )
    -- real select starts here ..                                     
    SELECT
      b.dt
    , b.url
    , u.score
    FROM base b
    LEFT JOIN score_upd u
      ON b.url=u.url
     AND b.dt INTERPOLATE PREVIOUS VALUE u.upd_dt
    ORDER BY dt,url
    ;
    -- out      dt     | url | score 
    -- out ------------+-----+-------
    -- out  2019-08-01 | A   | x
    -- out  2019-08-01 | B   | alpha
    -- out  2019-08-02 | A   | x
    -- out  2019-08-03 | A   | x
    -- out  2019-08-04 | B   | alpha
    -- out  2019-08-05 | A   | x
    -- out  2019-08-06 | A   | x
    -- out  2019-08-10 | A   | y
    -- out  2019-08-11 | A   | y
    -- out  2019-08-11 | B   | beta
    -- out  2019-08-12 | A   | y
    -- out  2019-08-13 | A   | y
    -- out (12 rows)