Search code examples
sql-serveroracle-databasejoininference

Join tables with Implied / Inferred data


This would be an easy join except: Table A is explicit for all times and values, but Table B only records rows when the there is a change from the previous value. In looking at Table B one can easily infer the missing times and values, but how to put that into a query?

Data in A.time contains every minute and a corresponding A.Value.

A.Time...........A.Value

9:00...............3.4

9:01...............5.0

9:02...............5.3

9:03...............5.3

9:04...............5.3

and so on…..

Table B only contains rows where the B.value has changed from the previous value.

B.Time..............B.Value

9:00...................4

9:01...................4.1

This is blank, but I know it to be 9:02 / 4.1

This is blank, but I know it to be 9:03 / 4.1

9:04....................4.7

and so on…

I need to do a query that links A.Time and B.Value, but I need the query to understand that a missing time in Table B should be substituted by the B.value of the first B.Time preceeding it.

Final table should be

A.Time...............B.Value

9:00...................4

9:01...................4.1

9:02...................4.1

9:03...................4.1

9:04...................4.7

I am currently writing this for SQL Server, but I need an Oracle solution too Thanks in advance;


Solution

  • In Oracle, you can LEFT JOIN to get all the times and then use LAST_VALUE(b.value) IGNORE NULLS... to fill in the blanks. (NOTE: the ROWS BETWEEN... part is redundant with the ORDER BY in the OVER() clause, but I like it for extra clarity).

    Like this:

    SELECT a.time,
           LAST_VALUE (b.VALUE)
             IGNORE NULLS
             OVER (PARTITION BY NULL 
                   ORDER BY a.time 
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM   table_a a 
    LEFT JOIN table_b b ON b.time = a.time
    ORDER BY a.time;
    

    Here is a full example with test data:

    with table_a ( time, value ) as
      ( SELECT '9:00', 3.4 FROM DUAL UNION ALL
        SELECT '9:01', 5.0 FROM DUAL UNION ALL
        SELECT '9:02', 5.3 FROM DUAL UNION ALL
        SELECT '9:03', 5.3 FROM DUAL UNION ALL
        SELECT '9:04', 5.3 FROM DUAL ),
    table_b ( time, value ) as 
      ( SELECT '9:00', 4 FROM DUAL UNION ALL
        SELECT '9:01', 4.1 FROM DUAL UNION ALL
        SELECT '9:04', 4.7 FROM DUAL )
    SELECT a.time,
           LAST_VALUE (b.VALUE)
             IGNORE NULLS
             OVER (PARTITION BY NULL 
                   ORDER BY a.time 
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM   table_a a 
    LEFT JOIN table_b b ON b.time = a.time
    ORDER BY a.time;
    

    An alternative (which might work on SQL Server) is to use OUTER APPLY. Like so:

    SELECT a.time, b.value
    FROM   table_a a 
    OUTER APPLY ( SELECT * 
                  FROM table_b b 
                  WHERE b.time <= a.time 
                  ORDER BY b.time desc 
                  FETCH FIRST 1 ROW ONLY ) b
    ORDER BY a.time;
    

    Basically, this finds the most recent non-null value from table B for each row in table A.

    SQL*SERVER Solution

    Here is the OUTER APPLY syntax translated to SQL*Server:

    with table_a ( time, value ) as
      ( SELECT '9:00', 3.4 UNION ALL
        SELECT '9:01', 5.0 UNION ALL
        SELECT '9:02', 5.3 UNION ALL
        SELECT '9:03', 5.3 UNION ALL
        SELECT '9:04', 5.3 ),
    table_b ( time, value ) as 
      ( SELECT '9:00', 4 UNION ALL
        SELECT '9:01', 4.1 UNION ALL
        SELECT '9:04', 4.7 )
    SELECT a.time, b.value
    FROM   table_a a OUTER APPLY ( 
            SELECT * FROM table_b b 
            WHERE b.time <= a.time 
            ORDER BY b.time desc 
            OFFSET 0 ROWS
            FETCH NEXT 1 ROWS ONLY ) b
    ORDER BY a.time;