Search code examples
sqlsql-serverwindow-functions

SQL Server 2008 - Retrieve the nearest non null data


I have two tables: table 1 a package_id and a timestamp column for which I have no weight information available, table 2 a package_id, a timestamp and a weight column where I do have the information.

What I'm trying to do is fill in the table 1 weight information based on table 2 using the following restrictions:

  1. use the closest package_id available ie. for package_id 1 use 2 if available, if not 3 etc
  2. if there is only one weight available use it for all the missing package_id's
  3. if two weights are available, use the higher one ie. for package_id 5, if 4 and 6 are available use 6

The code:

IF OBJECT_ID('tempdb..#TIMEGAPS') IS NOT NULL DROP TABLE #TIMEGAPS
CREATE TABLE #TIMEGAPS (PACK_ID INT, Local_Time DATETIME)

IF OBJECT_ID('tempdb..#REALVALUES') IS NOT NULL DROP TABLE #REALVALUES
CREATE TABLE #REALVALUES (PACK_ID INT, Local_Time DATETIME, WEIGHT INT)


INSERT INTO #TIMEGAPS VALUES
(1,'2018-01-20 18:40:00.000'),
(1,'2018-01-20 18:50:00.000'),
(1,'2018-01-20 19:00:00.000'),
-----------------------------
(7,'2018-01-20 18:40:00.000'),
(7,'2018-01-20 18:50:00.000'),
(7,'2018-01-20 19:00:00.000'),
------------------------------
(12,'2018-01-20 18:40:00.000'),
(12,'2018-01-20 18:50:00.000'),
(12,'2018-01-20 19:00:00.000'),
(12,'2018-01-20 20:00:00.000')


INSERT INTO #REALVALUES VALUES
(2,'2018-01-20 18:40:00.000',50),
(3,'2018-01-20 18:40:00.000',70),
(4,'2018-01-20 18:40:00.000',150),
(5,'2018-01-20 18:40:00.000',60),
(6,'2018-01-20 18:40:00.000',45),
(8,'2018-01-20 18:40:00.000',55),
(9,'2018-01-20 18:40:00.000',25),
---------------------------------
(2,'2018-01-20 18:50:00.000',75),
(3,'2018-01-20 18:50:00.000',80),
(4,'2018-01-20 18:50:00.000',120),
(5,'2018-01-20 18:50:00.000',110),
(11,'2018-01-20 18:50:00.000',30),
---------------------------------
(8,'2018-01-20 19:00:00.000',70)

Final result:

EDIT: I've adapted the solution from here which I believe is what I need.

SELECT tg.PACK_ID, tg.Local_Time, p.WEIGHT
  FROM #TIMEGAPS tg
OUTER APPLY 
(
  SELECT TOP 1 *, ABS(tg.PACK_ID - rv.PACK_ID) AS diff
     FROM #REALVALUES rv
     WHERE (tg.Local_Time = rv.Local_time OR rv.Local_time is null)
     ORDER BY CASE WHEN rv.Local_time IS NULL THEN 2 ELSE 1 END,
     ABS(rv.PACK_ID- tg.PACK_ID) ASC
 ) p

EDIT 2: 3. if two weights are available, use the highest PACK_ID ie. for package_id 5, if PACK_ID 4 and PACK_ID 6 are available use 6


Solution

  • Something like this?

    It uses a row_number by distance.

    SELECT 
     PACK_ID, Local_Time, WEIGHT
    FROM (
      SELECT g.PACK_ID, g.Local_Time, v.WEIGHT, 
      ROW_NUMBER() OVER (PARTITION BY g.PACK_ID, g.Local_Time 
                         ORDER BY ABS(v.PACK_ID - g.PACK_ID), v.PACK_ID DESC) AS RN
      FROM #TIMEGAPS AS g
      JOIN #REALVALUES AS v ON v.Local_Time = g.Local_Time
    ) AS q
    WHERE RN = 1
    ORDER BY PACK_ID, Local_Time