Search code examples
sqlsqlitesql-updatesql-delete

SQLite: merge rows in same table using older record to fill blanks in newer record


I have a SQLite data table that stores records with composite key of <id,rdate>. New records are added for each id on different dates, but new records often don't complete all columns. Incomplete columns may be null or whitespace e.g.

   | id  | rdate      | data1   | data2   |
   ----------------------------------------
0  | 1   | 01/01/2009 | foo     | boo     |
1  | 1   | 04/01/2010 | foo1    | bar1    |
2  | 1   | 08/01/2010 | fooX    | <null>  |
3  | 2   | 01/01/2010 | foo2    | bar2    |
4  | 2   | 04/01/2010 |         |         |
5  | 3   | 01/01/2010 | foo3    | bar3    |
   ----------------------------------------

I want to periodically update records, with the same id, to fill blank columns in the most recent record (by rdate) with data from the previous record. In the example above, data from row 1 is used to fill blank columns in row 2

So the table, after running the query, would look like this:

   | id | rdate      | data1   | data2   |
   ----------------------------------------
0  | 1   | 01/01/2009 | foo     | boo     |
1  | 1   | 04/01/2010 | foo1    | bar1    |
2  | 1   | 08/01/2010 | fooX    | bar1    |
3  | 2   | 01/01/2010 | foo2    | bar2    |
4  | 2   | 04/01/2010 | foo2    | bar2    |
5  | 3   | 01/01/2010 | foo3    | bar3    |
   ----------------------------------------

I have tried to construct a query to do this but I am struggling with how to approach this, or even if it can be done.

This question looks at merging records but from a de-dup perspective. I haven't been able to find anything doing what I need. COALESCE looks promising, but I haven't been able to figure out how to build the query to use it.

Help and suggestions very much appreciated.


Solution

  • For each of the columns data1 and data2 use a correlated subquery that returns the last previous non null value in that column:

    UPDATE tablename AS t1
    SET data1 = COALESCE(
                  NULLIF(TRIM(t1.data1), ''),
                  (SELECT t2.data1 FROM tablename t2 
                   WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data1), '') IS NOT NULL 
                   ORDER BY t2.rdate DESC LIMIT 1)
                ),
        data2 = COALESCE(
                  NULLIF(TRIM(t1.data2), ''),
                  (SELECT t2.data2 FROM tablename t2 
                   WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data2), '') IS NOT NULL 
                   ORDER BY t2.rdate DESC LIMIT 1)
                )            
    WHERE NULLIF(TRIM(t1.data1), '') IS NULL OR NULLIF(TRIM(t1.data2), '') IS NULL
    

    See the demo.

    But it would be better to update the table so every empty value is replaced with null:

    UPDATE tablename
    SET data1 = NULLIF(TRIM(data1), ''),
        data2 = NULLIF(TRIM(data2), '')
    WHERE TRIM(data1) = '' OR TRIM(data2) = ''
    

    and then the code can be simplified:

    UPDATE tablename AS t1
    SET data1 = COALESCE(
                  t1.data1,
                  (SELECT t2.data1 FROM tablename t2 
                   WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data1 IS NOT NULL 
                   ORDER BY t2.rdate DESC LIMIT 1)
                ),
        data2 = COALESCE(
                  t1.data2,
                  (SELECT t2.data2 FROM tablename t2 
                   WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data2 IS NOT NULL 
                   ORDER BY t2.rdate DESC LIMIT 1)
                )            
    WHERE data1 IS NULL OR data2 IS NULL
    

    See the demo.

    Results:

    id rdate data1 data2
    1 2009-01-01 foo boo
    1 2010-01-04 foo1 bar1
    1 2010-01-08 fooX bar1
    2 2010-01-01 foo2 bar2
    2 2010-01-04 foo2 bar2
    3 2010-01-01 foo3 bar3

    Note that your dates as they are in the sample data are not comparable.
    Change them to the format 'YYYY-MM-DD'.