Search code examples
sqlsqlitewindow-functions

How can I fill in/interpolate mising timestamps in an SQLite query?


I have a bunch of GPS tracks recorded in a table. Due to a bug in the GPS logging code, the fractional part of each timestamp is incorrect (and repeated). I would like to create a query that finds the first and last record with the same timestamp, and determines the number of records between each rollover in whole seconds. This can be used to determine the delta between each record. I'd like to use this to interpolate the missing/incorrect fractional second part of the timestamp.

Existing Table

| id | timestamp                  |
|----|----------------------------|
| 1  | 2020-09-06 15:08:21.128344 |
| 2  | 2020-09-06 15:08:21.128344 |
| 3  | 2020-09-06 15:08:21.128344 |
| 4  | 2020-09-06 15:08:22.128344 |

Desired Output

| id | timestamp                  |
|----|----------------------------|
| 1  | 2020-09-06 15:08:21.0      |
| 2  | 2020-09-06 15:08:21.25     |
| 3  | 2020-09-06 15:08:21.75     |
| 4  | 2020-09-06 15:08:22.0      |

After reading Select first row in each GROUP BY group, I figured out how to get the first row in each group with identical timestamps.

WITH A AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY time ORDER BY id) as rn, *
    FROM gps_points
) SELECT * FROM A WHERE rn=1 ORDER BY id;

I am hoping to then generate a timestamp by adding (rn -1) * datetime((round(julianday(first_row.timestamp) * 86400) / 86400)) where first_row is the first row of identical timestamps, and delta is 1/(3 repeated timestamps + 1).

This can be explained by this table.

| id | rn       | timestamp                              |
|----|----------|----------------------------------------|
| 1  | 0        | 2020-09-06 15:08:21.0 + 0 * delta      |
| 2  | 1        | 2020-09-06 15:08:21.0 + 1 * delta      |
| 3  | 2        | 2020-09-06 15:08:21.0 + 2 * delta      |
| 4  | 0        | 2020-09-06 15:08:22.0 + 0 * delta      |

I am having trouble integrating the above query into a larger query that does what I want. Perhaps I should just write some python code to do this, but I'd like to improve my SQL knowledge and do this in one fell swoop.


Solution

  • With ROW_NUMBER() and COUNT() window functions:

    WITH cte AS (
      SELECT *,
        ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY ID) rn,
        COUNT(*) OVER (PARTITION BY timestamp) counter
      FROM tablename  
    )
    SELECT id,
           DATETIME(timestamp) || '.' || 
           SUBSTR('00' || ((rn - 1) * (100 / (counter + 1))), -2) timestamp
    FROM cte
    

    See the demo.
    Results:

    > id | timestamp             
    > -: | :---------------------
    >  1 | 2020-09-06 15:08:21.00
    >  2 | 2020-09-06 15:08:21.25
    >  3 | 2020-09-06 15:08:21.50
    >  4 | 2020-09-06 15:08:22.00
    

    I used counter + 1 for the division, to get your expected results, but I believe that the proper way to do it is without that +1:

    WITH cte AS (
      SELECT *,
        ROW_NUMBER() OVER (PARTITION BY timestamp ORDER BY ID) rn,
        COUNT(*) OVER (PARTITION BY timestamp) counter
      FROM tablename  
    )
    SELECT id,
           DATETIME(timestamp) || '.' || 
           SUBSTR('00' || ((rn - 1) * (100 / counter)), -2) timestamp
    FROM cte 
    

    See the demo.
    Results:

    > id | timestamp             
    > -: | :---------------------
    >  1 | 2020-09-06 15:08:21.00
    >  2 | 2020-09-06 15:08:21.33
    >  3 | 2020-09-06 15:08:21.66
    >  4 | 2020-09-06 15:08:22.00