Search code examples
sqlpostgresqldatetimetime-series

Add missing dates in a table


I have a table that contains data for every day in 2002, but it has some missing dates. Namely, 354 records for 2002 (instead of 365). For my calculations, I need to have the missing data in the table with Null values

+-----+------------+------------+
| ID  |  rainfall  | date       |
+-----+------------+------------+
| 100 |  110.2     | 2002-05-06 |
| 101 |  56.6      | 2002-05-07 |
| 102 |  65.6      | 2002-05-09 |
| 103 |  75.9      | 2002-05-10 |
+-----+------------+------------+

You see that 2002-05-08 is missing. I want my final table to be like:

+-----+------------+------------+
| ID  |  rainfall  | date       |
+-----+------------+------------+
| 100 |  110.2     | 2002-05-06 |
| 101 |  56.6      | 2002-05-07 |
| 102 |            | 2002-05-08 |
| 103 |  65.6      | 2002-05-09 |
| 104 |  75.9      | 2002-05-10 |
+-----+------------+------------+

Is there a way to do that in PostgreSQL?

It doesn't matter if I have the result just as a query result (not necessarily an updated table).


Solution

  • date is a reserved word in standard SQL and the name of a data type in PostgreSQL. PostgreSQL allows it as identifier, but that doesn't make it a good idea. I use thedate as column name instead.

    Don't rely on the absence of gaps in a surrogate ID. That's almost always a bad idea. Treat such an ID as unique number without meaning, even if it seems to carry certain other attributes most of the time.

    In this particular case, as @Clodoaldo commented, thedate seems to be a perfect primary key and the column id is just cruft - which I removed:

    CREATE TEMP TABLE tbl (thedate date PRIMARY KEY, rainfall numeric);
    INSERT INTO tbl(thedate, rainfall) VALUES
      ('2002-05-06', 110.2)
    , ('2002-05-07', 56.6)
    , ('2002-05-09', 65.6)
    , ('2002-05-10', 75.9);
    

    Query

    Full table by query:

    SELECT x.thedate, t.rainfall  -- rainfall automatically NULL for missing rows
    FROM (
       SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
       FROM   tbl
       ) x
    LEFT   JOIN tbl t USING (thedate)
    ORDER  BY x.thedate
    

    Similar to what @a_horse_with_no_name posted, but simplified and ignoring the pruned id.

    Fills in gaps between first and last date found in the table. If there can be leading / lagging gaps, extend accordingly. You can use date_trunc() like @Clodoaldo demonstrated - but his query suffers from syntax errors and can be simpler.

    INSERT missing rows

    The fastest and most readable way to do it is a NOT EXISTS anti-semi-join.

    INSERT INTO tbl (thedate, rainfall)
    SELECT x.thedate, NULL
    FROM (
       SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
       FROM   tbl
       ) x
    WHERE NOT EXISTS (SELECT 1 FROM tbl t WHERE t.thedate = x.thedate)