Search code examples
postgresqlpostgresql-12

How to select a date range *and* the entries immediately before and after that range?


I'm working with a table where each row has a timestamp, and that timestamp is unique for a given set of other column values:

CREATE TEMPORARY TABLE time_series (
    id SERIAL PRIMARY KEY,
    created TIMESTAMP WITH TIME ZONE NOT NULL,
    category TEXT,
    value INT
);

CREATE UNIQUE INDEX ON time_series (created, category);

INSERT INTO time_series (created, category, value)
VALUES ('2000-01-01 00:00:00Z', 'foo', 1),
       ('2000-01-01 06:00:00Z', 'bar', 5),
       ('2000-01-01 12:00:00Z', 'bar', 5),
       ('2000-01-02 00:00:00Z', 'bar', 5),
       ('2000-01-02 12:34:45Z', 'bar', 2),
       ('2000-01-03 00:00:00Z', 'bar', 3),
       ('2000-01-04 00:00:00Z', 'bar', 3),
       ('2000-01-04 11:11:11Z', 'foo', 4),
       ('2000-01-04 22:22:22Z', 'bar', 5),
       ('2000-01-04 23:23:23Z', 'bar', 4),
       ('2000-01-05 00:00:00Z', 'foo', 1),
       ('2000-01-05 23:23:23Z', 'bar', 4);

The timestamps are not spaced uniformly. My task, given an arbitrary start and end datetime, is to get the entries between those datetimes and the entries immediately before and after that range. Basically, how do I simplify this query:

(SELECT created, value
   FROM time_series
  WHERE category = 'bar'
    AND created < '2000-01-02 06:00:00Z'
  ORDER BY created DESC
  LIMIT 1)
 UNION
(SELECT created, value
   FROM time_series
  WHERE category = 'bar'
    AND created >= '2000-01-02 06:00:00Z'
    AND created < '2000-01-04 12:00:00Z')
 UNION
(SELECT created, value
   FROM time_series
  WHERE category = 'bar'
    AND created >= '2000-01-04 12:00:00Z'
  ORDER BY created
  LIMIT 1)
 ORDER BY created;

created                 value
2000-01-02 00:00:00+00  5
2000-01-02 12:34:45+00  2
2000-01-03 00:00:00+00  3
2000-01-04 00:00:00+00  3
2000-01-04 22:22:22+00  5

The use case is getting the data points to display a graph: I know the datetimes of the left and right edges of the graph, but they will not in general align exactly with created datetimes, so in order to display a graph all the way to the edge I need a data point to either side of the range.

Fiddle


Non-solutions:

  • I can not simply select the whole range, because it might be huge.
  • I can not select some arbitrarily long period outside of the given range, because that data set might again be huge or whichever period I select might not be enough to get the next readings.

Solution

  • EDITED:

    You can combine UNION ALL with ORDER BY and LIMIT and some clause bounds.

    Something like this:

    APPROACH 1:

    SELECT created, 
           value 
      FROM (SELECT created, value
              FROM time_series
             WHERE category = 'bar'
               AND created < '2000-01-02 06:00:00Z'
          ORDER BY created ASC LIMIT 1
           ) AS ub
    UNION ALL
          SELECT created, value
            FROM time_series
           WHERE category = 'bar'
             AND created >= '2000-01-02 06:00:00Z'
             AND created < '2000-01-04 12:00:00Z'
       UNION ALL
    SELECT created, 
           value
     FROM (SELECT created, value 
             FROM time_series
            WHERE category = 'bar'
              AND created >= '2000-01-04 12:00:00Z'
         ORDER BY created DESC LIMIT 1
          ) AS lb 
    ORDER BY 1;
    

    EXPLAIN ANALYZE from approach 1:

                                                                                              QUERY PLAN                                                                                           
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=3.60..3.61 rows=3 width=12) (actual time=0.228..0.237 rows=5 loops=1)
       Sort Key: time_series.created
       Sort Method: quicksort  Memory: 25kB
       ->  HashAggregate  (cost=3.55..3.58 rows=3 width=12) (actual time=0.182..0.195 rows=5 loops=1)
             Group Key: time_series.created, time_series.value
             ->  Append  (cost=1.16..3.53 rows=3 width=12) (actual time=0.073..0.163 rows=5 loops=1)
                   ->  Limit  (cost=1.16..1.16 rows=1 width=12) (actual time=0.070..0.073 rows=1 loops=1)
                         ->  Sort  (cost=1.16..1.16 rows=1 width=12) (actual time=0.065..0.067 rows=1 loops=1)
                               Sort Key: time_series.created DESC
                               Sort Method: quicksort  Memory: 25kB
                               ->  Seq Scan on time_series  (cost=0.00..1.15 rows=1 width=12) (actual time=0.026..0.035 rows=2 loops=1)
                                     Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                                     Rows Removed by Filter: 8
                   ->  Seq Scan on time_series time_series_1  (cost=0.00..1.18 rows=1 width=12) (actual time=0.007..0.016 rows=3 loops=1)
                         Filter: ((created >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (created < '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                         Rows Removed by Filter: 7
                   ->  Limit  (cost=1.16..1.16 rows=1 width=12) (actual time=0.051..0.054 rows=1 loops=1)
                         ->  Sort  (cost=1.16..1.16 rows=1 width=12) (actual time=0.047..0.049 rows=1 loops=1)
                               Sort Key: time_series_2.created
                               Sort Method: quicksort  Memory: 25kB
                               ->  Seq Scan on time_series time_series_2  (cost=0.00..1.15 rows=1 width=12) (actual time=0.009..0.016 rows=2 loops=1)
                                     Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                                     Rows Removed by Filter: 8
     Planning time: 0.388 ms
     Execution time: 0.438 ms
    (25 rows)
    

    Another similar approach can be used.

    APPROACH 2:

      SELECT created, value
            FROM time_series
           WHERE category = 'bar'
             AND created >= (SELECT created
                               FROM time_series
                              WHERE category = 'bar'
                                AND created < '2000-01-02 06:00:00Z'
                           ORDER BY created ASC LIMIT 1)
             AND created < (SELECT created
                              FROM time_series
                             WHERE category = 'bar'
                               AND created >= '2000-01-04 12:00:00Z'
                          ORDER BY created DESC LIMIT 1
                           )
    

    EXPLAIN ANALYZE from approach 2:

    --------------------------------------------------------------------------------------------------------------------------------------
     Seq Scan on time_series  (cost=2.33..3.50 rows=1 width=12) (actual time=0.143..0.157 rows=6 loops=1)
       Filter: ((created >= $0) AND (created < $1) AND (category = 'bar'::text))
       Rows Removed by Filter: 4
       InitPlan 1 (returns $0)
         ->  Limit  (cost=1.16..1.16 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
               ->  Sort  (cost=1.16..1.16 rows=1 width=8) (actual time=0.061..0.062 rows=1 loops=1)
                     Sort Key: time_series_1.created
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on time_series time_series_1  (cost=0.00..1.15 rows=1 width=8) (actual time=0.008..0.015 rows=2 loops=1)
                           Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                           Rows Removed by Filter: 8
       InitPlan 2 (returns $1)
         ->  Limit  (cost=1.16..1.16 rows=1 width=8) (actual time=0.041..0.044 rows=1 loops=1)
               ->  Sort  (cost=1.16..1.16 rows=1 width=8) (actual time=0.038..0.039 rows=1 loops=1)
                     Sort Key: time_series_2.created DESC
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on time_series time_series_2  (cost=0.00..1.15 rows=1 width=8) (actual time=0.007..0.013 rows=2 loops=1)
                           Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                           Rows Removed by Filter: 8
     Planning time: 0.392 ms
     Execution time: 0.288 ms
    

    As you're using limit, the query will run fast.

    APPROACH 3:

    WITH a as (
          SELECT created,
                 value, 
                 lag(created, 1) OVER (ORDER BY created desc) AS ub,
                 lag(created, -1) OVER (ORDER BY created desc) AS lb
            FROM time_series
           WHERE category = 'bar'
     ) SELECT created, 
              value
         FROM a
        WHERE ub>='2000-01-02 06:00:00Z'
          AND lb<'2000-01-04 12:00:00Z'
     ORDER BY created
    

    EXPLAIN ANALYZE from approach 3:

                                                                   QUERY PLAN                                                                 
    --------------------------------------------------------------------------------------------------------------------------------------------
     Sort  (cost=1.19..1.20 rows=1 width=12) (actual time=0.174..0.181 rows=5 loops=
    1)
       Sort Key: a.created
       Sort Method: quicksort  Memory: 25kB
       CTE a
         ->  WindowAgg  (cost=1.14..1.16 rows=1 width=28) (actual time=0.075..0.107 rows=7 loops=1)
               ->  Sort  (cost=1.14..1.14 rows=1 width=12) (actual time=0.056..0.067 rows=7 loops=1)
                     Sort Key: time_series.created DESC
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on time_series  (cost=0.00..1.12 rows=1 width=12) (actual time=0.018..0.030 rows=7 loops=1)
                           Filter: (category = 'bar'::text)
                           Rows Removed by Filter: 3
       ->  CTE Scan on a  (cost=0.00..0.03 rows=1 width=12) (actual time=0.088..0.131 rows=5 loops=1)
             Filter: ((ub >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (lb < '2000-01-04 12:00:00+00'::timestamp with time zone))
             Rows Removed by Filter: 2
     Planning time: 0.175 ms
     Execution time: 0.247 ms
    (16 rows)