I'm working with a table where each row has a timestamp, and that timestamp is unique for a given set of other column values:
CREATE TEMPORARY TABLE time_series (
id SERIAL PRIMARY KEY,
created TIMESTAMP WITH TIME ZONE NOT NULL,
category TEXT,
value INT
);
CREATE UNIQUE INDEX ON time_series (created, category);
INSERT INTO time_series (created, category, value)
VALUES ('2000-01-01 00:00:00Z', 'foo', 1),
('2000-01-01 06:00:00Z', 'bar', 5),
('2000-01-01 12:00:00Z', 'bar', 5),
('2000-01-02 00:00:00Z', 'bar', 5),
('2000-01-02 12:34:45Z', 'bar', 2),
('2000-01-03 00:00:00Z', 'bar', 3),
('2000-01-04 00:00:00Z', 'bar', 3),
('2000-01-04 11:11:11Z', 'foo', 4),
('2000-01-04 22:22:22Z', 'bar', 5),
('2000-01-04 23:23:23Z', 'bar', 4),
('2000-01-05 00:00:00Z', 'foo', 1),
('2000-01-05 23:23:23Z', 'bar', 4);
The timestamps are not spaced uniformly. My task, given an arbitrary start and end datetime, is to get the entries between those datetimes and the entries immediately before and after that range. Basically, how do I simplify this query:
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created DESC
LIMIT 1)
UNION
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-02 06:00:00Z'
AND created < '2000-01-04 12:00:00Z')
UNION
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created
LIMIT 1)
ORDER BY created;
created value
2000-01-02 00:00:00+00 5
2000-01-02 12:34:45+00 2
2000-01-03 00:00:00+00 3
2000-01-04 00:00:00+00 3
2000-01-04 22:22:22+00 5
The use case is getting the data points to display a graph: I know the datetimes of the left and right edges of the graph, but they will not in general align exactly with created
datetimes, so in order to display a graph all the way to the edge I need a data point to either side of the range.
EDITED:
You can combine UNION ALL
with ORDER BY
and LIMIT
and some clause bounds.
Something like this:
APPROACH 1:
SELECT created,
value
FROM (SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created ASC LIMIT 1
) AS ub
UNION ALL
SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-02 06:00:00Z'
AND created < '2000-01-04 12:00:00Z'
UNION ALL
SELECT created,
value
FROM (SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created DESC LIMIT 1
) AS lb
ORDER BY 1;
EXPLAIN ANALYZE
from approach 1:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3.60..3.61 rows=3 width=12) (actual time=0.228..0.237 rows=5 loops=1)
Sort Key: time_series.created
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=3.55..3.58 rows=3 width=12) (actual time=0.182..0.195 rows=5 loops=1)
Group Key: time_series.created, time_series.value
-> Append (cost=1.16..3.53 rows=3 width=12) (actual time=0.073..0.163 rows=5 loops=1)
-> Limit (cost=1.16..1.16 rows=1 width=12) (actual time=0.070..0.073 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=12) (actual time=0.065..0.067 rows=1 loops=1)
Sort Key: time_series.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series (cost=0.00..1.15 rows=1 width=12) (actual time=0.026..0.035 rows=2 loops=1)
Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
-> Seq Scan on time_series time_series_1 (cost=0.00..1.18 rows=1 width=12) (actual time=0.007..0.016 rows=3 loops=1)
Filter: ((created >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (created < '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 7
-> Limit (cost=1.16..1.16 rows=1 width=12) (actual time=0.051..0.054 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=12) (actual time=0.047..0.049 rows=1 loops=1)
Sort Key: time_series_2.created
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_2 (cost=0.00..1.15 rows=1 width=12) (actual time=0.009..0.016 rows=2 loops=1)
Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
Planning time: 0.388 ms
Execution time: 0.438 ms
(25 rows)
Another similar approach can be used.
APPROACH 2:
SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= (SELECT created
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created ASC LIMIT 1)
AND created < (SELECT created
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created DESC LIMIT 1
)
EXPLAIN ANALYZE
from approach 2:
--------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on time_series (cost=2.33..3.50 rows=1 width=12) (actual time=0.143..0.157 rows=6 loops=1)
Filter: ((created >= $0) AND (created < $1) AND (category = 'bar'::text))
Rows Removed by Filter: 4
InitPlan 1 (returns $0)
-> Limit (cost=1.16..1.16 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=8) (actual time=0.061..0.062 rows=1 loops=1)
Sort Key: time_series_1.created
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_1 (cost=0.00..1.15 rows=1 width=8) (actual time=0.008..0.015 rows=2 loops=1)
Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
InitPlan 2 (returns $1)
-> Limit (cost=1.16..1.16 rows=1 width=8) (actual time=0.041..0.044 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=8) (actual time=0.038..0.039 rows=1 loops=1)
Sort Key: time_series_2.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_2 (cost=0.00..1.15 rows=1 width=8) (actual time=0.007..0.013 rows=2 loops=1)
Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
Planning time: 0.392 ms
Execution time: 0.288 ms
As you're using limit, the query will run fast.
APPROACH 3:
WITH a as (
SELECT created,
value,
lag(created, 1) OVER (ORDER BY created desc) AS ub,
lag(created, -1) OVER (ORDER BY created desc) AS lb
FROM time_series
WHERE category = 'bar'
) SELECT created,
value
FROM a
WHERE ub>='2000-01-02 06:00:00Z'
AND lb<'2000-01-04 12:00:00Z'
ORDER BY created
EXPLAIN ANALYZE
from approach 3:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1.19..1.20 rows=1 width=12) (actual time=0.174..0.181 rows=5 loops=
1)
Sort Key: a.created
Sort Method: quicksort Memory: 25kB
CTE a
-> WindowAgg (cost=1.14..1.16 rows=1 width=28) (actual time=0.075..0.107 rows=7 loops=1)
-> Sort (cost=1.14..1.14 rows=1 width=12) (actual time=0.056..0.067 rows=7 loops=1)
Sort Key: time_series.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series (cost=0.00..1.12 rows=1 width=12) (actual time=0.018..0.030 rows=7 loops=1)
Filter: (category = 'bar'::text)
Rows Removed by Filter: 3
-> CTE Scan on a (cost=0.00..0.03 rows=1 width=12) (actual time=0.088..0.131 rows=5 loops=1)
Filter: ((ub >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (lb < '2000-01-04 12:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 2
Planning time: 0.175 ms
Execution time: 0.247 ms
(16 rows)