Here's my table schema:
CREATE TABLE tickers (
product_id TEXT NOT NULL,
trade_id INT NOT NULL,
sequence BIGINT NOT NULL,
time TIMESTAMPTZ NOT NULL,
price NUMERIC NOT NULL,
side TEXT NOT NULL,
last_size NUMERIC NOT NULL,
best_bid NUMERIC NOT NULL,
best_ask NUMERIC NOT NULL,
PRIMARY KEY (product_id, trade_id)
);
CREATE INDEX idx_tickers_product_id_time ON tickers (product_id, time);
My application subscribes to Coinbase Pro's websocket on the "ticker" channel and inserts a row into the tickers table whenever it receives a message.
The table has over two million rows now.
I learned how to use index skip scan emulation (see: SELECT DISTINCT is slower than expected on my table in PostgreSQL) in PostgreSQL in order to quickly retrieve distinct product_id values from this table, rather than using the slower SELECT DISTINCT
method.
I also want to retrieve min/max values for other columns. Here's what I came up with. It takes ~2.9 milliseconds over 2.25 rows.
Is there a better way to accomplish this?
WITH product_ids AS (
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT product_id
FROM tickers
ORDER BY 1
LIMIT 1
)
UNION ALL
SELECT l.*
FROM cte c
CROSS JOIN LATERAL (
SELECT product_id
FROM tickers t
WHERE t.product_id > c.product_id -- lateral reference
ORDER BY 1
LIMIT 1
) l
)
TABLE cte
)
SELECT
product_id,
(SELECT (MAX(trade_id) - MIN(trade_id) + 1) FROM tickers WHERE product_id = product_ids.product_id) AS ticker_count,
(SELECT MIN(time) FROM tickers WHERE product_id = product_ids.product_id) AS min_time,
(SELECT MAX(time) FROM tickers WHERE product_id = product_ids.product_id) AS max_time
FROM product_ids
ORDER BY ticker_count DESC
Using the existing index on (product_id, time)
we can get two for the price of one, i.e. fetch product_id
and minimum time
in one index scan:
WITH RECURSIVE product_ids AS (
( -- parentheses required
SELECT product_id, time AS min_time
FROM tickers
ORDER BY 1, 2
LIMIT 1
)
UNION ALL
SELECT l.*
FROM product_ids p
CROSS JOIN LATERAL (
SELECT t.product_id, t.time
FROM tickers t
WHERE t.product_id > p.product_id
ORDER BY 1, 2
LIMIT 1
) l
)
SELECT product_id, min_time
, (SELECT MAX(time) FROM tickers WHERE product_id = p.product_id) AS max_time
, (SELECT MAX(trade_id) - MIN(trade_id) + 1 FROM tickers WHERE product_id = p.product_id) AS ticker_count
FROM product_ids p
ORDER BY ticker_count DESC;
Also, no need for a second CTE wrapper.
Currently you have two indexes: The PK index on (product_id, trade_id)
, and another one on (product_id, time)
. You might optimize this by reversing the column order in one of both. Like:
PRIMARY KEY (trade_id, product_id)
Logically equivalent, but typically more efficient as it covers a wider range of possible queries. See (again):
We only need the existing index on (product_id, time)
, so no direct effect on this query.