I have a table with a few million rows, where rows are quite often inserted and even more often fetched.
Time of row insertion is not critical, but the time of fetching is because it is serving a website. Therefore, I created already an index, that helped to fetch much faster.
The queries are pretty simple and contain no JOIN
s.
The problem happens with SELECT
queries. The same SELECT
query will run every few seconds to check for new or updated rows once a user has performed a search. However, it is not strange that the SELECT
query runs for 50 seconds the first time and afterwards those same queries take less than 1 second.
That makes me think that the problem is not the SELECT
statement itself, but something else.
The table is:
CREATE TABLE all_legs (
carrier TEXT,
dep_hub TEXT,
arr_hub TEXT,
dep_dt TIMESTAMP WITH TIME ZONE,
arr_dt TIMESTAMP WITH TIME ZONE,
price_ct INTEGER,
... 5 more cols ...,
PRIMARY KEY (carrier, dep_hub, arr_hub, dep_dt, arr_dt, ...3 other cols...)
)
INDEX is:
CREATE INDEX IF NOT EXISTS fetch_index ON all_legs(dep_dt, LEFT(dep_hub::text, 6), LEFT(arr_hub::text, 6));
SELECT query:
SELECT * FROM all_legs
WHERE dep_dt >= %s
AND dep_dt < %s
AND (LEFT(dep_hub::text, 6) = %s AND LEFT(arr_hub::text, 6) = %s)
Such a case does not always happens and is therefore difficult to replicate. Here there is an EXPLAIN
statement from my local database, which has less data than the one on Heroku and run actually quite fast:
Index Scan using tz_idx on all_legs (cost=0.41..111184.33 rows=1 width=695) (actual time=128.100..136.690 rows=20 loops=1)
Index Cond: (("left"(dep_hub, 6) = 'ES-PMI'::text) AND ("left"(arr_hub, 6) = 'ES-MAD'::text))
Filter: ((dep_dt)::date = '2018-01-19'::date)
Rows Removed by Filter: 271
Planning time: 3.798 ms
Execution time: 138.525 ms
Why is the first time much slower and how can I reduce the running time of the first query?
dep_hub
) -- [empty] table to contain the "squeezed out" domain
CREATE TABLE dep_hub
( id SERIAL NOT NULL PRIMARY KEY
, dep_hub varchar
, UNIQUE (dep_hub)
);
-- This is done in the chained insert/update
-- INSERT INTO dep_hub(dep_hub)
-- SELECT DISTINCT dep_hub
-- FROM all_legs ;
-- an index may speedup the final update
-- (the index will be dropped automatically
-- once the column is dropped)
CREATE INDEX ON all_legs (dep_hub);
-- The original table needs a "link" to the new table
ALTER TABLE all_legs
ADD column dep_hub_id INTEGER -- NOT NULL
REFERENCES dep_hub(id)
;
-- FK constraints are helped a lot by a supportive index.
CREATE INDEX all_legs_dep_hub_fk ON all_legs (dep_hub_id);
-- Chained query to:
-- * populate the domain table
-- * initialize the FK column in the original table
WITH src AS (
INSERT INTO dep_hub(dep_hub)
SELECT DISTINCT a.dep_hub
FROM all_legs a
RETURNING *
)
UPDATE all_legs dst
SET dep_hub_id = src.id
FROM src
WHERE src.dep_hub = dst.dep_hub
;
-- Now that we have the FK pointing to the new table,
-- we can drop the redundant column.
ALTER TABLE all_legs DROP COLUMN dep_hub;