Search code examples
sqlpostgresqlpostgresql-performance

Reduce fetching time in SQL (indexing already done)


I have a table with a few million rows, where rows are quite often inserted and even more often fetched.

Time of row insertion is not critical, but the time of fetching is because it is serving a website. Therefore, I created already an index, that helped to fetch much faster.

The queries are pretty simple and contain no JOINs.

The problem happens with SELECT queries. The same SELECT query will run every few seconds to check for new or updated rows once a user has performed a search. However, it is not strange that the SELECT query runs for 50 seconds the first time and afterwards those same queries take less than 1 second.

That makes me think that the problem is not the SELECT statement itself, but something else.

The table is:

CREATE TABLE all_legs (
                carrier TEXT,
                dep_hub TEXT,
                arr_hub TEXT,
                dep_dt TIMESTAMP WITH TIME ZONE,
                arr_dt TIMESTAMP WITH TIME ZONE,
                price_ct INTEGER,
                ... 5 more cols ...,
                PRIMARY KEY (carrier, dep_hub, arr_hub, dep_dt, arr_dt, ...3 other cols...)
                )

INDEX is:

CREATE INDEX IF NOT EXISTS fetch_index ON all_legs(dep_dt, LEFT(dep_hub::text, 6), LEFT(arr_hub::text, 6));

SELECT query:

SELECT * FROM all_legs
                    WHERE dep_dt >= %s
                    AND dep_dt < %s
                    AND (LEFT(dep_hub::text, 6) = %s AND LEFT(arr_hub::text, 6) = %s)

Such a case does not always happens and is therefore difficult to replicate. Here there is an EXPLAIN statement from my local database, which has less data than the one on Heroku and run actually quite fast:

Index Scan using tz_idx on all_legs  (cost=0.41..111184.33 rows=1 width=695) (actual time=128.100..136.690 rows=20 loops=1)
  Index Cond: (("left"(dep_hub, 6) = 'ES-PMI'::text) AND ("left"(arr_hub, 6) = 'ES-MAD'::text))
  Filter: ((dep_dt)::date = '2018-01-19'::date)
  Rows Removed by Filter: 271
Planning time: 3.798 ms
Execution time: 138.525 ms

Why is the first time much slower and how can I reduce the running time of the first query?


Solution

    • Example code for squeezing out a single column (dep_hub)
    • If your {dep_hub,arr_hub} both refer to the same domain, you'll have to change things a bit
    • you also have to redefine the Primary Key,
    • and [maybe] add some functional indexes on the squeeze-out table

        -- [empty] table to contain the "squeezed out" domain
    CREATE TABLE dep_hub
        ( id SERIAL NOT NULL PRIMARY KEY
        ,  dep_hub varchar
        , UNIQUE (dep_hub)
        );
    
       -- This is done in the chained insert/update
    -- INSERT INTO dep_hub(dep_hub)
    -- SELECT DISTINCT dep_hub
    -- FROM all_legs ;
    
    
        -- an index may speedup the final update
        -- (the index will be dropped automatically
        -- once the column is dropped)
    CREATE INDEX ON all_legs (dep_hub);
    
        -- The original table needs a "link" to the new table
    ALTER TABLE all_legs
        ADD column dep_hub_id INTEGER -- NOT NULL
        REFERENCES dep_hub(id)
        ;
    
        -- FK constraints are helped a lot by a supportive index.
    CREATE INDEX all_legs_dep_hub_fk ON all_legs (dep_hub_id);
    
        -- Chained query to:
        -- * populate the domain table
        -- * initialize the FK column in the original table
    WITH src AS (
        INSERT INTO dep_hub(dep_hub)
        SELECT DISTINCT a.dep_hub
        FROM all_legs a
        RETURNING *
        )
    UPDATE all_legs dst
    SET  dep_hub_id = src.id
    FROM src
    WHERE src.dep_hub = dst.dep_hub
        ;
    
        -- Now that we have the FK pointing to the new table,
        -- we can drop the redundant column.
    ALTER TABLE all_legs DROP COLUMN dep_hub;