Tags: python, postgresql, indexing, airflow

Insert large amount of data in table with index in Postgres


I have a table tbl in Postgres with 50 million rows. tbl has an index on column_1, and the table gets a lot of queries like:

select * from tbl
where column_1 = 'value'

Each query returns 0-30 rows, 10 on average.

Once a day I completely replace the data in the table. The update looks like this:

delete from tbl;
insert into tbl
select * from tbl_2;
commit;

The challenge I face is that this update runs too long: about 2-3 hours, probably because of the index. Is there a way to speed up the data update and still let users query tbl while it is being refreshed? If it matters, the update process runs in Python (Airflow) and the queries come from a Python web app.


Solution

  • Build a copy of the table:

    CREATE TABLE tbl_new (LIKE tbl);
    
    INSERT INTO tbl_new
    SELECT * FROM tbl_2;
    

    Then, create all required indexes on that table.
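    For example, a minimal sketch assuming the only index tbl needs is the one on column_1 (the index name is just a placeholder):

    -- Building the index after the bulk load is much faster than
    -- loading rows into an already-indexed table.
    CREATE INDEX tbl_new_column_1_idx ON tbl_new (column_1);

    Once you are done, switch tables: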

    BEGIN;
    
    DROP TABLE tbl;
    
    ALTER TABLE tbl_new RENAME TO tbl;
    
    COMMIT;
    

    That will be fast: the expensive work happens on a table nobody is querying yet, and the final swap is quick and only briefly locks the table.
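
Putting it all together, the daily refresh your Airflow task issues could look like the sketch below. It assumes the index on column_1 is the only one tbl needs, and the index name is just a placeholder; adjust both to match your schema.

-- Build and load the replacement table outside any long transaction,
-- so readers keep querying the current tbl undisturbed.
CREATE TABLE tbl_new (LIKE tbl);

INSERT INTO tbl_new
SELECT * FROM tbl_2;

-- Create the index only after the bulk load.
CREATE INDEX tbl_new_column_1_idx ON tbl_new (column_1);

-- Swap the tables in one short transaction; the swap itself is quick,
-- so incoming queries are blocked only momentarily.
BEGIN;
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;
COMMIT;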