Search code examples
sqldatabasepostgresqlpostgresql-performance

How to optimize query postgres


I am running the following query:

SELECT fat.*   
FROM Table1 fat  
LEFT JOIN modo_captura mc       ON mc.id = fat.modo_captura_id  
INNER JOIN loja lj              ON lj.id = fat.loja_id  
INNER JOIN rede rd              ON rd.id = fat.rede_id  
INNER JOIN bandeira bd          ON bd.id = fat.bandeira_id  
INNER JOIN produto pd           ON pd.id = fat.produto_id  
INNER JOIN loja_extensao le     ON le.id = fat.loja_extensao_id  
INNER JOIN conta ct             ON ct.id = fat.conta_id
INNER JOIN banco bc             ON bc.id = ct.banco_id  
LEFT JOIN conciliacao_vendas cv ON fat.empresa_id = cv.empresa_id AND cv.chavefato = fat.chavefato AND fat.rede_id = cv.rede_id  
WHERE 1 = 1  
AND cv.controle_upload_arquivo_id = 6906  
AND fat.parcela = 1  
ORDER BY fat.data_venda, fat.data_credito limit 20

But very slowly. Here the Explain plan: http://explain.depesz.com/s/DnXH


Solution

  • Try this rewritten version:

    SELECT fat.*   
    FROM   Table1 fat
    JOIN   conciliacao_vendas cv USING (empresa_id, chavefato, rede_id)
    JOIN   loja lj               ON lj.id = fat.loja_id  
    JOIN   rede rd               ON rd.id = fat.rede_id  
    JOIN   bandeira bd           ON bd.id = fat.bandeira_id  
    JOIN   produto pd            ON pd.id = fat.produto_id  
    JOIN   loja_extensao le      ON le.id = fat.loja_extensao_id  
    JOIN   conta ct              ON ct.id = fat.conta_id
    JOIN   banco bc              ON bc.id = ct.banco_id
    LEFT   JOIN modo_captura mc  ON mc.id = fat.modo_captura_id  
    WHERE  cv.controle_upload_arquivo_id = 6906  
    AND    fat.parcela = 1  
    ORDER  BY fat.data_venda, fat.data_credito
    LIMIT  20;
    

    JOIN syntax and sequence of joins

    In particular I fixed the misleading LEFT JOIN to conciliacao_vendas, which is forced to act as a plain [INNER] JOIN by the later WHERE condition anyways. This should simplify query planning and allow to eliminate rows earlier in the process, which should make everything a lot cheaper. Related answer with detailed explanation:

    USING is just a syntactical shorthand.

    Since there are many tables involved in the query and the order the rewritten query joins tables is optimal now, you can fine-tune this with SET LOCAL join_collapse_limit = 1 to save planning overhead and avoid inferior query plans. Run in a single transaction:

    BEGIN;
    SET LOCAL join_collapse_limit = 1;
    SELECT ...;  -- read data here
    COMMIT;      -- or ROOLBACK;
    

    More about that:

    Index

    Add some indexes on lookup tables with lots or rows (not necessary for just a couple of dozens), in particular (taken from your query plan):

    Seq Scan on public.conta ct ... rows=6771
    Seq Scan on public.loja lj ... rows=1568
    Seq Scan on public.loja_extensao le ... rows=16394

    That's particularly odd, because those columns look like primary key columns and should already have an index ...

    So:

    CREATE INDEX conta_pkey_idx ON public.conta (id);
    CREATE INDEX loja_pkey_idx ON public.loja (id);
    CREATE INDEX loja_extensao_pkey_idx ON public.loja_extensao (id);
    

    To make this really fat, a multicolumn index would be of great service:

    CREATE INDEX foo ON Table1 (parcela, data_venda, data_credito);