Search code examples
sqlpostgresqlquery-optimization

query execution stops when overtake a threshold


I'm doing a optimization in a j2ee page. In a part of the page there some boxes to apply some filters to generate different kind of reports. After that we show a table with all the results but sometimes depending on the filters selected there are a huge amount of data to plot and we are thinking to export it directly to a Excel file.

We have a query that counts the number of outlines that we have. But that query could take some time depending on the filters selected. We have decided to put a threshold to decide when we plot results in a webpage or when we plot in an excel file. But we want to accelerate de 'count query' to know only if the final count is above or under the threshold, we don't need to know the exact number of rows only if its above or under but we need the query as fast as possible.

SELECT 
 COUNT(*) count
FROM 
 TABLE_X
INNER JOIN TABLE_Y ON X.a = Y.a
WHERE
X
AND Y
GROUP BY
 X, Y
**Having
COUNT(*) > THRESHOLD;**

That's the main idea we have, but i'm pretty lost how to continue and if it's possible to add a clause to stop the query when overtake the threshold.


Solution

  • It seems like you are trying to estimate the count rather than actually calculating it, right?

    There is an interesting article explaining how to do just that. It states that it is much faster than executing the queries themselves, so it might be just what you need: https://wiki.postgresql.org/wiki/Count_estimate

    Basically, the idea is that you either query the catalog table pg_class:

     SELECT reltuples FROM pg_class WHERE relname = 'tbl';
    

    Or, if you have a more complex query:

     SELECT count_estimate('SELECT * FROM tbl WHERE t < 100');
    

    Where count_estimate is function that analyzes the execution plan to get the estimation:

    CREATE FUNCTION count_estimate(query text) RETURNS INTEGER AS
    $func$
    DECLARE
        rec   record;
        ROWS  INTEGER;
    BEGIN
        FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        ROWS := SUBSTRING(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN ROWS IS NOT NULL;
    END LOOP;
    
    RETURN ROWS;
    END
    $func$ LANGUAGE plpgsql;