I would like to understand when the offset
and limit
statements execute in a Postgresql query. Given a query with a format such as
select
a.*,
(-- some subquery here) as sub_query_result
from some_table a
where -- some condition
offset :offset
limit :limit
My understanding is the table will first be filtered using the where
statement, and then the remaining rows will be projected into the form as defined by select
statement.
Do the offset
and limit
statements execute after all operations have occurred in the select
statement? Or does it apply the where
, offset
, and limit
statements first and then the select
part of the query?
I am hoping that it applies the where
, offset
, and limit
statements first that if I had a result set say of 10,000 rows, and I only want the 2nd page of 1000, it would only execute the subquery 1000 times, for example.
A query with LIMIT but without ORDER BY makes a little sense. From the documentation:
When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order. Otherwise you will get an unpredictable subset of the query's rows.
When the ORDER BY clause is present, expressions in the select list (including subqueries or functions) must be evaluated for as many rows as needed to determine the proper order. In best scenarios the number of computed rows may be limited to the sum LIMIT + OFFSET, if the sum is less than the number of filtered rows. This means that (in some simplification) the greater OFFSET the longer the query is run:
The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.
In some cases there may be optimizations when the planner will recognize an expression as immutable, but generally you should expect that a subquery will be executed at least LIMIT + OFFSET times. In Postgres 9.5 or earlier the number of computed rows may be even larger if the ordering is not based on an index.