sql postgresql performance optimization postgis

Simple SQL query that filters by geographic distance is very slow

Here's my query:

SELECT 1 
FROM post po
WHERE ST_DWithin(po.geog, (SELECT geog FROM person WHERE person.person_id = $1), 20000 * 1609.34, false)
ORDER BY post_id DESC
LIMIT 5;

And here's the EXPLAIN ANALYZE:

I have an index on everything, so I'm not sure why this is slow. The first 5 posts when sorting by post_id DESC satisfy the clause, so shouldn't this return instantly?

I notice that if I replace the ST_DWithin call with an ST_Distance call instead, it runs instantly, like so:

SELECT 1 
FROM post po
WHERE ST_Distance(po.geog, (SELECT geog FROM person WHERE person.person_id = $1)) < 20000 * 1609.34
ORDER BY post_id DESC
LIMIT 5;

That one runs in .15 milliseconds. So, simple solution is to just replace the ST_DWithin call with the ST_Distance call, no?

Well, unfortunately not, because it's not always the first 5 rows that match. Sometimes it has to scan deep within the table, so at that point ST_DWithin is better because it can use the geographic index, while ST_Distance cannot.

I think this may be a problem of postgres' query planner messing up? Like, for some reason it thinks it needs to do a scan of the whole table, despite the ORDER BY x LIMIT 5 clause being front and center? Not sure..

Solution

The distance you are using is almost the length of the equator, so you can expect (almost) all of your results to satisfy this clause.

As ST_DWithin makes use of a spatial index, the planner (wrongly) thinks it will be faster to use it to first filter out the rows. It then has to order (almost) all rows and at last will keep the first 5 ones.

When using st_distance, no spatial index can be used and the planner will pick a different plan, likely one relying on an index on post_id, which is blazing fast. But when the number of rows to be returned (the limit) increases, a different plan is used and the planner probably believe it would be again faster to compute the distance on all rows.