Search code examples
mysqlgisspatial

Why does ST_Contains in MySQL return inaccurate results?


I'm trying to speed up the selection of locations from a MySQL table 'location' by adding a POINT column (SRID 4326) with a SPATIAL index. But although it's faster, it seems to produce inaccurate results.

The location table has an id (primary key), name, latitude, longitude and newly added latlon column. The latter calculated with ST_GeomFromText(concat("POINT(",latitude," ",longitude,")"), 4326)

CREATE TABLE `location` (
   `id` varchar(16) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL,
   `name` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,
   `latitude` decimal(10,5) DEFAULT NULL,
   `longitude` decimal(10,5) DEFAULT NULL,
   `latlon` point SRID 4326 NOT NULL
) ;

I used to get locations in a certain area with this query. For example latitudes between 15 and 69 and longitudes between -78 and 84

SELECT * FROM `location` WHERE (latitude BETWEEN 15 AND 69) AND (longitude BETWEEN -78 AND 84)

In the new situation I would check for a point-in-polygon intersection like this:

SELECT * FROM location 
WHERE (ST_CONTAINS(
ST_GeomFromText('POLYGON((69 -78, 69 84, 15 84, 15 -78, 69 -78))', 4326), latlon))

(polygon from northwest -> northeast -> southeast -> southwest -> northwest)

I would expect to get the same results, but they are different. The point-in-polygon method always returns less results. In this example the first would return 18365 results and the latter only 6961.

To check the differences I ran this query:

SELECT id,name,latitude,longitude,
(ST_CONTAINS(
ST_GeomFromText('POLYGON((69 -78, 69 84, 15 84, 15 -78, 69 -78))', 4326), latlon)) AS contains 
FROM `location` 
WHERE (latitude BETWEEN 15 AND 69) AND (longitude BETWEEN -78 AND 84) 

Some of it's results as an example (5 contained and 5 not contained):

Id Name Latitude Longitude Contains
53VA Horse Feathers Airport 38.634 -77.7575 1
84VA Cia Headquarters Heliport 38.9529 -77.1525 1
GQPZ Tazadit Airport 22.75735 -12.48223 0
LENF Monforte De Lemos 42.54694 -7.51917 0
LIAY La Bagnara 41.68722 13.49917 0
LTCK Muş Airport 38.7478 41.6612 0
LW68 Srpci Airfield 41.15173 21.40687 0
PA71 Dwight's Delight Airport 41.5209 -76.4622 1
RI16 Keskinen Balloonport 41.71991 -71.6316 1
VA79 Hickory Tree Farms Airport 38.9529 -77.7469 1

The error seems only marginal with a smaller rectangle; only a few percent inaccurate instead of almost a third.

I can't figure out what I'm doing wrong here. Also tried to query using ST_INTERSECT, or defining a more detailed polygon with 8 points describing the rectangle instead of 4. None of this helped.

What am I overlooking here?


Solution

  • Since your polygon only has 4 corners the lines joining them will be straight rather than great circles so some of your points will fail to be included in the polygon.

    enter image description here

    This picture shows the issue - you are using the purple box while the world uses the yellow one.

    You probably want to densify the edges of your polygon to make it follow the curvature of the earth - I'm not a MySql user but in PostGIS I'd use STDensify.