I'm trying to speed up the selection of locations from a MySQL table 'location' by adding a POINT column (SRID 4326) with a SPATIAL index. But although it's faster, it seems to produce inaccurate results.
The location table has an id (primary key), name, latitude, longitude and newly added latlon column. The latter calculated with ST_GeomFromText(concat("POINT(",latitude," ",longitude,")"), 4326)
CREATE TABLE `location` (
`id` varchar(16) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL,
`name` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,
`latitude` decimal(10,5) DEFAULT NULL,
`longitude` decimal(10,5) DEFAULT NULL,
`latlon` point SRID 4326 NOT NULL
) ;
I used to get locations in a certain area with this query. For example latitudes between 15 and 69 and longitudes between -78 and 84
SELECT * FROM `location` WHERE (latitude BETWEEN 15 AND 69) AND (longitude BETWEEN -78 AND 84)
In the new situation I would check for a point-in-polygon intersection like this:
SELECT * FROM location
WHERE (ST_CONTAINS(
ST_GeomFromText('POLYGON((69 -78, 69 84, 15 84, 15 -78, 69 -78))', 4326), latlon))
(polygon from northwest -> northeast -> southeast -> southwest -> northwest)
I would expect to get the same results, but they are different. The point-in-polygon method always returns less results. In this example the first would return 18365 results and the latter only 6961.
To check the differences I ran this query:
SELECT id,name,latitude,longitude,
(ST_CONTAINS(
ST_GeomFromText('POLYGON((69 -78, 69 84, 15 84, 15 -78, 69 -78))', 4326), latlon)) AS contains
FROM `location`
WHERE (latitude BETWEEN 15 AND 69) AND (longitude BETWEEN -78 AND 84)
Some of it's results as an example (5 contained and 5 not contained):
Id | Name | Latitude | Longitude | Contains |
---|---|---|---|---|
53VA | Horse Feathers Airport | 38.634 | -77.7575 | 1 |
84VA | Cia Headquarters Heliport | 38.9529 | -77.1525 | 1 |
GQPZ | Tazadit Airport | 22.75735 | -12.48223 | 0 |
LENF | Monforte De Lemos | 42.54694 | -7.51917 | 0 |
LIAY | La Bagnara | 41.68722 | 13.49917 | 0 |
LTCK | Muş Airport | 38.7478 | 41.6612 | 0 |
LW68 | Srpci Airfield | 41.15173 | 21.40687 | 0 |
PA71 | Dwight's Delight Airport | 41.5209 | -76.4622 | 1 |
RI16 | Keskinen Balloonport | 41.71991 | -71.6316 | 1 |
VA79 | Hickory Tree Farms Airport | 38.9529 | -77.7469 | 1 |
The error seems only marginal with a smaller rectangle; only a few percent inaccurate instead of almost a third.
I can't figure out what I'm doing wrong here. Also tried to query using ST_INTERSECT
, or defining a more detailed polygon with 8 points describing the rectangle instead of 4. None of this helped.
What am I overlooking here?
Since your polygon only has 4 corners the lines joining them will be straight rather than great circles so some of your points will fail to be included in the polygon.
This picture shows the issue - you are using the purple box while the world uses the yellow one.
You probably want to densify the edges of your polygon to make it follow the curvature of the earth - I'm not a MySql user but in PostGIS I'd use STDensify
.