We are developing a small tool that lets a user draw a polygon on a Google Map and returns the ZIP codes that overlap the drawn overlay(s). I figured this would be a great time to dive into a NoSQL database, since I've heard great things about its performance with large data sets.
So I took all of the ZCTA (ZIP Code Tabulation Area) shape data from the 2010 US Census data set and loaded it into MongoDB. Sample document:
{ "ZipCode" : 12345,
"Polygon" : { type : "Polygon",
coordinates: [[[lng1, lat1],[lng2, lat2], ... , [lngN, latN]]]}}
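For reference, here is roughly what loading one of these documents looks like from the shell; the coordinates below are made up, and real ZCTA rings have far more vertices. GeoJSON wants [longitude, latitude] order, and the ring has to be closed (first and last point identical):

// Illustrative insert only -- real ZCTA rings have many more vertices.
db.ZipCodes.insert({
    ZipCode : 12345,
    Polygon : {
        type : "Polygon",
        // One outer ring, [lng, lat] order, first point repeated at the end.
        coordinates : [[
            [-73.95, 42.80],
            [-73.90, 42.80],
            [-73.90, 42.85],
            [-73.95, 42.85],
            [-73.95, 42.80]
        ]]
    }
})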
I also have an index on the Polygon field, created as follows:
db.ZipCodes.ensureIndex({"Polygon" : "2dsphere"})
Finally, I translate the points from the Google Maps polygon into a GeoJSON polygon and query the data using $geoIntersects as follows:
db.<collection>.find({
    "Polygon" : {
        $geoIntersects : {
            $geometry : {
                type : "Polygon",
                coordinates : [[[lng1, lat1], [lng2, lat2], ... , [lngN, latN]]]
            }
        }
    }
})
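The translation step itself is nothing fancy; it's roughly the following, where drawnPolygon is the google.maps.Polygon returned by the drawing tools (names are illustrative):

// Build the GeoJSON ring from the drawn google.maps.Polygon.
var coords = drawnPolygon.getPath().getArray().map(function (latLng) {
    return [latLng.lng(), latLng.lat()];   // GeoJSON expects [lng, lat]
});
coords.push(coords[0]);                    // GeoJSON rings must be closed
var drawnGeoJson = { type : "Polygon", coordinates : [coords] };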
This works great when the overlay polygon is small, but when the polygon spans, say, a couple of US states, the query takes forever (> 20 minutes!). How can I bring this down to a more reasonable response time? I have indexed the zip code polygons, and since that's all I'm querying against, I would have assumed this is about as optimized as it can get. Am I indexing incorrectly?
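In case it helps, a quick way to double-check the index setup is to list the indexes and run the query through explain() to see whether the 2dsphere index actually shows up in the plan (the output format varies by MongoDB version):

// Confirm the 2dsphere index exists...
db.ZipCodes.getIndexes()

// ...and check whether the query actually uses it.
db.ZipCodes.find({
    "Polygon" : {
        $geoIntersects : {
            $geometry : { type : "Polygon", coordinates : [[ /* drawn ring */ ]] }
        }
    }
}).explain()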
Thanks in advance!
UPDATE: After tearing my hair out trying to figure out how to get better performance out of MongoDB, I decided to try our existing standard database, SQL Server. It turns out my low expectations for SQL Server's geospatial functionality were unfounded: the query ran in under 12 seconds without an index, and it didn't degrade for larger drawn polygons the way MongoDB did. After adding a spatial index, most queries come back in about a second.
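For anyone curious, here is a rough sketch of what the SQL Server equivalent looks like. The table and column names are illustrative (not our actual schema), and I'm assuming the geography type with STIntersects here:

-- Drawn overlay as a geography polygon (WKT uses "lng lat" pairs, SRID 4326;
-- geography exterior rings go counter-clockwise so the interior is on the left).
DECLARE @drawn geography = geography::STGeomFromText(
    'POLYGON((-73.95 42.80, -73.90 42.80, -73.90 42.85, -73.95 42.85, -73.95 42.80))', 4326);

SELECT ZipCode
FROM ZipCodes
WHERE Shape.STIntersects(@drawn) = 1;

-- The spatial index that brought most queries down to around a second
-- (the table needs a clustered primary key before this will build).
CREATE SPATIAL INDEX IX_ZipCodes_Shape ON ZipCodes(Shape);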
I guess I'll be sticking with what I know. I really had high hopes for MongoDB, but its geospatial performance is severely lacking (or how to improve it is severely under-documented).