We are developing a small tool that lets a user draw a polygon on a Google Map and returns the ZIP codes that overlap the drawn overlay(s). I figured this would be a great time to dive into a NoSQL database, since I've heard great things about its performance with large data sets.
So I took all of the ZCTA (ZIP Code Tabulation Area) shape data from the 2010 US Census data set and loaded it into MongoDB. Sample document:
{ "ZipCode" : 12345,
"Polygon" : { type : "Polygon",
coordinates: [[[lng1, lat1],[lng2, lat2], ... , [lngN, latN]]]}}
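For reference, here is roughly what loading one of these documents looks like from the shell; the coordinates below are made up, and real ZCTA rings have far more vertices. GeoJSON wants [longitude, latitude] order, and the ring has to be closed (first and last point identical):

// Illustrative insert only -- real ZCTA rings have many more vertices.
db.ZipCodes.insert({
    ZipCode : 12345,
    Polygon : {
        type : "Polygon",
        // One outer ring, [lng, lat] order, first point repeated at the end.
        coordinates : [[
            [-73.95, 42.80],
            [-73.90, 42.80],
            [-73.90, 42.85],
            [-73.95, 42.85],
            [-73.95, 42.80]
        ]]
    }
})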
I also have an index on the Polygon field, created as follows:
db.ZipCodes.ensureIndex({"Polygon" : "2dsphere"})
Finally, I translate the points from the Google Maps polygon into a GeoJSON polygon and query the data using $geoIntersects as follows:
db.<collection>.find({
    "Polygon" : {
        $geoIntersects : {
            $geometry : {
                type : "Polygon",
                coordinates : [[[lng1, lat1], [lng2, lat2], ... , [lngN, latN]]]
            }
        }
    }
})
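The translation step itself is nothing fancy; it's roughly the following, where drawnPolygon is the google.maps.Polygon returned by the drawing tools (names are illustrative):

// Build the GeoJSON ring from the drawn google.maps.Polygon.
var coords = drawnPolygon.getPath().getArray().map(function (latLng) {
    return [latLng.lng(), latLng.lat()];   // GeoJSON expects [lng, lat]
});
coords.push(coords[0]);                    // GeoJSON rings must be closed
var drawnGeoJson = { type : "Polygon", coordinates : [coords] };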
This works great when the overlay polygon is small, but when the polygon spans, say, a couple of US states, the query takes forever (> 20 minutes!). How can I bring this down to a more reasonable response time? I have indexed the zip code polygons, and since that's all I'm querying against, I would have assumed this is about as optimized as it can get. Am I indexing incorrectly?
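In case it helps, a quick way to double-check the index setup is to list the indexes and run the query through explain() to see whether the 2dsphere index actually shows up in the plan (the output format varies by MongoDB version):

// Confirm the 2dsphere index exists...
db.ZipCodes.getIndexes()

// ...and check whether the query actually uses it.
db.ZipCodes.find({
    "Polygon" : {
        $geoIntersects : {
            $geometry : { type : "Polygon", coordinates : [[ /* drawn ring */ ]] }
        }
    }
}).explain()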
Thanks in advance!
UPDATE: After tearing my hair out trying to figure out how to get better performance out of MongoDB, I decided to try our existing standard database, SQL Server. It turns out my low expectations for SQL Server's geospatial functionality were unfounded: the query ran in under 12 seconds without an index, and it didn't degrade for larger drawn polygons the way MongoDB did. After adding a spatial index, most queries come back in about a second.
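For anyone curious, here is a rough sketch of what the SQL Server equivalent looks like. The table and column names are illustrative (not our actual schema), and I'm assuming the geography type with STIntersects here:

-- Drawn overlay as a geography polygon (WKT uses "lng lat" pairs, SRID 4326;
-- geography exterior rings go counter-clockwise so the interior is on the left).
DECLARE @drawn geography = geography::STGeomFromText(
    'POLYGON((-73.95 42.80, -73.90 42.80, -73.90 42.85, -73.95 42.85, -73.95 42.80))', 4326);

SELECT ZipCode
FROM ZipCodes
WHERE Shape.STIntersects(@drawn) = 1;

-- The spatial index that brought most queries down to around a second
-- (the table needs a clustered primary key before this will build).
CREATE SPATIAL INDEX IX_ZipCodes_Shape ON ZipCodes(Shape);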
I guess I'll be sticking with what I know. I really had high hopes for MongoDB, but its geospatial performance is severely lacking (or how to improve it is severely under-documented).