Tags: mongodb, mongodb-query, geospatial, geojson

Strange problem when storing and querying GeoJson in MongoDB


I want to store GeoJSON data for areas using MongoDB. The data comes from an official website. Each area is represented as a MultiPolygon. In the end, I want to find all areas that contain a given lng/lat pair using a $geoIntersects query like this:

db.areas.find({
  "location.geometry": {
    "$geoIntersects": {
         "$geometry": {
            "type": "Point",
            "coordinates": [ 
                <lng>, 
                <lat>
            ]
         }
     }
  }
})

In principle, this seems to work just fine. However, I've encountered problems with some areas, seemingly related to the set of polygons in a MultiPolygon. I could boil my problem down to an individual case:

An area (a GeoJSON MultiPolygon) has six polygons, say [A, B, C, D, E, F], and the point <lng>,<lat> I query for lies within polygon A. Now the query above only works if the area does not contain polygons D and F (A always has to be included, of course) -- that is, only then do I get the expected search result. Otherwise, the query returns nothing (but no error). In short (a scripted version of this test is sketched after the list):

  • What works: [A], [A,B], [A,B,C], [A,B,C,E], [A,C], ... (any combination with A and without D & F)
  • What doesn't work: [A,D], [A,B,F], ... (any combination that contains D or F)
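
Such a subset test can be scripted roughly as follows (the collection name areas_debug and the helper testSubset are just illustrative, and <lng>/<lat> remain the placeholders from above):

var lng = 0, lat = 0;  // replace with the <lng>/<lat> used above
var area = db.areas.findOne({ /* filter for the problematic area */ });
var polys = area.location.geometry.coordinates;  // the member polygons [A, B, C, D, E, F]

// rebuild the area from a subset of its member polygons and run the same point query
function testSubset(indexes) {
    db.areas_debug.drop();
    db.areas_debug.insertOne({
        location: {
            geometry: {
                type: "MultiPolygon",
                coordinates: indexes.map(function (i) { return polys[i]; })
            }
        }
    });
    return db.areas_debug.countDocuments({
        "location.geometry": {
            "$geoIntersects": {
                "$geometry": { "type": "Point", "coordinates": [ lng, lat ] }
            }
        }
    });
}

testSubset([0]);     // [A]    -> 1 (match)
testSubset([0, 3]);  // [A, D] -> 0 (no match, as described above)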

What is the problem with polygons D and F? Are they not allowed to overlap with other polygons in the MultiPolygon? Are they maybe too small? I've checked them against the GeoJSON definition but couldn't see any issues. Could it be a limitation of the GeoJSON support in MongoDB?


Solution

  • You are correct that without any special considerations, you can insert a Polygon or MultiPolygon with a deformed GeoJSON structure into MongoDB. This is because unless you specifically create a geo index on the field, MongoDB doesn't know it is GeoJSON at all. The geo engine will silently not match a target intersect geometry, much as it would if you pointed it at a simple scalar field like {"name":"buzz"}. If you add an index thusly:

    db.geo.createIndex({loc:"2dsphere"}) 
    

    Then this will activate the geo-aware machinery, and if you try to insert or update a deformed GeoJSON shape, it will produce an error (scroll to see the Loop is not closed part):

    {
        "nMatched" : 0,
        "nUpserted" : 0,
        "nModified" : 0,
        "writeError" : {
            "code" : 16755,
            "errmsg" : "Can't extract geo keys: { _id: 0.0, loc: { type: \"MultiPolygon\", coordinates: [ [ [ [ -83.0, 40.0 ], [ -83.0, 41.0 ], [ -82.0, 41.0 ], [ -82.0, 40.0 ], [ -83.0, 40.0 ] ] ], [ [ [ -93.0, 40.0 ], [ -93.0, 41.0 ], [ -92.0, 41.0 ], [ -92.0, 40.0 ], [ -93.0, 40.0 ] ] ], [ [ [ -73.0, 49.0 ], [ -72.0, 41.0 ], [ -72.0, 40.0 ], [ -73.0, 40.0 ], [ -73.0, 41.0 ] ] ] ] } }  Loop is not closed: [ [ -73.0, 49.0 ], [ -72.0, 41.0 ], [ -72.0, 40.0 ], [ -73.0, 40.0 ], [ -73.0, 41.0 ] ]"
        }
    }
    
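    For reference, the error above can be reproduced with a write along these lines. The exact command is not part of the output, so this is only a sketch reconstructed from the shape in the error message; note that the third loop does not end where it starts:

    db.geo.update(
        { _id: 0 },
        { loc: { type: "MultiPolygon", coordinates: [
            [ [ [ -83, 40 ], [ -83, 41 ], [ -82, 41 ], [ -82, 40 ], [ -83, 40 ] ] ],
            [ [ [ -93, 40 ], [ -93, 41 ], [ -92, 41 ], [ -92, 40 ], [ -93, 40 ] ] ],
            // first and last positions differ, so this loop is not closed
            [ [ [ -73, 49 ], [ -72, 41 ], [ -72, 40 ], [ -73, 40 ], [ -73, 41 ] ] ]
        ] } },
        { upsert: true }
    )
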

    In other words, the geo index becomes the guard at the door and ensures that all shapes written are GeoJSON compliant. This is also why it is very useful to create the index before any inserts and updates: trying to create a geo index on a collection that already holds hundreds or thousands of potentially deformed shapes leads to much tedious work isolating and fixing the bad shapes one at a time.
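
    If you do end up in that situation, one way to isolate the bad shapes is to copy the documents one at a time into a collection that already has the 2dsphere index and report the writes that fail -- a minimal sketch, assuming the document structure from the question (areas_checked is just a scratch collection name):

    db.areas_checked.createIndex({ "location.geometry": "2dsphere" })

    db.areas.find().forEach(function (doc) {
        try {
            db.areas_checked.insertOne(doc)   // rejected if the shape is not valid GeoJSON
        } catch (e) {
            // e.g. code 16755 "Can't extract geo keys ... Loop is not closed"
            print("Deformed shape in _id " + doc._id + ": " + e.message)
        }
    })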