I am using Python and MongoDB. I have a collection that contains 40,000 documents. I have a set of coordinates, and I need to find which document each coordinate belongs to. Currently I am doing:
cell_start = citymap.find({"cell_latlng":{"$geoIntersects":{"$geometry":{"type":"Point", "coordinates":orig_coord}}}})
This is a standard GeoJSON query and it works well. Now, I know that some documents have a field like this:
{'trips_dest':......}
The value of this field is not important, so I have omitted it. The point is that, instead of searching all 40,000 documents, I could search only the documents that have the 'trips_dest' field.
Only about 40% of the documents have the 'trips_dest' field, so I think this would make the query more efficient. However, I don't know how to modify my code to do that. Any ideas?
You need the $exists query operator. In Python, the operator name must be a quoted string and the boolean is True, so something like this:
cell_start = citymap.find({"trips_dest": {"$exists": True},
                           "cell_latlng":{"$geoIntersects":{"$geometry":{"type":"Point", "coordinates":orig_coord}}}})
To quote the documentation:
Syntax: { field: { $exists: <boolean> } }
When <boolean> is true, $exists matches the documents that contain the field, including documents where the field value is null.
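As a minimal sketch, the combined filter can be built in a small helper so it is easy to inspect before passing it to find(). The helper name build_cell_query is hypothetical; citymap, cell_latlng, trips_dest and orig_coord are the names from the question.

```python
def build_cell_query(coord):
    """Build a filter dict for citymap.find().

    coord is a GeoJSON position, i.e. [longitude, latitude].
    """
    return {
        # Only consider documents that contain the trips_dest field
        # (this also matches documents where the value is null).
        "trips_dest": {"$exists": True},
        # Same geospatial condition as in the question.
        "cell_latlng": {
            "$geoIntersects": {
                "$geometry": {"type": "Point", "coordinates": list(coord)}
            }
        },
    }


# Usage, assuming citymap is the pymongo collection from the question:
# cell_start = citymap.find(build_cell_query(orig_coord))
```

Both conditions sit in the same filter document, so MongoDB combines them with an implicit AND.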
If you need to reject null values, use (again in Python syntax):
"trips_dest": {"$exists": True, "$ne": None}
As a final note, a sparse index on 'trips_dest' might speed up such a query.
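A sparse index only holds entries for documents that contain the indexed field. A rough sketch of creating one with pymongo follows; the helper name ensure_trips_dest_index is hypothetical, and whether the query planner actually uses this index together with the geospatial condition depends on your data and your other indexes, so it is worth checking with explain().

```python
def ensure_trips_dest_index(collection):
    """Create a sparse index on trips_dest and return the index name.

    collection is expected to be a pymongo Collection, e.g. citymap
    from the question.
    """
    # sparse=True skips documents that do not have the trips_dest field.
    return collection.create_index("trips_dest", sparse=True)


# Usage, assuming citymap is the pymongo collection from the question:
# ensure_trips_dest_index(citymap)
# plan = citymap.find({"trips_dest": {"$exists": True}}).explain()
```

Creating the index is a one-time step; calling create_index again with the same specification leaves the existing index in place.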