Using pymongo, I'm searching for MongoDB documents in which the string value in one field is not equivalent to the string value in another field. I've tried $cmp, $strcasecmp, $ne, and $eq, but they all keep returning that strings that look equivalent to me are not equivalent.
The pipeline starts as a join between collections with documents like the following examples:
# test collection:
[
{'_id': '611868136', 'doc_type': 'way'},
{'_id': '5792648632', 'doc_type': 'node'},
{'_id': '611868133', 'doc_type': 'node'},
{'_id': '1', 'doc_type': 'node'}
]
# refer_docs collection:
[
{'_id': '8483444',
'refs': [{'ref': '611868136', 'ref_type': 'way'},
{'ref': '5792648632', 'ref_type': 'node'},
{'ref': '611868133', 'ref_type': 'way'}],
'doc_type': 'relation'}
]
Now, here's the aggregation:
import pymongo
mongo_client = MongoClient('localhost:27017')
osm_db = mongo_client.osm
refer_docs_col = osm_db["refer_docs"]
pipeline = [
{ "$unwind" : "$refs" },
{
"$lookup" : {
"from" : "test",
"localField" : "refs.ref",
"foreignField" : "_id",
"as" : "ref_doc"
}
},
{ "$match" : { "ref_doc" : { "$ne" : [] } } },
{ "$unwind" : "$ref_doc"},
{ "$match" : { "refs.ref_type" : { "$ne" : "$ref_doc.doc_type" } } },
{ "$project" : { "_id" : 1, "refs" : 1, "ref_doc.doc_type" : 1,
"cmp" : { "$cmp" : [ "refs.ref_type",
"ref_doc.doc_type" ] } } },
{ "$limit" : 1 }
]
[doc for doc in refer_docs_col.aggregate(pipeline)]
I would expect the following document to be returned:
[{'_id': '467676638',
'refs': {'ref': '611868133', 'ref_type': 'way'},
'ref_doc': {'doc_type': 'node'},
'cmp': 1}]
But, it also returns documents that look like this:
[{'_id': '8483444',
'refs': {'ref': '611868136', 'ref_type': 'way'},
'ref_doc': {'doc_type': 'way'},
'cmp': 1},
{'_id': '8483444',
'refs': {'ref': '5792648632', 'ref_type': 'node'},
'ref_doc': {'doc_type': 'node'},
'cmp': 1},
{'_id': '8483444',
'refs': {'ref': '611868133', 'ref_type': 'way'},
'ref_doc': {'doc_type': 'node'},
'cmp': 1}]
Notice the compared fields ("refs.ref_type", "ref_doc.doc_type") clearly contain equivalent values in two cases(e.g. "node"), but the comparison operator returns 1 in all cases, indicating the first value is greater than the second. It should return 0, indicating the two values are equivalent.
Ack! Why??
I forgot a couple of dollar signs in the $cmp statement, so I was comparing the literal strings of the field names rather than the field values. So obvious, ugh.
pipeline = [
{ "$unwind" : "$refs" },
{
"$lookup" : {
"from" : "test",
"localField" : "refs.ref",
"foreignField" : "_id",
"as" : "ref_doc"
}
},
{ "$match" : { "ref_doc" : { "$ne" : [] } } },
{ "$unwind" : "$ref_doc"},
# { "$match" : { "refs.ref_type" : { "$ne" : "$ref_doc.doc_type" } } },
{ "$project" : { "_id" : 1, "refs" : 1, "ref_doc.doc_type" : 1,
"cmp" : { "$cmp" : [ "$refs.ref_type",
"$ref_doc.doc_type" ] } } },
{ "$match" : { "cmp" : { "$ne" : 0 } } }
]
[doc for doc in refer_docs_col.aggregate(pipeline)]