example:
_id = 001
field 'location' = PARIS FRANCE
field 'country' = FRANCE
_id = 002
field 'location' = TORONTO
field 'country' = CANADA
desired result:
ability to recognize that for _id 001, "france" is also in the value for location field;
whereas, _id 002 does not have a value from country that also is in location
Instead of relying on pandas, would like to see if there are more efficient options using pymongo, for example?
This is sensitive to case, and possible abbreviations, etc., but here's one way to identify if one string is contained within the other.
Given an example collection like this:
[
{
"_id": "001",
"location": "PARIS FRANCE",
"country": "FRANCE"
},
{
"_id": "002",
"location": "TORONTO",
"country": "CANADA"
}
]
This will set "isIn"
if "country"
is contained within "location"
or vice-versa.
db.collection.aggregate([
{
"$set": {
"isIn": {
"$gte": [
{
"$sum": [
{ // returns pos or -1 if not found
"$indexOfCP": ["$location", "$country"]
},
{"$indexOfCP": ["$country", "$location"]}
]
},
-1
]
}
}
}
])
Example output:
[
{
"_id": "001",
"country": "FRANCE",
"isIn": true,
"location": "PARIS FRANCE"
},
{
"_id": "002",
"country": "CANADA",
"isIn": false,
"location": "TORONTO"
}
]
Try it on mongoplayground.net.