Search code examples
pythonpandasmongodbmongodb-querypymongo

Good approach with MongoDB, comparing two fields to determine if value exists in both?


example:

  • _id = 001

    • field 'location' = PARIS FRANCE

    • field 'country' = FRANCE

  • _id = 002

    • field 'location' = TORONTO

    • field 'country' = CANADA

desired result:

ability to recognize that for _id 001, "france" is also in the value for location field;

whereas, _id 002 does not have a value from country that also is in location

Instead of relying on pandas, would like to see if there are more efficient options using pymongo, for example?


Solution

  • This is sensitive to case, and possible abbreviations, etc., but here's one way to identify if one string is contained within the other.

    Given an example collection like this:

    [
      {
        "_id": "001",
        "location": "PARIS FRANCE",
        "country": "FRANCE"
      },
      {
        "_id": "002",
        "location": "TORONTO",
        "country": "CANADA"
      }
    ]
    

    This will set "isIn" if "country" is contained within "location" or vice-versa.

    db.collection.aggregate([
      {
        "$set": {
          "isIn": {
            "$gte": [
              {
                "$sum": [
                  { // returns pos or -1 if not found
                    "$indexOfCP": ["$location", "$country"]
                  },
                  {"$indexOfCP": ["$country", "$location"]}
                ]
              },
              -1
            ]
          }
        }
      }
    ])
    

    Example output:

    [
      {
        "_id": "001",
        "country": "FRANCE",
        "isIn": true,
        "location": "PARIS FRANCE"
      },
      {
        "_id": "002",
        "country": "CANADA",
        "isIn": false,
        "location": "TORONTO"
      }
    ]
    

    Try it on mongoplayground.net.