Search code examples
mongodbpymongoazure-cosmosdb-mongoapi

How to find and replace string inside arrays in MongoDB and pymongo


I have a collection in MongoDB.(Technically it is the cosmosDB API for MongoDB but in theory that shouldn’t matter.) The collection contains nested arrays of strings. These strings contain brackets inside them, and I need to remove them from the strings inside the arrays. Sample (fake) data:

{
    “Thing_id”: “100”,
    “Thing_name”: “Thing100”
    “Comments”: [
        “Good: [Thing100 is awesome]”,
        “Bad: [I will never buy Thing100 again.]”
    ]
},
{
    “Thing_id”: “101”,
    “Thing_name”: “Thing101”
    “Comments”: [
        “Comparative: [Thing101 is so much better than Thing100.]”,
        “Bad: [Who designed such piece of …]”
    ]
}
]

Any above syntax errors are because I am typing it on my phone…my apologies if it is not well-formed json/bson. To recap, I want to remove just the brackets in each value in the comments arrays. I am using pymongo to connect to the DB, and also python 3.7.4.

I have searched for how to do this, but all I can find are how to update the whole value in the array instead of replacing a character in the string inside the array.

FindAndModify() looked promising until I found out it will only do the first document found.

Is the only option to query the collection and loop through each document?


Solution

  • I don't have access to Azure Cosmos DB, so I can't test. If you are using pymongo and want to update all documents in the collection, you probably want to use update_many.

    Here are a couple parameters for update_many that may work for you.

    filter = {'$expr': {'$gt': [{'$size': '$Comments'}, 0]}}
    update = [
      {
        "$set": {
          # rewrite Comments
          "Comments": {
            "$map": {
              "input": "$Comments",
              "as": "comment",
              "in": {
                "$reduce": {
                  "input": {
                    # get all matches without "[" or "]"
                    # every comment needs to be a string
                    "$regexFindAll": {
                      "input": "$$comment",
                      # backslashes "\" are used to "escape" brackets
                      # the number of "\" required may depend on platform, etc.
                      "regex": "[^\\[\\]]*"
                    }
                  },
                  "initialValue": "",
                  "in": {
                    # concat all matches
                    "$concat": ["$$value", "$$this.match"]
                  }
                }
              }
            }
          }
        }
      }
    ]
    

    You can try this (slightly modified for the language/platform) on mongoplayground.net.