Search code examples
arraysmongodbupdatesbulkupdate

MongoDB: Update a subset of an array based on position range


I need to update values in an array in many MongoDB documents using a bulk write operation (the new array differs for each document). My question is: is there a way to update a subset of array values based on the element position range, passing an array to replace those values with?

Here is my document structure:

{
year: 2020,
location_id: 1,
values: [1.2 0 0 5.2 1.02 8 0 0 0 0 1.2 4]
}

Let's say I want to replace all values except the first one with this new array:

[1 2 5.1 2 4 0 87 1 0.2 2 9]

i.e., 'values' should become:

[1.2 1 2 5.1 2 4 0 87 1 0.2 2 9]

I know how to replace the array fully:

UpdateOne({'year': 2020, 'location_id': 1}, {'$set': {'values': [1.2 1 2 5.1 2 4 0 87 1 0.2 2 9]}})

But this first option requires me to query the collection first to get the first value of the array. I'd like to avoid that.

I also know how to replace values one by one with 11 separate updateone commands (here in python):

col.bulk_write([UpdateOne({'year': 2020, 'location_id': 1}, {'$set': {'values.1': 1}}),
                UpdateOne({'year': 2020, 'location_id': 1}, {'$set': {'values.2': 2}}),
                .... ,
                UpdateOne({'year': 2020, 'location_id': 1}, {'$set': {'values.11': 9}})],
               ordered = False)

This second option means I'm going to send millions of UpdateOne statements daily because I have several million documents that get updated every day and the array to be updated is much larger than in this simple example. I don't like that very much either.

Note that a related question is: Could this second option overload my server or is it not taking more resources than the first option anyway? The first option would mean a bulkwrite with, say 300'000 updateOne statements, each modifying a 365 long array, while the 2nd option would mean 300'000*364 updateOne statements, each modifying a single array element.

I would like to be able to do something like:

UpdateOne({'year': 2020, 'location_id': 1}, {'$set': {'values.1-end': [1 2 5.01 2 4 0 87 1 0.2 2 9]}})

i.e., specify a range of array positions to be replaced by a provided array. So far, I couldn't find how to do that. This would lead to a bulkwrite with 300'000 updateOne statements, each replace exactly those 364 values that need to be replaced.

NOTE: In this example it's replacing "all except the first value", but it may also be "all except the last value". Possibly also "replace elements at positions 4 to 9 by this array of length 6". I'm writing my code in python.


Solution

  • You can use an aggregation pipeline and use this for update:

    db.collection.updateOne(
       {'year': 2020, 'location_id': 1},
       [{
          $set: {
             values: {
                $concatArrays: [
                   [{ $arrayElemAt: ["$values", 0] }],
                   [1, 2, 5.1, 2, 4, 0, 87, 1, 0.2, 2, 9]
                ]
             }
          }
       }]
    )
    

    For operations like "replace elements at positions 4 to 9 by this array of length 6" you may also use $slice

    Also operator $range may help, see How to modify value in array by position at aggregation framework