Search code examples
arraysmongodbdictionarytransformationbulkupdate

How to transform part of each document into an array in each document using bulk read/write?


I have documents that look like this

{
    "field1" : "value",
    "field3" : "value",
    "attributes" : {
        "key1" : {
            "category" : "4",
            "value" : "value"
        },
        "key2" : {
            "category" : "5",
            "value" : "value"
        },
    }
}

I want to transform these to look like the structure below (basically it's taking a dictionary and flattening it into an array). Note the number of keys in attributes varies.

{
    "field1" : "value",
    "field3" : "value",
    "attributes" : [
        {
            "key" : "key1",
            "category" : "4",
            "value" : "value"
        },
        {
            "key" : "key2",
            "category" : "5",
            "value" : "value"
        },
    ]
}

Because the number of documents is significant, I would like to use the bulk APIs. I have this basic structure to my script.

db = db.getSiblingDB('MyDB');

db.MyCollection.intializeOrderedBukOp(),
var bulk = db.MyCollection.intializeOrderedBukOp(),
    count = 0;

db.MyCollection.find({}).forEach(function(doc) {
   // Re-map attrs (what follows is some pseudocode
   var attrs = function TransformSomehow(doc.attributes) {
           // return an array...
       };
   });

   // Queue update operation
   bulk.find({ "_id": doc._id })
       .updateOne({ 
           "$set": { "attrs": attrs } }
       });
   count++;

   // Execute every 1000
   if ( count % 1000 == 0 ) {
       bulk.execute();
       bulk = db.collection.intializeOrderedBukOp();
   }
});

// Drain any queued remaining
if ( count % 1000 != 0 )
    bulk.execute();

This was borrowed from this answer, but I can't use it as is because it's going the opposite direction (array to object). It appears that I can't use map to do this.


Solution

  • This is one way to do bulk write using the MongoDB NodeJS driver v3.6 and MongoDB v4.2.3. This uses a simple batch to update all the collection documents. The update transforms the object (or dictionary) into an array (as per your requirement) for each document.

    const collection = db.collection('test');
    
    const update1 = { 
        updateMany: { 
            filter: { }, 
            update: [ 
                { $set: { 
                     attributes: { 
                         $map: { 
                             input: { $objectToArray: "$attributes" }, 
                             in: { $mergeObjects: [ { "key": "$$this.k" }, "$$this.v" ] } 
                         } 
                     } 
                } } 
            ] 
        } 
    };
    
    collection.bulkWrite( [ update1 ],
                          { ordered: false },
                          ( err, result ) => {
                              console.log( 'Updated and modified count: ', result.matchedCount, result.modifiedCount );
                           }
    )
    


    Reference: CRUD Operations - Bulk Write