Sample documents of a collection
DB: MYDB
Collection: MYCOLL
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622"
}
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=h242;AF=9y87;C=8w622"
}
I would like to have expected output as following:
{
"_id": {
"$oid": "678a78375cb6955814197272"
},
"ID": 1019397,
"INFO": "A=5242;AF=987;C=82622",
"A":"5242",
"AF":"987",
"C":"82622"
}
I have tried to do it myself, but for now I created the following aggregate code for a specific value in CSV. which is not what I want to do. Can you please tell me how can I make it better in performance and create (updating the current collection document) in the form as I showed above?
db["MYCOLL"].aggregate([
{
$project: {
INFO_ARR: {
$split:["$INFO",";"]
}
}
},
{
$project: {
AF_ARR: {
$arrayElemAt: [ "$INFO_ARR" , 1 ] }
}
},
{
$project: {
AF_FREQ_ARR: {
$split: [ "$AF_ARR" , "=" ]}
}
},
{
$project: {
AF: {
$arrayElemAt: [ "$AF_FREQ_ARR" , 1 ] }
}
},
{ $merge : { into: { db: "MYDB", coll: "MYCOLL" }, on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
Note: I can not use javascript as the collection is of size 400GB and is having millions of records. Hence, Javascript may be too slow to update all of the records. Am I wrong about this ?
You can use the following aggregation pipeline to split the contents of the INFO
property and add the values as properties:
[
{
$project: {
INFO: 1
}
},
{
$set: {
info_arr: {
"$split": [
"$INFO",
";"
]
}
}
},
{
$set: {
info_arr2: {
"$map": {
"input": "$info_arr",
"in": {
k: {
$first: {
$split: [
"$$this",
"="
]
}
},
v: {
$last: {
$split: [
"$$this",
"="
]
}
}
}
}
}
}
},
{
$set: {
info_obj: {
"$arrayToObject": "$info_arr2"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$$ROOT",
"$info_obj"
]
}
}
},
{
$unset: [
"info_arr",
"info_arr2",
"info_obj"
]
}
]
k
field, the value to a v
field.$arrayToObject
operator is used to create an object from the array contents.$$ROOT
document.$unset
stage.At the end, the documents look like this:
{
"A": "5242",
"AF": "987",
"C": "82622",
"INFO": "A=5242;AF=987;C=82622",
"_id": ObjectId("5a934e000102030405000001")
}
If you add a $merge
stage, you can add the new properties to existing documents in the collection.
See this playground to test.