I have a mongodb that has data as
{
"_id": "a",
"reply": "<",
"criterion": "story"
},
{
"_id": "b",
"reply": "<",
"criterion": "story"
},
{
"_id": "c",
"reply": ">",
"criterion": "story"
}
And I want the result as:
{
"criterion": "story",
"result" : {
">" : 1,
"<" : 2
}
}
I want to aggregate on "criterion". So if I do that there will be 1 document. However, I want to count the number of "<" and ">" and write that in the new key as shown in the json above. That is the logic behind this. Could anyone who has a good idea in mongodb help me with this?
You'd need to use the aggregation framework where you would run an aggregation pipeline that has a $group
operator pipeline stage which aggregates the documents to create the desired counts using the accumulator operator $sum
.
For the desired result, you would need to use a tenary operator like $cond
to create the independent count fields since that will feed the number of documents to the $sum
expression depending on the name value. The $cond
operator can be used effectively to evaluate the counts based on the reply
field value. It takes a logical condition as its first argument (if) and then returns the second argument where the evaluation is true (then) or the third argument where false (else). This converts the true/false
boolean evaluated returns into 1 and 0 that will feed into $sum
respectively:
"$cond": [
{ "$eq": ["$reply", ">"] },
1, 0
]
So, if within the document being processed the "$reply"
field has a ">"
value, the $cond
operator feeds the value 1 to the $sum
else it sums a zero value.
Use the $project
as your final pipeline step as it allows you to reshape each document in the stream, include, exclude or rename fields, inject computed fields, create sub-document fields, using mathematical expressions, dates, strings and/or logical (comparison, boolean, control) expressions. It is similar to SELECT
in SQL.
The following pipeline should return the desired result:
Model.aggregate([
{
"$group": {
"_id": "$criterion",
">": {
"$sum": {
"$cond": [
{ "$eq": [ "$reply", ">" ] },
1, 0
]
}
},
"<": {
"$sum": {
"$cond": [
{ "$eq": [ "$reply", "<" ] },
1, 0
]
}
}
}
},
{
"$project": {
"_id": 0,
"criterion": "$_id",
"result.>": "$>",
"result.<": "$<"
}
}
]).exec(function(err, result) {
console.log(JSON.stringify(result, null, 4));
});
Sample Console Output
{
"criterion" : "story",
"result" : {
">" : 1,
"<" : 2
}
}
Note: This approach takes into consideration the values for the $reply
field are fixed and known hence it's not flexible where the values are dynamic and unknown.
For a more flexible alternative which executes much faster than the above, has better performance and also takes into consideration unknown values for the count fields, I would suggest running the pipeline as follows:
Model.aggregate([
{
"$group": {
"_id": {
"criterion": "$criterion",
"reply": "$reply"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.criterion",
"result": {
"$push": {
"reply": "$_id.reply",
"count": "$count"
}
}
}
}
]).exec(function(err, result) {
console.log(JSON.stringify(result, null, 4));
});
Sample Console Output
{
"_id" : "story",
"result" : [
{
"reply" : "<",
"count" : 2
},
{
"reply" : ">",
"count" : 1
}
]
}