I have a mongodb/mongoskin aggregation request as follows:
db.collection(customerTable + '_earnedinfluencers').aggregate([
{
$group: {
_id: '$user',
name: '', // to be filled separately
username: '', // to be filled separately
picture: '', // to be filled separately
city: '', // to be filled separately
kids: { $sum: '$kids' },
revenue: { $sum: '$dayIncome' },
kidsRevRatio: { $divide: [ { $sum: '$kids' }, { $sum: '$dayIncome' } ] }
},
$match: {
richness: { $gte: variable1 },
kids: { $lt: variable2 },
hobbies: { $in: ['hobby1', 'hobby2', 'hobby3', 'hobby4'] },
event: { $in: schoolfestival },
event: { $ne: 0 }
},
$project: {
_id: 0,
user: '$_id',
name: 1,
username: 1,
picture: 1,
city: 1,
kids: 1,
revenue: 1,
kidsRevRatio: 1
}
}
], function(err, result) {
// do something with err and result
});
The above code gives the following error:
Error: {"name":"MongoError","errmsg":"exception: A pipeline stage specification object must contain exactly one field.","code":16435,"ok":0}
I'm new to mongo and db in general, and can't tell what I did wrong.
Your pipeline arguments are unbalanced, each stage is a separate document so you need to wrap each one. But there are also a number of other problems
db.collection(customerTable + '_earnedinfluencers').aggregate([
{ $match: {
richness: { $gte: variable1 },
kids: { $lt: variable2 },
hobbies: { $in: ['hobby1', 'hobby2', 'hobby3', 'hobby4'] },
event: { $in: schoolfestival },
}},
{ $group: {
_id: '$user',
name: { '$first': '$name' },
username: { '$first': '$username' },
picture: { '$first': '$picture' },
city: { '$first': '$city' }
kids: { '$sum': '$kids' },
revenue: { '$sum': '$dayIncome' },
kidsSum: { '$sum': '$kids' },
}},
{ $project: {
_id: 0,
user: '$_id',
name: 1,
username: 1,
picture: 1,
city: 1,
revenue: 1,
kidsSum: 1,
kidsRevRatio: { $divide: [ '$kidsSum', '$revenue' ] }
}}
], function(err, result) {
// do something with err and result
});
You had the whole pipeline as one document where it in fact requires an array of documents as shown.
But really you want to $match
first in order to filter your results. If you really want to do some additional matching after a group then you add in an additional match to the pipeline afterwards.
The $group
operation requires that all fields outside of the _id
grouping key need to have a "grouping operator" you cannot just get fields back by their own unless they are part of the _id
you group on. Typically you want an operator such as $first
or otherwise omit them completely.
Top level grouping operators only, so operations like $divide
are not a grouping operator. In order to to do this sort of thing when you are working with one or more $sum
results you move this to a later $project
using the fields with the calculated values.
Also operations like project and group, "removes" fields from the pipeline and only preserves those you specifically include. So you cannot specify a field that is not there. This is one reason why the $match
comes first as well as that is where you can use an index, and only ever at the start of the pipeline can this be done.
But for each stage the only fields present will be those mentioned. As a further optimization the "optimizer" will from the start not include any fields present in your document that are not specifically mentioned. So only those referenced in the first match and group stages combined will be included for the rest of the pipline stages, and then possibly filtered down again.