Search code examples
javascriptnode.jsmongodbaggregation-frameworkmongoskin

What's wrong with this mongoskin aggregation command?


I have a mongodb/mongoskin aggregation request as follows:

db.collection(customerTable + '_earnedinfluencers').aggregate([
    {
        $group: {
            _id: '$user',
            name: '', // to be filled separately
            username: '', // to be filled separately 
            picture: '', // to be filled separately
            city: '', // to be filled separately
            kids: { $sum: '$kids' },
            revenue: { $sum: '$dayIncome' },
            kidsRevRatio: { $divide: [ { $sum: '$kids' }, { $sum: '$dayIncome' } ] }
        }, 

        $match: {
            richness: { $gte: variable1 },
            kids: { $lt: variable2 },
            hobbies: { $in: ['hobby1', 'hobby2', 'hobby3', 'hobby4'] },
            event: { $in: schoolfestival },
            event: { $ne: 0 }
        },

        $project: {
            _id: 0,
            user: '$_id',
            name: 1,
            username: 1,
            picture: 1,
            city: 1,
            kids: 1,
            revenue: 1,
            kidsRevRatio: 1
        }
    }
], function(err, result) {
    // do something with err and result
});

The above code gives the following error:

Error: {"name":"MongoError","errmsg":"exception: A pipeline stage specification object must contain exactly one field.","code":16435,"ok":0}

I'm new to mongo and db in general, and can't tell what I did wrong.


Solution

  • Your pipeline arguments are unbalanced, each stage is a separate document so you need to wrap each one. But there are also a number of other problems

    db.collection(customerTable + '_earnedinfluencers').aggregate([
    
        { $match: {
                richness: { $gte: variable1 },
                kids: { $lt: variable2 },
                hobbies: { $in: ['hobby1', 'hobby2', 'hobby3', 'hobby4'] },
                event: { $in: schoolfestival },
        }},
    
        { $group: {
                _id: '$user',
                name: { '$first': '$name' },
                username: { '$first': '$username' },
                picture: { '$first': '$picture' },
                city: { '$first': '$city' }
                kids: { '$sum': '$kids' },
                revenue: { '$sum': '$dayIncome' },
                kidsSum: { '$sum': '$kids' },
        }}, 
    
    
        { $project: {
                _id: 0,
                user: '$_id',
                name: 1,
                username: 1,
                picture: 1,
                city: 1,
                revenue: 1,
                kidsSum: 1,
                kidsRevRatio: { $divide: [ '$kidsSum', '$revenue' ] }
        }}
    ], function(err, result) {
        // do something with err and result
    });
    

    You had the whole pipeline as one document where it in fact requires an array of documents as shown.

    But really you want to $match first in order to filter your results. If you really want to do some additional matching after a group then you add in an additional match to the pipeline afterwards.

    The $group operation requires that all fields outside of the _id grouping key need to have a "grouping operator" you cannot just get fields back by their own unless they are part of the _id you group on. Typically you want an operator such as $first or otherwise omit them completely.

    Top level grouping operators only, so operations like $divide are not a grouping operator. In order to to do this sort of thing when you are working with one or more $sum results you move this to a later $project using the fields with the calculated values.

    Also operations like project and group, "removes" fields from the pipeline and only preserves those you specifically include. So you cannot specify a field that is not there. This is one reason why the $match comes first as well as that is where you can use an index, and only ever at the start of the pipeline can this be done.

    But for each stage the only fields present will be those mentioned. As a further optimization the "optimizer" will from the start not include any fields present in your document that are not specifically mentioned. So only those referenced in the first match and group stages combined will be included for the rest of the pipline stages, and then possibly filtered down again.