Tags: node.js, mongodb, mongoose, mapreduce, aggregation-framework

MongoDB aggregation: $group stage by already created values / variables from outside


Imagine I have an array of objects, available before the aggregate query:

const groupBy = [
  {
    realm: 1,
    latest_timestamp: 1318874398, //Date.now() values, usually different from each other
    item_id: 1234, //always the same
  },
  {
    realm: 2,
    latest_timestamp: 1312467986, //actually it's the $max of the timestamp field from the collection
    item_id: 1234,
  },
  {
    realm: ..., //there are many of them
    latest_timestamp: ...,
    item_id: 1234,
  },
  {
    realm: 10,
    latest_timestamp: 1318874398, //but sometimes they can be the same
    item_id: 1234,
  },
]

And a collection (example data set available on MongoPlayground) with the following schema:

  {
    realm: Number,
    timestamp: Number,
    item_id: Number,
    field: Number, //stands in for any other fields, not relevant in this case
  }
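
For illustration, a single document under this schema might look like this (the values are made up and only meant to match the shape above):

  {
    realm: 1,
    timestamp: 1318874398,
    item_id: 1234,
    field: 42,
  }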

My problem is: how do I $group the values from the collection via the aggregation framework using the already available set of data (from groupBy)?

What has been tried already:

Okay, let's skip the naive ideas, like building an array of find queries, one per element:

for (const element of groupBy) {
  //array of `find` queries
}
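
Spelled out, that dismissed approach would look roughly like the sketch below (the Item model name is my assumption); it fires one find query per groupBy element, i.e. N round trips to the database instead of a single aggregation:

//hypothetical naive version, assumed to run inside an async function
const results = []
for (const element of groupBy) {
  const docs = await Item.find({
    item_id: element.item_id,
    realm: element.realm,
    timestamp: element.latest_timestamp,
  })
  results.push(...docs)
}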

My current working aggregation query is something like this:

      //first stage
      {
         $match: {
           item_id: 1234,
           realm: { $in: [1, 2, 3, 4, ..., 10] },
         },
      },
      {
        $group: {
          _id: {
            realm: '$realm',
          },
          latest_timestamp: {
            $max: '$timestamp',
          },
          data: {
            $push: '$$ROOT',
          },
        },
      },
      {
        $unwind: '$data',
      },
      {
        $addFields: {
          'data.latest_timestamp': {
            $cond: {
              if: {
                $eq: ['$data.timestamp', '$latest_timestamp'],
              },
              then: '$latest_timestamp',
              else: '$$REMOVE',
            },
          },
        },
      },
      {
        $replaceRoot: {
          newRoot: '$data',
        },
      },
      //At last, after these stages I can do the useful job
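
For completeness, here is a minimal sketch of how such a pipeline could be wired into mongoose (the Item model name is an assumption, and the stages after $match are abbreviated):

    const mongoose = require('mongoose')

    //hypothetical model matching the schema above
    const Item = mongoose.model('Item', new mongoose.Schema({
      realm: Number,
      timestamp: Number,
      item_id: Number,
    }))

    //assumed to run inside an async function
    const docs = await Item.aggregate([
      { $match: { item_id: 1234, realm: { $in: [1, 2, 10] } } },
      //...the $group / $unwind / $addFields / $replaceRoot stages shown above
    ])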

But I find this pipeline a bit convoluted, and I have heard that using mapReduce could solve my problem a bit faster than this query (though the official docs don't sound promising about it). Is that true?
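
For reference, the $group/$max part of that pipeline expressed as mapReduce would look roughly like the shell-style sketch below (the items collection name is my assumption); it only computes the latest timestamp per realm:

    db.items.mapReduce(
      function () {
        //map: emit one (realm, timestamp) pair per document
        emit(this.realm, this.timestamp)
      },
      function (key, values) {
        //reduce: keep the latest timestamp seen for this realm
        return Math.max.apply(null, values)
      },
      {
        query: { item_id: 1234, realm: { $in: [1, 2, 10] } },
        out: { inline: 1 },
      }
    )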

For now, I am using 4 or 5 stages before I can start working with the documents that are useful to me.

Recent update:

I have checked the $facet stage and found it interesting for this particular case. It will probably help me out.

For what it's worth:

After receiving the documents from the necessary stages, I build a representative cluster chart, which you may also know as a heatmap.

After that, I iterate over each document (or array of objects) one by one to find its correct x and y coordinates, which in the end should look like:

[
  {
    x: x, //number, the actual $price
    y: y, //number, the actual $realm
    value: price * quantity,
    quantity: sum_of_quantity_on_price_level
  }
]

For now, it's old, ugly code with nested for loops, but in the future I will be using the $facet => $bucket operators for that kind of job, as sketched below.
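
As an illustration of that direction, a $bucket stage over price and quantity fields (neither appears in the schema above, so both are assumptions about the real documents) could produce the per-price-level sums directly:

    {
      $bucket: {
        groupBy: '$price', //assumed field
        boundaries: [0, 10, 50, 100, 500], //price levels, picked arbitrarily here
        default: 'other', //bucket for prices outside the boundaries
        output: {
          quantity: { $sum: '$quantity' }, //assumed field
          value: { $sum: { $multiply: ['$price', '$quantity'] } },
        },
      },
    },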


Solution

  • So, I have found an answer to my question in a different, but related, way.

    I was thinking about using the $facet operator and, to be honest, it's still an option, but using it as below is bad practice (for one thing, $facet returns everything in a single document, which is subject to the 16 MB BSON document limit).

    //building the $facet query before aggregation

    const ObjectQuery = {}
    for (const realm of realms) {
      Object.assign(ObjectQuery, { [realm.name]: [ ... ] })
    }

    //mongoose query here (Model stands for whatever model is being queried)
    Model.aggregate([
      {
        $facet: ObjectQuery
      },
      ...
    ])
    

    So, I have chosen a $project stage with the $switch operator to filter the results, much like $group does (a rough sketch of this idea follows below).

    Also, using mapReduce could solve this problem, but for some reason the official Mongo docs recommend avoiding it and choosing the aggregation framework ($group and $merge operators) instead.
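
    A rough sketch of that $project + $switch idea, with the branches built from the groupBy array in application code (the exact branch shape, the Model placeholder and the follow-up $match are my assumptions about how the filtering could work):

    //one $switch branch per (realm, latest_timestamp) pair from groupBy
    const branches = groupBy.map((g) => ({
      case: {
        $and: [
          { $eq: ['$realm', g.realm] },
          { $eq: ['$timestamp', g.latest_timestamp] },
        ],
      },
      then: g.realm,
    }))

    Model.aggregate([
      { $match: { item_id: 1234, realm: { $in: groupBy.map((g) => g.realm) } } },
      {
        $project: {
          realm: 1,
          timestamp: 1,
          item_id: 1,
          group: {
            $switch: { branches, default: null },
          },
        },
      },
      //keep only the documents that matched one of the (realm, timestamp) pairs
      { $match: { group: { $ne: null } } },
    ])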