Search code examples
mongodbmongodb-queryaggregate

Exclude a field during group stage of mongo aggregate pipeline but include it back at the end


I have a mongo aggregate query like below:


db.someCollection.aggregate([
  {
    $match: { taskId: "qy7u17-xunwqu" }
  },
  // Group by "tracklet_id" and calculate count for each group
  { 
    $group: {
      _id: '$tracklet_id',
      count: { $sum: 1 },
      representativeImage: { $first: '$img' }, // when I remove this, the query is done in a split second
      timestamp: { $max: '$timestamp' },
    },
  },
  {
    $project: {
      _id: 0,
      trackletId: '$_id',
      image: '$representativeImage', // but at the end, I want one representative image for a tracklet, doesn't matter which one.
      timestamp: 1,
      count: 1,
    },
  },
  { 
    $sort: {
      timestamp: -1
    }
  },
  {
    $limit: 20
  },
  
], {allowDiskUse: true})

the image field contains long b64 strings that take a lot of memory, causing the group and sort stage of the query to move to disk. Is there a way that I can add a pipeline step before project to re-include an image field for each tracklet?

One alternative I can think of is to just do a separate query afterwards to get the images and combine the results, but I'm hoping there is a more elegant way to do it in the same aggregate query.


Solution

  • As verified by the OP.
    Instead of having the image in the $group stage get the image from a self-lookup where you limit only to 1 doc (since we only need any single image and to save memory) that matches the same trackletid

    db.someCollection.aggregate([
      { $match: { taskId: "qy7u17-xunwqu" } },
      {
        $group: {
          _id: "$tracklet_id",
          count: { $sum: 1 },
          timestamp: { $max: "$timestamp" }
        }
      },
      { $project: { _id: 0, trackletId: "$_id", timestamp: 1, count: 1 } },
      { $sort: { timestamp: -1 } },
      { $limit: 20 },
      {
        $lookup: {
          from: "someCollection",
          let: { trackletId: "$trackletId" },
          pipeline: [
            { $match: { $expr: { $eq: [ "$tracklet_id", "$$trackletId" ] } } },
            { $limit: 1 },
            { $project: { _id: 0, img: 1 } }
          ],
          as: "image"
        }
      },
      { $addFields: { image: { $arrayElemAt: [ "$image.img", 0 ] } } }
    ])
    

    demo