Exclude a field during group stage of mongo aggregate pipeline but include it back at the end

I have a mongo aggregate query like below:


db.someCollection.aggregate([
  {
    $match: { taskId: "qy7u17-xunwqu" }
  },
  // Group by "tracklet_id" and calculate count for each group
  { 
    $group: {
      _id: '$tracklet_id',
      count: { $sum: 1 },
      representativeImage: { $first: '$img' }, // when I remove this, the query is done in a split second
      timestamp: { $max: '$timestamp' },
    },
  },
  {
    $project: {
      _id: 0,
      trackletId: '$_id',
      image: '$representativeImage', // but at the end, I want one representative image for a tracklet, doesn't matter which one.
      timestamp: 1,
      count: 1,
    },
  },
  { 
    $sort: {
      timestamp: -1
    }
  },
  {
    $limit: 20
  },
  
], {allowDiskUse: true})

the image field contains long b64 strings that take a lot of memory, causing the group and sort stage of the query to move to disk. Is there a way that I can add a pipeline step before project to re-include an image field for each tracklet?

One alternative I can think of is to just do a separate query afterwards to get the images and combine the results, but I'm hoping there is a more elegant way to do it in the same aggregate query.

Solution

As verified by the OP.
Instead of having the image in the $group stage get the image from a self-lookup where you limit only to 1 doc (since we only need any single image and to save memory) that matches the same trackletid

db.someCollection.aggregate([
  { $match: { taskId: "qy7u17-xunwqu" } },
  {
    $group: {
      _id: "$tracklet_id",
      count: { $sum: 1 },
      timestamp: { $max: "$timestamp" }
    }
  },
  { $project: { _id: 0, trackletId: "$_id", timestamp: 1, count: 1 } },
  { $sort: { timestamp: -1 } },
  { $limit: 20 },
  {
    $lookup: {
      from: "someCollection",
      let: { trackletId: "$trackletId" },
      pipeline: [
        { $match: { $expr: { $eq: [ "$tracklet_id", "$$trackletId" ] } } },
        { $limit: 1 },
        { $project: { _id: 0, img: 1 } }
      ],
      as: "image"
    }
  },
  { $addFields: { image: { $arrayElemAt: [ "$image.img", 0 ] } } }
])

demo