Search code examples
mongodbquery-optimization

MongoDb $elemMatch with $lookup variable


I have a given MongoDb collection with documents which look similar to this:

{
items: [
  { id: 1, data: 'a' },
  { id: 2, data: 'b' },
  { id: 3, data: 'c' },
]
}

Now I have to $lookup from this collection. I need the array data from the element with id = 2 where id = 1 data matches the lookup value $$value. My first naive try to find the correct document looked like this:

$match {
  items: { $elemMatch: {
    id: 1,
    data: '$$value'
  } }
}

But the $$value is not evaluated, so the data is compared to the literal instead of its value. I also tried to evaluate the value using $expr, but I wasn't able to get the syntax right (if possible at all?).

The only working way I was able to get working was by extracting the data first and do the matching afterwards:

[
    {
        $project: {
            _id: 0,
            data1: {
                $filter: {
                    input: '$items', as: 'item',
                    cond: {$eq: ['$$item.id', 1]}
                }
            },
            data2: {
                $filter: {
                    input: '$items', as: 'item',
                    cond: {$eq: ['$$item.id', 2]}
                }
            },
        }
    },
    {
        $set: {
            data1: {$arrayElemAt: ['$data1.data', 0]},
            data2: {$arrayElemAt: ['$data2.data', 0]}
        }
    },
    {
        $match: {
            $expr: {$eq: ['$data1', '$$value']}
        }
    }
];

But as one might expect, this approach is much slower. For the data involved the query took over 7s, whereas the upper approach (using a constant instead of a variable $$value) over 3x as fast.

Is it possible to use the variable $$value directly in the $elemMatch operator? Or is there any other optimization available to speed-up the collection lookup?


Solution

  • So first let's understand why your naive approach failed, Mongo's pipelined lookup docs specify that:

    A $match stage requires the use of an $expr operator to access the variables. $expr allows the use of aggregation expressions inside of the $match syntax.

    So the variable value that you are defining in the start of the $lookup is only accessible with the usage of $expr, now also specified in the $expr docs:

    The arguments can be any valid aggregation expression

    Sadly for us $elemMatch is not a "aggregation expression" as it belongs to the "query language". this is why your first approach failed. You are just not allowed to use $elemMatch within an $expr which is required in order to access the $$value field.

    So what can we do?, well you can just use $filter as you already started doing:

    db.collection.aggregate([
      {
        $lookup: {
          from: "collection2",
          let: {
            value: "$data"
          },
          pipeline: [
            {
              $match: {
                $expr: {
                  $gt: [
                    {
                      $size: {
                        $filter: {
                          input: "$items",
                          as: "item",
                          cond: {
                            $and: [
                              {
                                $eq: [
                                  "$$item.id",
                                  1
                                ]
                              },
                              {
                                $eq: [
                                  "$$item.data",
                                  "$$value"
                                ]
                              }
                            ]
                          }
                        }
                      }
                    },
                    0
                  ],
                  
                }
              }
            }
          ],
          as: "res"
        }
      }
    ])
    

    It would be easier to answer fully if you provide the full structure of both collections and the required end result as I had to guess some of the stuff here in order to write this pipeline.

    Mongo Playground