Search code examples
mongodbmongodb-queryaggregation-framework

Most efficient mongo aggregation to unwind and filter: match unwind match vs project and filter


I have a collection of documents that each contain arrays. the output i want is the unwinded array elements filtered down by a certain criteria.

i can do this by doing:

db.collection.aggregate([
 {
   $unwind: "$myArrayField"
 },
 {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 }
])

but it seems wasteful to unwind everything, so instead i do

db.collection.aggregate([
 {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 },
 {
   $unwind: "$myArrayField"
 },
 {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 }
])

to filter down to viable document candidates, unwind those, and then filter down the unwound elements again since there will be elements that don't match.

alternatively, i can do

db.collection.aggregate([
 {
  $project: {
    myArrayField: {
      $filter: {
        input: "$myArrayField",
        as: "element",
        cond: { $eq: ["$$element.myCriteriaField", "myValue"] }
      }
    }
  }
 },
 {
  $unwind: "$myArrayField"
 }
])

or even

db.collection.aggregate([
  {
   $match: {
     "myArrayField.myCriteriaField": "myValue"
   }
 },
 {
  $project: {
    myArrayField: {
      $filter: {
        input: "$myArrayField",
        as: "element",
        cond: { $eq: ["$$element.myCriteriaField", "myValue"] }
      }
    }
  }
 },
 {
  $unwind: "$myArrayField"
 }
])

which one is the most efficient? i don't understand enough about how the aggregations are run to know what variables it could depend on (collection size, document size, etc).

lastly, the match unwind match seems pretty straightforward but it also just feels wrong to have to do it like that so i'm wondering if i'm missing something.


Solution

  • First, You must sure have an index on the field. Using Explain to check index hit with $match first stage.

    Now, we explain your 2 suggestion: $project $filter $unwind vs $match $project $filter $unwind:

    • $project $filter $unwind: with first stage $project, mongodb is doing a collection scan, outputting ALL documents in that collection with that field. Now, It's a full collection scan and slow

    • $match $project $filter $unwind: with first stage $match, mongodb is doing an index scan instead of a collection if It hit an index. After that, stage $project will do on a limit document that matched. Now, it a index scan and faster because mongodb only do on some document.

    Final, with your 2 option, the second is better