Search code examples
mongodbmongodb-queryaggregation-frameworkmongodb-indexes

how to prevent retrieval of data from disk in a $lookup?


In the $lookup stage of my aggregation pipeline, it needs to join based on _id, which is indexed on the joined collection. The purpose is to simply check whether there are any matches in joined collection. The actual data of the joined document(s) does not matter, and hence does not need to be retrieved at all whether on disk or RAM.

So, how do I write the $lookup to ensure that data is never retrieved from disk. Instead, a value of true can be returned if matching records were found?

EDITED:

Retrieving data from disk is expensive, hence the reason to avoid it.


Solution

  • In the $lookup, use a pipeline that starts with a $project stage that includes only the _id field.

    {$lookup: {
        from: "targetCollection",
        localField: "fieldName",
        foreignField: "_id",
        as: "matched",
        pipeline: [
            { $project: {_id: 1}},
            { $count: "count" }
        ]
    }}
    

    The query executor should realize that all of the data needed to satisfy that part of the query can be obtained from the index, and not load any documents.

    Note: this assumes you are using MongoDB 5.0+