Search code examples
node.jsmongodbmongooseindexingaggregation

MongoDB: $in in aggregation is slow despite the index


I have a collection of users (~8k documents) and a collection of jobs (~150k documents). I need to create an aggregation with a stage that finds jobs each user was accepted to (accepted in job model is an array of ObjectIds):

db.getCollection('users')
    .aggregate([
        {
            $lookup: {
                from: 'jobs',
                as: 'jobs',
                let: { accepted: '$_id' },
                pipeline: [
                    {
                        $match: {
                            $expr: {
                                $and: [
                                    { $gte: ['$timestamp', 1643223600] },
                                    { $in: ['$$accepted', '$accepted'] }
                                ]
                            }
                        },
                    },
                ]
            }
        },
    ])

This aggregation is extremely slow. Even limiting the number of output records to 10 takes ~1 minute to accomplish. Although, the number of jobs with timestamp greater than 1643223600 is not so big - it's around 3000 documents. I added the following indexes:

{timestamp: -1},
{accepted: 1}

but it does not help. However, if I create a copy of the database and remove all the shifts which don't satisfy the timestamp condition, the aggregation works much faster (several seconds). Does it mean that the index doesn't work for some reason? If a run a simple query on jobs collection with explain with a condition on accepted field, it shows that IXSCAN was performed

UPD: since $in doesn't work with compound indexes, workaround for Mongodb4.4 is:

 {
            $lookup: {
                from: 'jobs',
                as: 'jobs',
                localField: '_id',
                foreignField: 'accepted'
            }
        },
        { $match: { $expr: { $ne: [{ $size: '$jobs' }, 0] } } },
        { $unwind: '$jobs' },
        {
            $match: { 'jobs.timestamp': { $gte: 1643223600 } }
        },

Solution

  • Query
    (requires MongoDB 5)

    users.aggregate(
    [{"$lookup":
      {"from":"jobs",
       "localField":"_id",
       "foreignField":"accepted",
       "pipeline":[{"$match":{"timestamp":{"$gte":1643223600}}}],
       "as":"jobs"}}])
    

    Try this after creating a multikey index on accepted, and if needed on timestamp also.

    Aggregate $in can't use an index even in Mongodb 5, so its not used.
    See here about $expr and index use.