Tags: mongodb, aggregate, lookup

Aggregate $lookup so slow


1st stage: $match

{
  delFlg: false
}

2nd stage: $lookup

{
  from: "parkings",
  localField: "parking_id",
  foreignField: "_id",
  as: "parking",
}
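
Assembled, the pipeline runs from mongosh like this (the invoices collection name is taken from the discussion below):

db.invoices.aggregate([
  { $match: { delFlg: false } },
  {
    $lookup: {
      from: "parkings",
      localField: "parking_id",
      foreignField: "_id",
      as: "parking",
    }
  }
])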

Two indexes are in place (_id and parking_id). Running the explain plan against our production MongoDB server (Linux) in Compass takes almost 9000 ms; against my local MongoDB server it takes about 1400 ms.

Is there a problem somewhere? Why is our production server running so slowly?

Full Explain Plan

Query Performance Summary
  Documents returned: 71184
  Actual query execution time (ms): 8763
  Query used the following indexes: 2 (_id_, delFlg_1)

explainVersion: "1"
stages: Array
serverInfo: Object
serverParameters: Object
command: Object
ok: 1
$clusterTime: Object
operationTime: Timestamp({ t: 1666074389, i: 3 })

The production server runs MongoDB 5.0.5; my local server runs MongoDB 6.0.2.


Solution

  • Unfortunately the explain output that was provided is mostly truncated (e.g. Array and Object hiding most of the important information). Even so, there are some meaningful observations that we can make.
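
    As an aside, the full untruncated plan can be captured by running explain from mongosh instead of reading it in Compass. A minimal sketch, assuming the source collection is named invoices (per the comments):

    db.invoices.explain("executionStats").aggregate([
      { $match: { delFlg: false } },
      { $lookup: { from: "parkings", localField: "parking_id",
                   foreignField: "_id", as: "parking" } }
    ])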

    As always, I suggest starting by more clearly defining your ultimate goals. What is considered "so slow", and how quickly would the operation need to return in order to be acceptable for your use case? As outlined below, I doubt there is much room for improvement with the current schema. You may need to rethink the approach entirely if you are looking for improvements that are several orders of magnitude in nature.

    Let's consider what this operation is doing (a sketch in code follows the list):

    • The database goes to an entry in the { delFlg: 1 } index that has a value of false.
    • It then goes to FETCH that full document from the collection.
    • Based on the value of the parking_id field, it performs an IXSCAN of the { _id: 1 } index on the parkings collection. If a matching index entry is discovered then the database will FETCH the full document from the other collection and add it to the new parking array field that is being generated.
    • This full process then repeats 71,183 more times.
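
    Expressed as mongosh-style JavaScript, the per-document work looks roughly like the loop below. This is purely an illustration of the access pattern, not how the server actually implements $lookup:

    db.invoices.find({ delFlg: false }).forEach(invoice => {
      // an IXSCAN of { delFlg: 1 } found an entry; FETCH pulled the full document
      // now probe the { _id: 1 } index on parkings and FETCH any match
      invoice.parking = db.parkings.find({ _id: invoice.parking_id }).toArray();
      // invoice, with its new parking array field, joins the result set
    });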

    Using 8763ms as the duration, that means that the database performed each full iteration of the work described above more than 8 times per millisecond (8.12 ~= 71184 (iters) / 8763 (ms)). I think that sounds pretty reasonable and is unlikely to be something that you can meaningfully improve on. Scaling up the hardware that the database is running on may provide some incremental improvements, but it is generally costly, not scalable, and likely not worthwhile if you are looking for more substantial improvements.

    You also mentioned two different timings and database versions in the question. If I understood correctly, it was mentioned that the (explain) operation takes ~9 seconds on version 5.0 and about 1.4 seconds on version 6.0. While there is far too little information here to say for sure, one reason for that may be associated with improvements for $lookup that were introduced in version 6.0. Indeed from their 7 Big Reasons to Upgrade to MongoDB 6.0 article, they claim the following:

    The performance of $lookup has also been upgraded. For instance, if there is an index on the foreign key and a small number of documents have been matched, $lookup can get results between 5 and 10 times faster than before.

    This seems to match your situation and observations pretty directly.

    Apart from upgrading, what else might improve performance here? Well, if many or most documents in the source collection have { delFlg: false } then you may wish to get rid of the { delFlg: 1 } index. Indexes provide benefits when the size of the results they retrieve is small relative to the overall size of the data in the collection. But as that percentage grows, the overhead of scanning most of the index plus randomly fetching all of the data quickly becomes less effective than just scanning the full collection directly. It is mentioned in the comments that the invoices collection contains 70k documents, so this predicate on delFlg hardly removes any results at all. A quick way to check is sketched below.
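
    A minimal sketch of that check in mongosh (collection names per the comments; only drop the index if nothing else relies on it):

    db.invoices.countDocuments({})                 // ~70k per the comments
    db.invoices.countDocuments({ delFlg: false })  // if close to the total, the index is not selective
    db.invoices.dropIndex("delFlg_1")              // remove the unhelpful index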

    One other thing that really stands out is this statement from the comments: "parkings contains 16 documents". Should the information from those 16 documents just be embedded into the invoices documents directly instead? If the two are commonly accessed together and the parkings information doesn't change very often, then combining the two would remove a lot of overhead and would further improve performance. A one-time backfill is sketched below.
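
    A sketch of that denormalization, assuming each invoice references exactly one parking and that embedding a copy is acceptable. A one-time pipeline like this could backfill an embedded field ($merge back into the collection being aggregated requires MongoDB 4.4+):

    db.invoices.aggregate([
      { $lookup: { from: "parkings", localField: "parking_id",
                   foreignField: "_id", as: "parking" } },
      // collapse the single-element array into an embedded subdocument
      { $set: { parking: { $first: "$parking" } } },
      { $merge: { into: "invoices", on: "_id", whenMatched: "merge" } }
    ])

    Afterwards, reads can use the embedded parking field and drop the $lookup stage entirely.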

    Two additional components about explain that are worth keeping in mind:

    • The command does not factor in network time to actually transmit the results to the client. Depending on the size of the documents and the physical topology of the environment, this network time could add another meaningful delay to the operation. Be sure that the client really needs all of the data that is being sent.
    • Depending on the verbosity, explain will measure the time it takes to run the operation through completion. Since there are no blocking stages in your aggregation pipeline, we would expect your initial batch of documents to be returned to the client in much less time. Apart from networking, that time may be around 12ms for 5.0 (approximately (101 docs / 71184 docs) * 8763 ms); a way to measure it is sketched below.
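
    To observe that, time-to-first-batch can be measured directly from mongosh. A rough sketch (the timing also includes one network round trip to the server):

    const t0 = Date.now();
    const cursor = db.invoices.aggregate([
      { $match: { delFlg: false } },
      { $lookup: { from: "parkings", localField: "parking_id",
                   foreignField: "_id", as: "parking" } }
    ]);
    cursor.next();  // pulls only the initial batch, not all 71184 documents
    print(`first document after ${Date.now() - t0} ms`);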