Tags: php, mongodb, doctrine, doctrine-odm, wiredtiger

MongoDB functional test setup and teardown 10x slower in 4.2 with WiredTiger


I am in the process of upgrading our MongoDB instances from 3.4 (using the MMAPv1 storage engine) to 4.2 (using WiredTiger). One thing I have encountered that is pretty much a blocker at this point is a serious slowdown of our tests.

Long story short (more details below) - MongoDB 4.2 with WiredTiger takes much longer to process the repeated database setup/teardown in our tests. The slowdown is in the ballpark of a factor of 10: the tests used to run in about 10 minutes, and with 4.2 they take almost 90 minutes. The slowdown reproduces even with just a fraction of the tests and seems to come from the setup/teardown stage of the testing.


Environment

A few words about our environment -- we are using PHP with Doctrine ODM to talk to MongoDB. We have about 3000 tests, some pure unit tests and some (many) functional tests that actually use the database. The tests run in a Dockerized environment - we spin up a fresh MongoDB Docker container for each pipeline - but I have confirmed that the same slowdown occurs even in a production-like bare-metal setting. The experiments below were done on bare metal, to rule out problems coming from elsewhere.

Each functional test first drops the database, then loads fixtures into it (+ creates indices) and then the actual test scenario is executed.
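
In code, the per-test setup boils down to roughly the following sketch (simplified; $dm stands for a configured Doctrine ODM DocumentManager, and the $fixtureLoader call is a stand-in for our own fixture-loading code):

    <?php
    // Simplified sketch of the per-test database reset; not the actual framework code.
    /** @var Doctrine\ODM\MongoDB\DocumentManager $dm */
    $schemaManager = $dm->getSchemaManager();

    $schemaManager->dropDatabases();   // wipe everything left over from the previous test
    $schemaManager->ensureIndexes();   // recreate indexes for every mapped document class
    $fixtureLoader->load($dm);         // stand-in for our own fixture-loading code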

Profiling PHP

Running a small subset of the tests and measuring the timing, I get these results:

3.4:
    real    0m12.478s
    user    0m7.054s
    sys     0m2.247s

4.2:
    real    0m56.669s
    user    0m7.488s
    sys     0m2.334s

As you can see, the actual CPU time taken by the tests is about the same; there is no significant difference there. The real (wall-clock) time is very different, though, which suggests a lot of waiting (on I/O in this case?).

I have further profiled the PHP code, and the results show a 9-10x increase in the time spent in this function:

MongoDB\Driver\Manager::executeWriteCommand()

The documentation for that function says:

This method will apply logic that is specific to commands that write (e.g. drop)

That makes me think that the amount of setup/teardown work (i.e. dropping collections, creating indexes) is what is at play here.
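
For context, commands like drop and createIndexes end up going through that method. A minimal sketch of the kind of calls involved (the connection string, database and collection names are placeholders):

    <?php
    // Placeholder names; this only illustrates the code path the profiler flagged.
    $manager = new MongoDB\Driver\Manager('mongodb://127.0.0.1:27017');

    // Dropping a collection during teardown.
    $manager->executeWriteCommand(
        'test_db',
        new MongoDB\Driver\Command(['drop' => 'users'])
    );

    // Recreating an index during setup.
    $manager->executeWriteCommand(
        'test_db',
        new MongoDB\Driver\Command([
            'createIndexes' => 'users',
            'indexes'       => [
                ['key' => ['email' => 1], 'name' => 'email_1'],
            ],
        ])
    );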

Profiling MongoDB

Profiling PHP pointed at a slowdown inside MongoDB, so I profiled the database as well. The subset of tests that I ran resulted in:

  • 1366 profiling documents for 3.4 MMAPv1
  • 2092 profiling documents for 4.2 WiredTiger

Most of the disparity between those numbers can be attributed to the fact that in 3.4 there are no profiling documents for createIndexes (maybe profiling of createIndexes was only added after 3.4? I don't know).

I filtered the profiling documents to show only those that took at least 1 millisecond (millis > 0); the query I used is sketched after the list. There were:

  • 2 such documents for MongoDB 3.4 (two drop commands)
  • 950+ such documents for MongoDB 4.2 (209x drop, 715x createIndexes, 4x insert, 23x query)
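
For reference, this is roughly how profiling can be enabled and filtered from PHP (the database name is a placeholder; the same can of course be done from the mongo shell):

    <?php
    // Turn on full profiling for the test database, run the tests, then list
    // every profiled operation that took more than 0 ms. 'test_db' is a placeholder.
    $manager = new MongoDB\Driver\Manager('mongodb://127.0.0.1:27017');

    $manager->executeCommand('test_db', new MongoDB\Driver\Command(['profile' => 2]));

    // ... run the test subset ...

    $cursor = $manager->executeQuery(
        'test_db.system.profile',
        new MongoDB\Driver\Query(['millis' => ['$gt' => 0]])
    );

    foreach ($cursor as $op) {
        printf("%-15s %-40s %d ms\n", $op->op, $op->ns, $op->millis);
    }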

As I mentioned earlier, MongoDB 3.4 does not seem to report createIndexes in the profiling. But let's assume those commands take as long in 3.4 as they do in 4.2 (they probably take less time, judging by the rest of the profiling results).

Then there are all those drop commands, which take up to 15 milliseconds each in 4.2. In 3.4 there are also 209 drop commands, but almost all of them are reported as having taken 0 milliseconds.

There is only a minimal amount of inserting and querying, and the collections involved hold only a handful of documents (fewer than 10 per collection, and fewer than 5 collections are actually queried or inserted into). This slowdown is not the result of missing caches or indexes; even full collection scans would be fast at this size.

Memory and hardware

Most of the discussion I have found regarding this has been around setting an appropriate cache size for the working set. I ran the tests on a small server with a single core and 4GB RAM with the default cache size (the larger of 256MB and 50% of (RAM - 1GB), i.e. roughly 1.5GB here). That is definitely large enough for all the data the tests could have created. The tests were truly trivial, and most of their time was spent on setup/teardown of the database state.

Conclusion

This is the first time I have profiled our tests and their interaction with the database. The ratio of drop-and-index-creation to actual work can definitely be improved, but it has worked fine so far with MMAPv1 and MongoDB 3.4. Is this type of slowdown expected with WiredTiger? Is there something I can do to mitigate it?

I am now afraid of upgrading the production MongoDB instances because I don't know how they will behave. If this is mostly related to index creation and database dropping, then I suppose the production workload should be fine, but I don't want to take chances. Sadly, we are a fairly small company and do not have any performance/stress tests for the production environment.


Edits

Using tmpfs

Since I'm running the tests in Docker and Docker supports tmpfs mounts out of the box (e.g. --tmpfs /data/db on docker run), I gave that a try. Using a RAM-backed tmpfs as the mount for the MongoDB data directory cut the test time roughly in half:

4.2:
    real    0m56.669s
    user    0m7.488s
    sys     0m2.334s

4.2 - tmpfs:
    real    0m30.951s
    user    0m7.697s
    sys     0m2.279s

This is better, but still a far cry from the 12 seconds it takes to run on MMAPv1. Interestingly, using tmpfs with MMAPv1 did not yield a significantly different result.

The real cause of test slowdown - indices

It turns out that our testing framework and fixture loader created indices for all managed collections with each database purge. This resulted in about 100 index creations per test case, and that was what caused the slowdown. I did not find concrete confirmation from MongoDB itself, but it seems that index creation with WiredTiger is significantly slower than with MMAPv1. Removing the index creation from the test setup code sped up the tests significantly, getting us back to the pre-upgrade times.

The vast majority of our tests do not need the indexes, and creating them costs far more time than they save in query speed. I implemented an option to force index creation for test cases where the developer knows they will need it (sketched below). That is an acceptable solution for us.
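
The opt-in looks roughly like this (a hypothetical, simplified base class; the real framework and helper methods differ):

    <?php
    // Hypothetical, simplified version of our functional-test base class.
    use Doctrine\ODM\MongoDB\DocumentManager;
    use PHPUnit\Framework\TestCase;

    abstract class FunctionalTestCase extends TestCase
    {
        /** Tests that actually rely on indexes override this with true. */
        protected $createIndexes = false;

        /** @var DocumentManager */
        protected $dm;

        protected function setUp(): void
        {
            $this->dm = $this->createDocumentManager(); // assumed helper

            // Purge leftover state from the previous test.
            $this->dm->getSchemaManager()->dropDatabases();

            // Only pay the index-creation cost when a test opts in.
            if ($this->createIndexes) {
                $this->dm->getSchemaManager()->ensureIndexes();
            }

            $this->loadFixtures(); // assumed helper, loads this test's fixtures
        }

        abstract protected function createDocumentManager(): DocumentManager;

        abstract protected function loadFixtures(): void;
    }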


Solution

  • Put the database's data into memory. On Linux, I recommend zram.

    In my experience, zram is about 2x as fast as a top-of-the-line NVMe SSD (a Samsung 860 Pro, I think) in RAID 0, and I think almost 10x as fast as a single consumer-grade laptop SSD. The difference should be even bigger for spinning disks or storage accessed over the network.

    MongoDB has various other storage engines (there is one called ephemeralForTest, I believe), but they do not support transactions, so you need to use WiredTiger if your application relies on 4.2 (or even 4.0, I think) functionality.

    In production you are most likely not dropping collections on every request, so the actual performance difference between 3.x and 4.2 should be smaller.