Tags: mongodb, connection, pymongo, ram, gridfs

MongoDB RAM Consumption on Connections


I'm using pymongo to insert a large number of JSON documents into MongoDB GridFS, plus some data into a regular collection. What I noticed a while ago is that MongoDB consumes a crazy amount of RAM while a single connection is open; as soon as I close that connection, the RAM is released. Consumption is around 10-12 GB in total with the connection open and about 200 MB without it, while the collection itself is only ~300 MB, with 10-18 GB of GridFS storage.
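
Roughly, the pattern looks like this (a minimal sketch of what I described above; database, collection, and field names are placeholders):

```python
import json
import gridfs
from pymongo import MongoClient

# One long-lived connection used for everything.
client = MongoClient("localhost", 27017)
db = client["mydb"]              # placeholder database name
fs = gridfs.GridFS(db)

docs = ({"name": "doc-%d" % i, "payload": "x" * 4096} for i in range(100000))

for doc in docs:
    # Big JSON blobs go to GridFS...
    fs.put(json.dumps(doc).encode("utf-8"), filename=doc["name"])
    # ...plus a small record in a regular collection.
    db.metadata.insert_one({"name": doc["name"]})

# RAM climbs while 'client' stays open; closing it releases the memory.
client.close()
```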

Why does this happen? How can opening a new connection for each bulky operation be far less resource-hungry than using one single connection for everything? Is it somehow related to journaling?


Solution

  • I will break this problem down into smaller pieces for ease of understanding:

    1. It is well known that MongoDB is RAM-hungry; it will try to use as much RAM as possible.
    2. GridFS stores file contents in the collection fs.chunks and the corresponding metadata in fs.files. Files stored in GridFS are split into chunks of 255 KB each by default.

    When you read GridFS data over an open connection, the chunks belonging to the file(s) have to be loaded from disk into RAM (if they are not already there). So RAM usage is directly proportional to the amount of data stored and, importantly, to the frequency of GridFS data access. To reiterate: GridFS data gets pulled into RAM whenever a query references it.

    If you have an active connection reading large amounts of GridFS data, you should expect heavy RAM usage. But if your query frequency is low (mostly writes, rare reads), RAM usage will be relatively lower. If you are mostly writing data, ensure the connection is closed as soon as the operation is done (see the sketch after the list below).

    3. The more connections you keep open, the more your RAM usage will increase.
    4. This is in no way related to journaling.
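
    To make the write-then-close pattern concrete, here is a minimal sketch; the database and collection names are placeholders rather than anything from your setup. pymongo's MongoClient works as a context manager, so the connection is closed automatically when the block exits:

    ```python
    import json
    import gridfs
    from pymongo import MongoClient

    def bulk_store(docs):
        # Open a connection only for the duration of the bulk write.
        with MongoClient("localhost", 27017) as client:
            fs = gridfs.GridFS(client["mydb"])  # placeholder database name
            for doc in docs:
                fs.put(json.dumps(doc).encode("utf-8"), filename=doc["name"])
        # The connection is closed here, so per-connection resources are
        # released instead of staying held while the process keeps running.
    ```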

    Note: GridFS also supports sharding, which can spread the chunk data across machines and so mitigate excessive RAM usage on any single node.
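
    If you try sharding, the usual approach is to shard fs.chunks on { files_id: 1, n: 1 }. Below is a hedged sketch using pymongo's admin commands; it assumes you are connected to a mongos router of an existing sharded cluster, and the host and database names are placeholders:

    ```python
    from pymongo import MongoClient

    # Connect to a mongos router of a sharded cluster (placeholder host).
    client = MongoClient("mongos-host", 27017)

    # Enable sharding for the database, then shard the GridFS chunks
    # collection on the shard key commonly used for GridFS.
    client.admin.command("enableSharding", "mydb")
    client.admin.command(
        "shardCollection", "mydb.fs.chunks",
        key={"files_id": 1, "n": 1},
    )
    ```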

    Hope this clarifies.