
How much memory do I need to work with a graph that has around one million nodes and four million relationships?


I plan to run Memgraph Platform via Docker. I can see the memory footprint of an empty Memgraph instance. I need to import about one million nodes and four million relationships from CSV files.

How much memory will I need to store and work with that amount of data?


Solution

  • Memgraph provides a formula in its documentation for estimating memory usage (the documentation says vertices for nodes and edges for relationships, probably because of the source code):

    StorageRAMUsage = NumberOfNodes × 260B + NumberOfRelationships × 180B

    In your case then:

    StorageRAMUsage = 1 000 000 × 260B + 4 000 000 × 180B

    StorageRAMUsage = 260 000 000B + 720 000 000B = 980 000 000B = ~957 031KB = ~935MB = ~0.91GB

    That covers storage alone, so 1 GB of RAM would be too tight once Memgraph's own baseline footprint and query execution are added on top, but 2 GB should be fine. This is only a rough estimate: properties on nodes and relationships, as well as indices, make things more complicated.

    Each property takes at least 2B (a simple boolean value), but its size can grow quickly if the property is a list of large strings (each large string takes up at least 10B).

    An index on a property needs more memory than an index on a node or relationship label alone, because the size of the property also has to be taken into account.

    Also, each node and relationship object has a pointer to a Delta object, which stores all changes made to that node or relationship. The more changes you make, the more deltas exist, and the more memory you need.

    So there is more to take into consideration than the mere number of nodes and relationships. Still, the formula above accurately predicted the memory I needed to import and query my own datasets, although it's important to point out that those datasets did not have many complicated properties. Here's the formula again:

    StorageRAMUsage = NumberOfNodes × 260B + NumberOfRelationships × 180B
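
If you want to try the formula on other graph sizes, here is a minimal Python sketch of it. The function and constant names are my own (hypothetical, not part of any Memgraph API), the per-object costs are the 260B and 180B figures quoted above, and the result ignores properties, indices, and deltas, so treat it as a lower bound rather than an exact figure. The conversion divides by 1024, the same way the arithmetic above does.

```python
# Per-object costs in bytes, taken from the formula quoted in this answer.
# The names below are illustrative only, not part of any Memgraph API.
BYTES_PER_NODE = 260
BYTES_PER_RELATIONSHIP = 180


def estimate_storage_ram_bytes(num_nodes: int, num_relationships: int) -> int:
    """Rough storage estimate; ignores properties, indices and deltas."""
    return num_nodes * BYTES_PER_NODE + num_relationships * BYTES_PER_RELATIONSHIP


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    # One million nodes and four million relationships, as in the question.
    total = estimate_storage_ram_bytes(1_000_000, 4_000_000)
    print(f"{total} B = {to_gib(total):.2f} GiB")  # 980000000 B = 0.91 GiB
```

Whatever number this gives you, leave headroom on top of it for the empty instance's footprint and for the queries you plan to run.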