
What offers better performance for large datasets? Nested dictionaries or a dictionary of objects?


I find myself repeating this pattern when I am fetching from multiple database tables:

records = {r["p_key"]: {"record": r, "A": [], "B": [], "C": []} for r in db_records}

I often have to group data this way because I cannot join across databases, or because several smaller queries are sometimes faster than one large join.

But performance-wise, I am not sure whether nesting dictionaries like this carries much overhead, or whether I would be better served by creating an object with these attributes and using it as the value in the records dictionary. By performance I mean the overall cost in space and time of a large set of nested dictionaries versus a dictionary of objects.
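For context, a minimal sketch of the grouping pattern described above, using made-up row data and key names (`p_key`, table "A") to stand in for real query results:

```python
# Hypothetical rows returned by two separate queries.
db_records = [{"p_key": 1, "name": "x"}, {"p_key": 2, "name": "y"}]
a_rows = [{"p_key": 1, "value": 10}, {"p_key": 1, "value": 20}]

# Index the main query's rows by primary key.
records = {
    r["p_key"]: {"record": r, "A": [], "B": [], "C": []}
    for r in db_records
}

# Attach rows from a second table/database by key, emulating a join.
for row in a_rows:
    records[row["p_key"]]["A"].append(row)
```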


Solution

  • There is essentially no performance difference between dictionaries and regular class instances, because a regular instance stores its attributes in a per-instance dictionary (`__dict__`), so attribute access is a dictionary lookup either way.
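    You can see this per-instance dictionary directly; the class below is only an illustration of the record structure from the question:

    ```python
    class Record:
        def __init__(self, record):
            self.record = record
            self.A = []
            self.B = []
            self.C = []

    r = Record({"p_key": 1})
    # Attribute access like r.A is a lookup in this per-instance dict:
    print(r.__dict__)
    ```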

    However, you should consider using classes with __slots__: declaring the attribute names up front lets Python drop the per-instance __dict__, which reduces memory use and slightly speeds up attribute access.
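    A small sketch of the difference (the class names are illustrative; `sys.getsizeof` gives only a rough per-instance footprint and excludes referenced objects):

    ```python
    import sys

    class Plain:
        def __init__(self):
            self.record = None
            self.A = []

    class Slotted:
        __slots__ = ("record", "A")
        def __init__(self):
            self.record = None
            self.A = []

    p, s = Plain(), Slotted()
    # A slotted instance has no per-instance __dict__ at all:
    print(hasattr(s, "__dict__"))   # False
    # Compare rough footprints; the plain instance also pays for its dict.
    print(sys.getsizeof(s), sys.getsizeof(p) + sys.getsizeof(p.__dict__))
    ```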

    Another option is the pandas library, which is designed for working with big datasets.
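    For example, the cross-database "join" from the question could be emulated in memory with `DataFrame.merge`; the rows and key name here are made up for illustration:

    ```python
    import pandas as pd

    # Load each query's rows into a DataFrame.
    main = pd.DataFrame([{"p_key": 1, "name": "x"}, {"p_key": 2, "name": "y"}])
    a = pd.DataFrame([{"p_key": 1, "value": 10}, {"p_key": 1, "value": 20}])

    # Left-join on the key; keys missing from `a` get NaN values.
    merged = main.merge(a, on="p_key", how="left")
    ```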