
What offers better performance for large datasets? Nested dictionaries or a dictionary of objects?


I find myself repeating this pattern when I am fetching from multiple database tables:

records = {r["p_key"]: {"record": r, "A": [], "B": [], "C": []} for r in db_records}

I often have to group data this way because I cannot join across databases, or because several smaller queries are sometimes faster than one large join.

But performance-wise, I am not sure whether nesting dictionaries like this carries much overhead, or whether I would be better served by creating an object with these attributes and using it as the value in the records dictionary. By performance I mean the overall cost in space and time of a large set of nested dictionaries versus a dictionary of objects.
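For context, a minimal sketch of the grouping pattern described above, using made-up row data and key names (`p_key`, table "A") to stand in for real query results:

```python
# Hypothetical rows returned by two separate queries.
db_records = [{"p_key": 1, "name": "x"}, {"p_key": 2, "name": "y"}]
a_rows = [{"p_key": 1, "value": 10}, {"p_key": 1, "value": 20}]

# Index the main query's rows by primary key.
records = {
    r["p_key"]: {"record": r, "A": [], "B": [], "C": []}
    for r in db_records
}

# Attach rows from a second table/database by key, emulating a join.
for row in a_rows:
    records[row["p_key"]]["A"].append(row)
```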


Solution

  • There is essentially no performance difference between dictionaries and regular class instances, because a regular instance stores its attributes in a per-instance dictionary (`__dict__`), so attribute access is a dictionary lookup either way.
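    You can see this per-instance dictionary directly; the class below is only an illustration of the record structure from the question:

    ```python
    class Record:
        def __init__(self, record):
            self.record = record
            self.A = []
            self.B = []
            self.C = []

    r = Record({"p_key": 1})
    # Attribute access like r.A is a lookup in this per-instance dict:
    print(r.__dict__)
    ```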

    However, you should consider using classes with __slots__: declaring the attribute names up front lets Python drop the per-instance __dict__, which reduces memory use and slightly speeds up attribute access.
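    A small sketch of the difference (the class names are illustrative; `sys.getsizeof` gives only a rough per-instance footprint and excludes referenced objects):

    ```python
    import sys

    class Plain:
        def __init__(self):
            self.record = None
            self.A = []

    class Slotted:
        __slots__ = ("record", "A")
        def __init__(self):
            self.record = None
            self.A = []

    p, s = Plain(), Slotted()
    # A slotted instance has no per-instance __dict__ at all:
    print(hasattr(s, "__dict__"))   # False
    # Compare rough footprints; the plain instance also pays for its dict.
    print(sys.getsizeof(s), sys.getsizeof(p) + sys.getsizeof(p.__dict__))
    ```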

    Another option is the pandas library, which is designed for working with big datasets.
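    For example, the cross-database "join" from the question could be emulated in memory with `DataFrame.merge`; the rows and key name here are made up for illustration:

    ```python
    import pandas as pd

    # Load each query's rows into a DataFrame.
    main = pd.DataFrame([{"p_key": 1, "name": "x"}, {"p_key": 2, "name": "y"}])
    a = pd.DataFrame([{"p_key": 1, "value": 10}, {"p_key": 1, "value": 20}])

    # Left-join on the key; keys missing from `a` get NaN values.
    merged = main.merge(a, on="p_key", how="left")
    ```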