
Generate a list of distinct empty mutables


I need to initialize a list of defaultdicts. If they were, say, strings, this would be tidy:

list_of_dds = [string] * n

…but for mutables, you get right into a mess with that approach:

>>> from collections import defaultdict
>>> x=[defaultdict(list)] * 3
>>> x[0]['foo'] = 'bar'
>>> x
[defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'})]
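
The root of the problem is that * repeats the reference, not the object, so every slot points at the same defaultdict; a quick identity check in the same session makes that explicit:

>>> x[0] is x[1] is x[2]
True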

What I do want is an iterable of freshly-minted distinct instances of defaultdicts. I can do this:

list_of_dds = [defaultdict(list) for i in xrange(n)]

but I feel a little dirty using a list comprehension here. I think there's a better approach. Is there? Please tell me what it is.
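
For what it's worth, the comprehension version does behave the way I want; mutating one element leaves the others untouched (illustrative session):

>>> list_of_dds = [defaultdict(list) for i in xrange(3)]
>>> list_of_dds[0]['foo'] = 'bar'
>>> list_of_dds[1]
defaultdict(<type 'list'>, {})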

Edit:

This is why I feel the list comprehension is suboptimal. I'm not usually the pre-optimization type, but I can't bring myself to ignore the speed difference here:

>>> timeit('x=[string.letters]*100', setup='import string')
0.9318461418151855
>>> timeit('x=[string.letters for i in xrange(100)]', setup='import string')
12.606678009033203
>>> timeit('x=[[]]*100')
0.890861988067627
>>> timeit('x=[[] for i in xrange(100)]')
9.716886043548584

Solution

  • Your approach using the list comprehension is correct, and there's nothing dirty about it. What you want is a list of things whose length is defined by some base set, and building a list from a base set is exactly what a list comprehension does. So what's wrong with using one here?

    Edit: The speed difference is a direct consequence of what you are trying to do. [[]]*100 is faster because it only has to create one list object. Creating a new list on every iteration is slower, yes, but that's the cost you have to expect if you actually want 100 different lists.
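
    To see that concretely, you can count how many distinct objects each version ends up holding (a small sketch, Python 2):

    shared = [[]] * 100
    print(len(set(id(e) for e in shared)))   # 1: one list object, referenced 100 times

    fresh = [[] for i in xrange(100)]
    print(len(set(id(e) for e in fresh)))    # 100: a new list object per iteration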

    (It doesn't create a new string each time in your string examples, but it's still slower, because the list comprehension can't "know" ahead of time that every element will be the same, so it has to re-evaluate the expression on each iteration. I don't know the internals of list comprehensions, but there may also be some list-resizing overhead, since the comprehension doesn't necessarily know the length of the index iterable up front and therefore can't preallocate the result list. In addition, note that some of the slowdown in your string example comes from looking up string.letters on every iteration: on my system, using timeit.timeit('x=[letters for i in xrange(100)]', setup='from string import letters') instead, so that string.letters is looked up only once, cuts the time by about 30%.)
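
    A sketch of that last comparison, in case it's useful (Python 2; the numbers are machine-dependent):

    from timeit import timeit

    # string.letters is an attribute lookup performed on every iteration
    print(timeit('x=[string.letters for i in xrange(100)]', setup='import string'))

    # here the name is bound once in setup, so each iteration is just a name lookup
    print(timeit('x=[letters for i in xrange(100)]', setup='from string import letters'))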