Python: every value in a defaultdict is updated at once


I have been stuck for more than an hour on a silly problem and I can't figure out the solution. I create a defaultdict whose default value is an initial list, and I update those lists in a for loop. However, every time I update one value, all the other values are updated with the same value. Can somebody help me, please? Here is my code:

from collections import defaultdict
base = ["coucou", "salut", "tchao"]
initial_vector = [0]*len(base)
dict_vectorized_documents = defaultdict(lambda: initial_vector)
inversed_index = {"coucou": [(1, 3), (100, 4)], "salut": [(1, 1), (99, 2), (33, 3)], "tchao": [(1, 5)]}

for i, word in enumerate(base):
    print(word)
    for element in inversed_index[word]:
        print(element[0])
        print(i)
        print(element[1])
        print(dict_vectorized_documents[element[0]][i])
        dict_vectorized_documents[element[0]][i] = element[1]
        print(dict_vectorized_documents)

print(dict_vectorized_documents)

And here are my logs when I run it:

coucou
1
0
3
0
defaultdict(<function <lambda> at 0x7fcc5fac1f28>, {1: [3, 0, 0]})
100
0
4
3
defaultdict(<function <lambda> at 0x7fcc5fac1f28>, {1: [4, 0, 0], 100: [4, 0, 0]})
salut
1
1
1
0
defaultdict(<function <lambda> at 0x7fcc5fac1f28>, {1: [4, 1, 0], 100: [4, 1, 0]})
99
1
2
1
defaultdict(<function <lambda> at 0x7fcc5fac1f28>, {1: [4, 2, 0], 99: [4, 2, 0], 100: [4, 2, 0]})
33
1
3
2
defaultdict(<function <lambda> at 0x7fcc5fac1f28>, {1: [4, 3, 0], 99: [4, 3, 0], 100: [4, 3, 0], 33: [4, 3, 0]})
tchao
1
2
5
0

Thank you very much!


Solution

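  • Your defaultdict factory returns the same list object every time it is called, so every key ends up referring to one shared list, and a change made through one key is visible through all the others. The simplest solution is to copy the list explicitly with list: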

    >>> from collections import defaultdict
    >>> base = ["coucou", "salut", "tchao"]
    >>> initial_vector = [0]*len(base)
    >>> dict_vectorized_documents = defaultdict(lambda: list(initial_vector))
    

    Here is a contrived example that may make this clearer:

    >>> initial_list = [0, 0, 0]
    >>> def get_initial():
    ...     return initial_list
    ...
    >>> d = {}
    >>> for k, i in zip(['key1','key2','key3'],range(3)):
    ...     new_list = get_initial()
    ...     new_list[i] = 'mutated'
    ...     d[k] = new_list
    ...
    >>> d
    {'key2': ['mutated', 'mutated', 'mutated'], 'key3': ['mutated', 'mutated', 'mutated'], 'key1': ['mutated', 'mutated', 'mutated']}
    

    So new_list was not a new list after all: every key holds a reference to the same initial_list object. However, if we return a copy instead:

    >>> initial_list = [0, 0, 0]
    >>> def get_initial():
    ...     return list(initial_list)
    ...
    >>> d = {}
    >>> for k, i in zip(['key1','key2','key3'],range(3)):
    ...     new_list = get_initial()
    ...     new_list[i] = 'mutated'
    ...     d[k] = new_list
    ...
    >>> d
    {'key2': [0, 'mutated', 0], 'key3': [0, 0, 'mutated'], 'key1': ['mutated', 0, 0]}
    >>>
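
One way to confirm the same behaviour directly on a defaultdict is to compare object identity with the is operator. The snippet below is only an illustrative sketch; the names shared and copied, and the keys "a" and "b", are made up for this example.

```python
from collections import defaultdict

initial_vector = [0, 0, 0]

# Factory that hands back the very same list object for every missing key.
shared = defaultdict(lambda: initial_vector)
# Factory that builds a new (shallow) copy for every missing key.
copied = defaultdict(lambda: list(initial_vector))

# "is" checks whether two names refer to the same object in memory.
shared_is_same = shared["a"] is shared["b"]
copied_is_same = copied["a"] is copied["b"]

# Mutating one copied value leaves the others alone.
copied["a"][0] = 7

print(shared_is_same)  # True
print(copied_is_same)  # False
print(copied["b"])     # [0, 0, 0]
```

Note that list(...) makes a shallow copy. That is enough for a flat list of numbers like this one, but a default value containing nested lists would need copy.deepcopy, or a factory that builds the structure from scratch, such as lambda: [0] * len(base).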