Search code examples
pythonperformancereferencepass-by-value

Python: Replace a list in place from where it is referenced, not create a new reference/list


I am working with quite big number of values in Python (memory footprint is 5GB).

Sometimes, I need to access values by key, sometimes I need to loop values. For performance reasons, I am converting the Dict to a List at startup, so I can:

  • use the Dict in cases where I want to access values by key
  • use the List in cases where I want to loop values
my_big_dict_of_values
my_big_values_list = list(my_big_dict_of_values.values())

Here's a performance comparison, just for clarity:

>some_dict = dict(zip(range(1000000), reversed(range(1000000))))
>some_list = list(some_dict.values())
>%timeit for t in some_dict.values(): t 
21.1 ms ± 483 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>%timeit for t in some_list: t 
16.1 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

My problem arises when I need to delete keys from the dict based on user input. First, I delete entries from the dict with:

for key in keys_to_remove:
    del(my_big_dict_of_values[key])

After this operation, I also want to update my_big_values_list. I can do this with:

Solution A (Slow)

indexes_to_remove = list()
for idx, value in enumerate(my_big_values_list):
    if value.key in keys_to_remove:
        indexes_to_remove.append(idx)
for index in sorted(indexes_to_remove, reverse=True):
    del my_big_values_list[index]

However, this is really slow and cumbersome.

Ideally, I would like to just create the list from the dict again with:

Solution B (Fast with reference issue)

my_big_values_list = list(my_big_dict_of_values.values())

This is fast, but it seems to create a new reference. I would need to replace all references of my_big_values_list passed to other classes/functions, which appears strange, e.g. to illustrate.

my_big_dict_of_values
my_big_values_list = list(
    my_big_dict_of_values.values())

handle_process = handle_process_class(
    my_big_dict_of_values, my_big_values_list)

userinput = userinput(handle_process)

handle_process.calculate()

def userinput_class():
    def __init__(handle_process):
        self.handle_process = handle_process
    def user_del_key(key):
        del(self.handle_process.my_big_dict_of_values[key])
        # Update list here too:
        # Solution A works
        # Solution B throws error in
        # handle_process.calculate() because
        # handle_process still has old list

def handle_process_class():
    def __init__(my_big_dict_of_values, my_big_values_list):
        self.my_big_dict_of_values = my_big_dict_of_values
        self.my_big_values_list = my_big_values_list
    def calculate(self):
        return len(self.my_big_values_list)

Is there a way to modify my_big_values_list in place, but simply replace with a new list (e.g. list(my_big_dict_of_values.values())).

I have read how Python passes references to values, and I think I understand most of it. That's why I came up with solution A, but I don't know how to use Solution B to modify the referenced list. Perhaps someone can explain what is going on here?


Solution

  • To modify the list in-place, assign to its slice:

    my_big_values_list[:] = list(my_big_dict_of_values.values())
    

    Example:

    >>> my_big_dict_of_values = {"a": 1, "b": 2, "c": 3}
    >>> my_big_values_list = list(my_big_dict_of_values.values())
    >>> another_list_reference = my_big_values_list
    
    >>> print(my_big_values_list, another_list_reference)
    [1, 2, 3] [1, 2, 3]
    
    >>> del(my_big_dict_of_values["b"])
    >>> my_big_values_list[:] = list(my_big_dict_of_values.values())
    
    >>> print(my_big_values_list, another_list_reference)
    [1, 3] [1, 3]
    

    However in terms of performance and memory usage, you should consider if a separate huge list is really necessary, as you can loop directly over dictionary.values().