With this code I get very long run times (over 24 hours) when working on large lists (~150M elements, each a tuple of 4 strings). I need to delete around 66M tuples from the list:
```python
def erase_random_elements(elements_list, iterable_random_numbers):
    for i in sorted(iterable_random_numbers, reverse=True):
        elements_list.pop(i)
    return elements_list
```
It seems I've got enough RAM for it, so I don't need to chunk the list. How can I do it faster?
Don't use `list.pop(i)`, as it takes O(N) time for each delete operation (the "Delete Item" entry on the linked page). Since you have enough memory, you can create a new list:
```python
def erase_random_elements(elements_list, iterable_random_numbers):
    s = set(iterable_random_numbers)
    elements_list[:] = [x for i, x in enumerate(elements_list) if i not in s]
    # return elements_list  -> not required, because we're updating the list in-place
```
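For illustration, here is a minimal usage sketch with made-up, small-scale data (your real list has ~150M tuples and ~66M indices to drop; the variable names are hypothetical):

```python
import random

# Hypothetical small list of 4-string tuples standing in for the real data
elements_list = [(f"a{i}", f"b{i}", f"c{i}", f"d{i}") for i in range(10)]
indices_to_remove = random.sample(range(len(elements_list)), 4)

erase_random_elements(elements_list, indices_to_remove)
print(len(elements_list))  # 6 -> the same list object was shrunk in-place
```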
Note that I used `elements_list[:]` here because your original function updates the actual list instead of creating a new one, which means all existing references to that list stay intact. If that behaviour is not required, you can simply remove the `[:]`, un-comment the return line, and assign the returned list to a variable.
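As a sketch, that new-list variant could look like this (purely illustrative, reusing the same function name):

```python
def erase_random_elements(elements_list, iterable_random_numbers):
    s = set(iterable_random_numbers)
    # Build and return a brand-new list; existing references to the
    # old list object are NOT updated.
    return [x for i, x in enumerate(elements_list) if i not in s]

old = [("a", "b", "c", "d"), ("e", "f", "g", "h"), ("i", "j", "k", "l")]
new = erase_random_elements(old, [1])
print(len(old), len(new))  # 3 2 -> `old` is untouched, `new` is the filtered copy
```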