Search code examples
pythonfilteringprocessing-efficiency

Python: Filtering large list by operations on list elements


I have a large list that I wish to filter. I want to do this by performing some operations on each element in the list and then delete any other matching elements in the list. The desired output is the shorter ls, with the matching post op elements removed. I can do it.. but it's really slow. Do you have advice to speed things up?

An example list looks like:

ls = [1,2,3,......,10000000]

and the operations look like:

def performOps(x):
    a = x**2
    b = x**5
    c = x**7
    return a,b,c

for elem in ls:
    res = performOps(elem)
    for i in res:
        if i in ls[ls.index(elem)+1:]:
            ls.remove(elem)

Solution

  • Your code is slow because of multiple calls to .index. Also, editing a list while you are also iterating over it is technically possible, but hard to debug.

    Here's an approach where we first build a set of numbers that you want to delete, and then delete them with a single filter call. The set of numbers is a set because, for large groups of numbers, it is much faster to test membership in a set than in a list:

    # Make a set a numbers that we need to remove
    toRemove = set()
    for elem in ls:
        res = performOps(elem)
        for i in res:
            toRemove.add(i)
    # Remove those numbers
    ls = list(filter(lambda x: x not in toRemove, ls))