python, python-3.x, algorithm, hashtable

Optimal way to check if the elements of a list are in another list in Python


I need to check if the items in one list are in another list. Both lists contain paths to files.

    list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
    list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']

I tried something like:

    len1 = len(list1)
    len2 = len(list2)

    res = list(set(list2) - set(list1))
    len3 = len(res)

    if len2 - len1 == len3:
        print("List2 contains all the items in list1")

But this is not optimal: my lists have 50k+ items. I think a hash table could be a good solution, but I don't know exactly how to build one. Any suggestions are welcome.


Solution

  • Python sets are based on hashing, which is why only hashable objects (such as strings) can be stored in them, and why membership lookups take constant time on average. Rather than calculating lengths, perform the set difference directly:

    >>> list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
    >>> list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']
    >>> if not (set(list1) - set(list2)):  # the difference is an empty set (falsy) if all are contained
    ...     print("List2 contains all the items in list1")
    ...
    List2 contains all the items in list1
    

    Here is the breakdown:

    >>> difference = set(list1) - set(list2)
    >>> difference
    set()
    >>> bool(difference)
    False
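
The same check can also be written with `set.issubset`, which accepts any iterable and returns a boolean directly. The sketch below is only an illustration, not part of the original answer: the helper name `missing_items` and the extra path `x/y/z/absent.txt` are made up for the example. It builds the set once from the larger list, so each lookup is a hash-table probe rather than a scan of the list.

```python
def missing_items(needles, haystack):
    """Return the items of `needles` that do not occur in `haystack`.

    `missing_items` is a hypothetical helper name used for illustration.
    """
    # Building the set is a single pass over `haystack`; afterwards each
    # membership test is an average constant-time hash lookup.
    haystack_set = set(haystack)
    return [item for item in needles if item not in haystack_set]


list1 = ['a/b/c/file1.txt', 'b/c/d/file2.txt']
list2 = ['a/b/c/file1.txt', 'b/c/d/file2.txt', 'd/f/g/test4.txt', 'd/k/test5.txt']

# True when every element of list1 also appears in list2.
contains_all = set(list1).issubset(list2)

# 'x/y/z/absent.txt' is an invented path, added only to show a failing lookup.
missing = missing_items(list1 + ['x/y/z/absent.txt'], list2)

if contains_all:
    print("List2 contains all the items in list1")
print("Missing from list2:", missing)
```

Note that converting to sets ignores duplicates and ordering, so this answers "is every path present?" but not "how many times?".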