
How efficient is Python's 'in' or 'not in' operators, for large lists?


I have a list of over 100,000 values, and I am iterating over them and checking whether each one is contained in another list of random values (of the same size).

I am doing this with if item[x] in randomList. How efficient is this? Does Python do some sort of hashing for each container, or does it internally do a straight linear search of the other container to find the element I am looking for?

Also, if it does this search linearly, would it be faster to build a dictionary from randomList and do the lookups against that?


Solution

  • in is implemented by the __contains__ magic method of the object it applies to, so its efficiency depends on that object. For instance, set, dict and frozenset use hash-based lookups, while list requires a linear search. However, xrange (or range in Python 3.x) has a __contains__ method that doesn't require a linear search; instead it can use the start/stop/step information to decide membership arithmetically (e.g., 7 in xrange(4, 1000000) isn't done linearly).

    Custom classes are free to implement __contains__ however they see fit, but ideally their documentation should say how the lookup works when it is not obvious.
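
    A quick sketch of the differences above, using a list and a set built from the same values (the timing comparison assumes a machine where the ~O(n) vs. ~O(1) gap dominates, which it virtually always does at this size):

    ```python
    import timeit

    values = list(range(100_000))
    as_list = values        # list: 'in' does a linear scan, O(n)
    as_set = set(values)    # set: 'in' does a hash lookup, O(1) on average

    # Membership tests give identical answers...
    assert (99_999 in as_list) == (99_999 in as_set)

    # ...but the set is dramatically faster for repeated lookups,
    # especially for values near the end of the list.
    list_time = timeit.timeit(lambda: 99_999 in as_list, number=1000)
    set_time = timeit.timeit(lambda: 99_999 in as_set, number=1000)
    assert set_time < list_time

    # range's __contains__ uses start/stop/step arithmetic for int
    # arguments, so neither of these iterates through the range:
    assert 7 in range(4, 1_000_000)
    assert 7 not in range(4, 1_000_000, 2)  # 7 is odd; this range is all even
    ```

    So for the original problem, converting randomList to a set once (set(randomList)) and testing membership against the set turns 100,000 linear scans into 100,000 hash lookups.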
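
    A minimal sketch of a custom __contains__ (EvenNumbers is a hypothetical class, not from the question):

    ```python
    class EvenNumbers:
        """Hypothetical container of all even integers.

        Membership is computed arithmetically in O(1), much like
        range's __contains__, rather than by iterating anything.
        """
        def __contains__(self, item):
            return isinstance(item, int) and item % 2 == 0

    evens = EvenNumbers()
    assert 10 in evens      # 'in' calls EvenNumbers.__contains__
    assert 7 not in evens
    ```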