I have a list of over 100000 values and I am iterating over these values and checking if each one is contained in another list of random values (of the same size).
I am doing this by using if item[x] in randomList
.
How efficient is this? Does python do some sort of hashing for each container or is it internally doing a straight up search of the other container to find the element I am looking for?
Also, if it does this search linearly, then does it create a dictionary of the randomList and do the lookup with that?
in
is implemented by the __contains__
magic method of the object it applies to, so the efficiency is dependent upon that. For instance, set
, dict
and frozenset
will be hash based lookups, while list
will require a linear search. However, xrange
(or range
in Python 3.x) has a __contains__
method that doesn't require a linear search, but instead can use the start/stop/step information to determine a truthy value. (eg: 7 in xrange(4, 1000000)
isn't done linearly).
Custom classes are free to implement __contains__
however they see fit but ideally should provide some information about how it does so in documentation if "not obvious".