python, in-operator

In Python, how is the in operator implemented to work? Does it use the next() method of the iterators?


In Python, in is known to check for membership in iterables (lists, dictionaries, etc.) and to look for substrings in strings. My question is about how in is implemented to achieve all of the following: 1) testing for membership, 2) testing for substrings, and 3) accessing the next element in a for loop. For example, when for i in myList: or if i in myList: is executed, does in call myList.__next__()? If it does, how does it work with strings, given that str objects are not iterators (as checked in Python 2.7) and so do not have a next() method? If a detailed discussion of in's implementation is not possible, I would appreciate a gist of it here.


Solution

  • A class can define how the in operator works on instances of that class by defining a __contains__ method.

    The Python data model documentation says:

    For objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(), see this section in the language reference.
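
As a small sketch of that lookup order (the class name Both and its calls attribute are made up for illustration), a class that defines both __contains__ and __iter__ should only have __contains__ consulted by in:

```python
class Both:
    """Hypothetical class defining both hooks, to see which one `in` uses."""

    def __init__(self):
        self.calls = []

    def __contains__(self, item):
        self.calls.append("__contains__")
        return False  # deliberately "wrong", to show that it wins

    def __iter__(self):
        self.calls.append("__iter__")
        return iter([1, 2, 3])


b = Both()
result = 1 in b          # __contains__ is used; __iter__ is never reached
print(result, b.calls)   # False ['__contains__']
```

Iterating explicitly, for example with list(b), would still go through __iter__.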

    Section 6.10.2, "Membership test operations", of the Python language reference has this to say:

    The operators in and not in test for membership. x in s evaluates to True if x is a member of s, and False otherwise. x not in s returns the negation of x in s. All built-in sequences and set types support this as well as dictionary, for which in tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
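
The "x is e or x == e" wording matters for objects that do not compare equal to themselves. A minimal illustration, assuming CPython's built-in list and dict (the helper name contains is hypothetical and simply spells out the documented equivalence):

```python
nan = float("nan")
items = [1, 2, nan]


def contains(x, y):
    """Spell out the documented equivalence for container types."""
    return any(x is e or x == e for e in y)


same_object = nan in items          # identity match, even though nan != nan
other_nan = float("nan") in items   # a different NaN object: no identity, no equality
print(same_object, other_nan)       # True False
print(contains(nan, items), contains(3, items))   # True False

# For dictionaries, `in` tests keys, not values.
d = {"a": 1}
print("a" in d, 1 in d)             # True False
```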

    For the string and bytes types, x in y is True if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
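
A quick check of the substring behaviour, written for Python 3. It also shows that str provides __contains__ directly, so no next() call is needed for in to work on strings:

```python
text = "abc"

print("bc" in text, text.find("bc") != -1)   # True True
print("ac" in text)                          # False: a substring must be contiguous
print("" in text)                            # True: the empty string is always found
print(b"b" in b"abc")                        # True: bytes behave the same way

# str is not an iterator, but it does define __contains__.
has_next = hasattr(text, "__next__")
has_contains = hasattr(text, "__contains__")
print(has_next, has_contains)                # False True
```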

    For user-defined classes which define the __contains__() method, x in y returns True if y.__contains__(x) returns a true value, and False otherwise.
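
For instance, a hypothetical container that "contains" every even integer could be sketched as follows (EvenNumbers and Truthy are illustrative names, not library classes). The second class shows that any true value returned by __contains__ is turned into True:

```python
class EvenNumbers:
    """Hypothetical container: 'contains' every even integer."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


class Truthy:
    """Returns a non-bool true value from __contains__."""

    def __contains__(self, item):
        return "non-empty string"


evens = EvenNumbers()
print(4 in evens, 3 in evens, "x" in evens)   # True False False

result = "anything" in Truthy()
print(result)                                 # True (a bool, not the string)
```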

    For user-defined classes which do not define __contains__() but do define __iter__(), x in y is True if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
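
A sketch of this fallback, using a made-up Countdown class that records which values were produced. It suggests that iteration stops as soon as a match is found:

```python
class Countdown:
    """Hypothetical class with __iter__ but no __contains__."""

    def __init__(self, start):
        self.start = start
        self.visited = []

    def __iter__(self):
        for n in range(self.start, 0, -1):
            self.visited.append(n)   # record each value handed out
            yield n


c = Countdown(5)
found = 3 in c
print(found, c.visited)      # True [5, 4, 3]

d = Countdown(5)
missing = 9 in d
print(missing, d.visited)    # False [5, 4, 3, 2, 1]
```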

    Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is True if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).
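
The old protocol can be seen with a class that defines only __getitem__ (Squares is a hypothetical example). Python tries the indices 0, 1, 2, ... until it finds a match or IndexError is raised:

```python
class Squares:
    """Hypothetical class with only __getitem__ (old sequence protocol)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError(i)   # signals the end of the sequence
        return i * i


sq = Squares(5)            # behaves like [0, 1, 4, 9, 16]
print(9 in sq, 10 in sq)   # True False
```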

    The operator not in is defined to have the inverse true value of in.

    As a comment indicates above, the expression operator in is distinct from the keyword in that forms part of the for statement. In the Python grammar, in is "hardcoded" as part of the syntax of for:

    for_stmt ::=  "for" target_list "in" expression_list ":" suite
                  ["else" ":" suite]
    

    So in the context of a for statement, in doesn't behave as an operator; it is simply a syntactic marker that separates the target_list from the expression_list.
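
To answer the __next__ part of the question: it is the for statement itself, not the in keyword, that obtains an iterator and advances it. A rough Python 3 equivalent of a for loop is sketched below (in Python 2 the method is spelled next() rather than __next__()). The variable names are illustrative only:

```python
my_list = ["a", "b", "c"]

# The ordinary for loop.
seen_by_for = []
for item in my_list:
    seen_by_for.append(item)

# Roughly what the for statement does behind the scenes.
seen_manually = []
iterator = iter(my_list)        # calls my_list.__iter__()
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                   # the iterator is exhausted
    seen_manually.append(item)

print(seen_by_for == seen_manually)   # True

# The list itself is not an iterator; the object returned by iter() is.
list_has_next = hasattr(my_list, "__next__")
iterator_has_next = hasattr(iterator, "__next__")
print(list_has_next, iterator_has_next)   # False True
```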