python, string, python-internals

Why is True returned when checking if an empty string is in another?


My limited brain cannot understand why this happens:

>>> print '' in 'lolsome'
True

In PHP, an equivalent comparison returns false (and a warning):

var_dump(strpos('lolsome', ''));

Solution

  • From the documentation:

    For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
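The equivalence the documentation describes can be checked directly. This is a minimal sketch in Python 3 syntax (the question uses 2.x, but the behaviour shown is the same); `contains_via_find` and the sample needles are hypothetical names chosen for illustration.

```python
haystack = 'lolsome'


def contains_via_find(needle, text):
    """Substring test spelled out with str.find, as the docs describe."""
    return text.find(needle) != -1


# Index at which the empty string is found in the haystack.
empty_found_at = haystack.find('')

# Compare the `in` operator with the find-based test for a few needles.
results = {
    needle: (needle in haystack, contains_via_find(needle, haystack))
    for needle in ['', 'lol', 'some', 'xyz']
}

for needle, (via_in, via_find) in results.items():
    print(repr(needle), via_in, via_find)
```

Both tests agree for every needle, and the empty string is reported as found at the very start of the haystack.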

    From looking at your print call, you're using 2.x.

    To go deeper, look at the bytecode:

    >>> import dis
    >>> def answer():
    ...   '' in 'lolsome'
    ...
    >>> dis.dis(answer)
      2           0 LOAD_CONST               1 ('')
                  3 LOAD_CONST               2 ('lolsome')
                  6 COMPARE_OP               6 (in)
                  9 POP_TOP
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
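The same inspection can be done programmatically on Python 3. This is only a sketch: the listing above comes from 2.x, and the opcode names differ between versions (from Python 3.9 on, `in` compiles to `CONTAINS_OP` rather than `COMPARE_OP`). The function takes arguments instead of literals so that the result does not depend on how a given compiler treats constant expressions.

```python
import dis


def answer(needle, haystack):
    # Arguments instead of literals, so the containment test is
    # always present in the compiled bytecode.
    return needle in haystack


# Collect the opcode names of the compiled function.
opnames = [instruction.opname for instruction in dis.get_instructions(answer)]

# Depending on the interpreter version, one of these performs the `in` test.
containment_ops = [name for name in opnames
                   if name in ('COMPARE_OP', 'CONTAINS_OP')]

print(opnames)
print(containment_ops)
```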
    

    COMPARE_OP is where the boolean operation is performed. The interpreter's handler for that opcode, in ceval.c, shows where the comparison happens:

        TARGET(COMPARE_OP)
        {
            w = POP();
            v = TOP();
            if (PyInt_CheckExact(w) && PyInt_CheckExact(v)) {
                /* INLINE: cmp(int, int) */
                register long a, b;
                register int res;
                a = PyInt_AS_LONG(v);
                b = PyInt_AS_LONG(w);
                switch (oparg) {
                case PyCmp_LT: res = a <  b; break;
                case PyCmp_LE: res = a <= b; break;
                case PyCmp_EQ: res = a == b; break;
                case PyCmp_NE: res = a != b; break;
                case PyCmp_GT: res = a >  b; break;
                case PyCmp_GE: res = a >= b; break;
                case PyCmp_IS: res = v == w; break;
                case PyCmp_IS_NOT: res = v != w; break;
                default: goto slow_compare;
                }
                x = res ? Py_True : Py_False;
                Py_INCREF(x);
            }
            else {
              slow_compare:
                x = cmp_outcome(oparg, v, w);
            }
            Py_DECREF(v);
            Py_DECREF(w);
            SET_TOP(x);
            if (x == NULL) break;
            PREDICT(POP_JUMP_IF_FALSE);
            PREDICT(POP_JUMP_IF_TRUE);
            DISPATCH();
        }
    

    cmp_outcome is defined in the same file, and there it's easy to find our next clue:

    res = PySequence_Contains(w, v);
    

    which is in abstract.c:

    int
    PySequence_Contains(PyObject *seq, PyObject *ob)
    {
        Py_ssize_t result;
        if (PyType_HasFeature(seq->ob_type, Py_TPFLAGS_HAVE_SEQUENCE_IN)) {
            PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
            if (sqm != NULL && sqm->sq_contains != NULL)
                return (*sqm->sq_contains)(seq, ob);
        }
        result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
        return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
    }
    

    Coming up for air from the source, the documentation describes the slot that this function calls:

    objobjproc PySequenceMethods.sq_contains
    

    This function may be used by PySequence_Contains() and has the same signature. This slot may be left to NULL, in this case PySequence_Contains() simply traverses the sequence until it finds a match.
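At the Python level, the two paths described here can be observed with a pair of toy classes. This is a sketch, not part of the original answer: `WithContains` and `IterOnly` are hypothetical names. A class that defines `__contains__` fills the `sq_contains` slot, while a class that only defines `__iter__` leaves it empty, so `in` falls back to walking the items until one matches.

```python
class WithContains:
    """Defines __contains__, so `in` calls it directly."""

    def __init__(self, items):
        self.items = list(items)
        self.contains_calls = 0

    def __contains__(self, value):
        self.contains_calls += 1
        return value in self.items


class IterOnly:
    """No __contains__: `in` iterates until it finds an equal item."""

    def __init__(self, items):
        self.items = list(items)
        self.yielded = 0

    def __iter__(self):
        for item in self.items:
            self.yielded += 1
            yield item


a = WithContains('abc')
b = IterOnly('abc')

found_a = 'b' in a   # one call to __contains__
found_b = 'b' in b   # iteration stops as soon as 'b' is produced

print(found_a, a.contains_calls)
print(found_b, b.yielded)
```

Strings take the first path: the string type supplies its own containment test, which is why `in` on strings is a substring test rather than an item-by-item comparison.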

    and further down in the same documentation:

    int PySequence_Contains(PyObject *o, PyObject *value)
    

    Determine if o contains value. If an item in o is equal to value, return 1, otherwise return 0. On error, return -1. This is equivalent to the Python expression value in o.

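    Because '' is a real (empty) string object rather than NULL, the check reaches the string type's own containment test, which is a substring search. The empty string is a substring of every string, so 'lolsome' is considered to contain it.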
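To see why an empty needle always matches, a naive substring search is enough. This is an illustrative sketch only, not the algorithm CPython actually uses; `naive_find` is a hypothetical helper. With an empty needle, the very first slice compared is the empty slice at position 0, which equals the needle.

```python
def naive_find(text, needle):
    """Return the first index where needle occurs in text, or -1."""
    width = len(needle)
    # Try every start position at which the needle could still fit.
    for start in range(len(text) - width + 1):
        if text[start:start + width] == needle:
            return start
    return -1


empty_in_word = naive_find('lolsome', '')    # empty slice matches at once
some_in_word = naive_find('lolsome', 'some')
missing = naive_find('lolsome', 'xyz')
empty_in_empty = naive_find('', '')

print(empty_in_word, some_in_word, missing, empty_in_empty)
```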