python string whitespace strip cpython

Is stripping string by '\r\n ' necessary in Python?


In Java, it is necessary to strip \r\n explicitly (see, e.g., "split("\r\n") is not splitting my string in java").

Is passing \r\n explicitly also necessary in Python? In other words, does the following hold?

str.strip() == str.strip('\r\n ')

From the docs:

Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped

From this CPython test, str.strip() seems to be stripping:

 \t\n\r\f\v
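
A quick check in the interpreter suggests the two calls are not equivalent in general. This is only a minimal sketch; the sample string is made up for illustration and is not taken from the CPython test:

```python
# Characters that the CPython test suggests are stripped by default.
candidates = " \t\n\r\f\v"

# Hypothetical sample: a leading tab, then trailing space, CR and LF.
sample = "\t hello \r\n"

default_stripped = sample.strip()          # no chars argument
narrow_stripped = sample.strip("\r\n ")    # only CR, LF and space

print(repr(default_stripped))
print(repr(narrow_stripped))

# Every candidate character on its own is removed by the default strip().
all_removed = all(ch.strip() == "" for ch in candidates)
print(all_removed)
```

Here the leading tab survives strip('\r\n ') because it is not in the given set, while strip() removes it.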

Can anyone point me to the code in CPython that does the string stripping?


Solution

  • Are you looking for these lines?

    https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Objects/unicodeobject.c#L12222-L12247

    #define LEFTSTRIP 0
    #define RIGHTSTRIP 1
    #define BOTHSTRIP 2
    
    /* Arrays indexed by above */
    static const char *stripfuncnames[] = {"lstrip", "rstrip", "strip"};
    
    #define STRIPNAME(i) (stripfuncnames[i])
    
    /* externally visible for str.strip(unicode) */
    PyObject *
    _PyUnicode_XStrip(PyObject *self, int striptype, PyObject *sepobj)
    {
        void *data;
        int kind;
        Py_ssize_t i, j, len;
        BLOOM_MASK sepmask;
        Py_ssize_t seplen;
    
        if (PyUnicode_READY(self) == -1 || PyUnicode_READY(sepobj) == -1)
            return NULL;
    
        kind = PyUnicode_KIND(self);
        data = PyUnicode_DATA(self);
        len = PyUnicode_GET_LENGTH(self);
        seplen = PyUnicode_GET_LENGTH(sepobj);
        sepmask = make_bloom_mask(PyUnicode_KIND(sepobj),
                                  PyUnicode_DATA(sepobj),
                                  seplen);
    
        i = 0;
        if (striptype != RIGHTSTRIP) {
            while (i < len) {
                Py_UCS4 ch = PyUnicode_READ(kind, data, i);
                if (!BLOOM(sepmask, ch))
                    break;
                if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
                    break;
                i++;
            }
        }
    
        j = len;
        if (striptype != LEFTSTRIP) {
            j--;
            while (j >= i) {
                Py_UCS4 ch = PyUnicode_READ(kind, data, j);
                if (!BLOOM(sepmask, ch))
                    break;
                if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
                    break;
                j--;
            }
    
            j++;
        }
    
        return PyUnicode_Substring(self, i, j);
    }
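
To make the control flow of the C function easier to follow, here is a rough Python sketch of the same two-index scan. It is an illustration under assumptions, not CPython's code: the name xstrip is hypothetical, and a plain set stands in for the Bloom-mask pre-check followed by PyUnicode_FindChar.

```python
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def xstrip(s: str, striptype: int, chars: str) -> str:
    """Hypothetical Python rendering of the scan in _PyUnicode_XStrip."""
    # Assumption: a set lookup replaces BLOOM() + PyUnicode_FindChar().
    sep = set(chars)
    length = len(s)

    # Advance i past leading characters that belong to the set.
    i = 0
    if striptype != RIGHTSTRIP:
        while i < length and s[i] in sep:
            i += 1

    # Move j back past trailing characters that belong to the set,
    # never crossing i, then step forward to get an exclusive end index.
    j = length
    if striptype != LEFTSTRIP:
        j -= 1
        while j >= i and s[j] in sep:
            j -= 1
        j += 1

    return s[i:j]


print(repr(xstrip("\r\n hello \r\n", BOTHSTRIP, "\r\n ")))
print(repr(xstrip("xxhixx", LEFTSTRIP, "x")))
print(repr(xstrip("xxhixx", RIGHTSTRIP, "x")))
```

Note that this function only runs when a chars argument is supplied. As far as I can tell, a call to str.strip() with no argument goes through a separate helper in the same file (do_strip), which tests each character for whitespace instead of consulting sepobj.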