python, parsing, read-eval-print-loop, cpython, language-implementation

How does CPython handle multiline input in the REPL?


Python's REPL reads input line by line. However, function definitions consist of multiple lines.

For example:

>>> def answer():
...   return 42
...
>>> answer()
42
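
That the first line alone is unfinished can be checked from Python itself. The following is a minimal sketch using the standard library's codeop module, which backs the pure-Python console in the code module. It is not the mechanism CPython's C REPL uses, and needs_more_input is a hypothetical helper name chosen for illustration.

```python
import codeop


def needs_more_input(source):
    """Return True if `source` is a valid but unfinished interactive statement.

    Hypothetical helper: codeop.compile_command() returns None for incomplete
    input, a code object for complete input, and raises SyntaxError for
    input that can never become valid.
    """
    return codeop.compile_command(source, "<stdin>", "single") is None


# The three stages of the transcript above.
first_line_only = needs_more_input("def answer():")
body_without_blank_line = needs_more_input("def answer():\n  return 42")
with_blank_line = needs_more_input("def answer():\n  return 42\n")
```

In this sketch the first two stages report that more input is needed, and only the source that ends with the blank line compiles, which matches the two ... prompts in the transcript.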

How does CPython's parser request additional input after the partial def answer(): line?


Solution

  • TL;DR: Digging into the CPython source code, I found that a lazy lexer prints the >>> and ... prompts whenever it needs another line of input.

    1. The entry point for the REPL is the pymain_repl function:
    static void
    pymain_repl(PyConfig *config, int *exitcode)
    {
        /* ... */
    
        PyCompilerFlags cf = _PyCompilerFlags_INIT;
        int res = PyRun_AnyFileFlags(stdin, "<stdin>", &cf); // <-
        *exitcode = (res != 0);
    }
    

    This sets the name of the compiled file to "<stdin>".

    2. If the file name is "<stdin>", _PyRun_InteractiveLoopObject is called. This is the REPL loop itself. It also stores the default prompts >>> and ... as sys.ps1 and sys.ps2 if they are not already set:
    int
    _PyRun_InteractiveLoopObject(FILE *fp, PyObject *filename, PyCompilerFlags *flags)
    {
        /* ... */
        PyObject *v = _PySys_GetAttr(tstate, &_Py_ID(ps1));
        if (v == NULL) {
            _PySys_SetAttr(&_Py_ID(ps1), v = PyUnicode_FromString(">>> ")); // <-
            Py_XDECREF(v);
        }
        v = _PySys_GetAttr(tstate, &_Py_ID(ps2));
        if (v == NULL) {
            _PySys_SetAttr(&_Py_ID(ps2), v = PyUnicode_FromString("... ")); // <-
            Py_XDECREF(v);
        }
    
        /* ... */
    
        do {
            ret = PyRun_InteractiveOneObjectEx(fp, filename, flags); // <-
            /* ... */
        } while (ret != E_EOF);
        return err;
    }
    
    3. PyRun_InteractiveOneObjectEx reads, parses, compiles and runs a single interactive statement, passing both prompts down to the parser:
    static int
    PyRun_InteractiveOneObjectEx(FILE *fp, PyObject *filename,
                                 PyCompilerFlags *flags)
    {
        /* ... */
    
        v = _PySys_GetAttr(tstate, &_Py_ID(ps1)); // <- 
        /* ... (ps1 is set to v) */
        
        w = _PySys_GetAttr(tstate, &_Py_ID(ps2)); // <-
        /* ... (ps2 is set to w) */
    
        mod = _PyParser_ASTFromFile(fp, filename, enc, Py_single_input,
                                    ps1, ps2, flags, &errcode, arena);
        /* ... */
    }
    
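    4. Then comes a chain of parsing functions...
    5. Finally, we reach the tok_underflow_interactive function, which reads the next line of input, showing the current prompt, through a PyOS_Readline(stdin, stdout, tok->prompt) call. The tokenizer calls it only when it has run out of buffered input, and after the first line it switches tok->prompt to the secondary prompt, which is why continuation lines of an unfinished def show ... instead of >>>.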
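
The idea of a lexer that pulls lines on demand and picks the prompt per request can be sketched in Python with the standard library's tokenize module. This is only an analogue of the C code path above, not a port of it: make_prompting_readline and the canned typed lines are hypothetical names for illustration, and unlike the real REPL, which stops at the end of one statement, this loop keeps requesting lines until the input is exhausted.

```python
import tokenize


def make_prompting_readline(lines, ps1=">>> ", ps2="... "):
    """Build a readline() that records which prompt each request would show.

    `lines` stands in for what a user would type; an interactive version
    would call input(prompt) at this point instead (assumption for the demo).
    """
    pending = iter(lines)
    shown_prompts = []

    def readline():
        # The first request uses the primary prompt, later ones the secondary.
        shown_prompts.append(ps1 if not shown_prompts else ps2)
        return next(pending, "")  # "" tells tokenize the input has ended

    return readline, shown_prompts


typed = ["def answer():\n", "    return 42\n", "\n"]
readline, shown_prompts = make_prompting_readline(typed)

# generate_tokens() is a generator driven by readline(): the tokenizer asks
# for another line only when it needs more input to keep producing tokens.
token_strings = [tok.string for tok in tokenize.generate_tokens(readline)]
```

After the run, shown_prompts starts with the primary prompt and continues with the secondary one, the same pattern as in the transcript from the question. The prompt choice lives entirely inside readline, so the tokenizer never needs to know whether it is reading from a terminal.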

    P.S.: The 'Your Guide to the CPython Source Code' article was really helpful. But beware: the source code it links to comes from an older branch.