Why doesn't the Python interpreter return the explicit SyntaxError message?

Looking at CPython's tokenizer.c, I can see that the tokenizer returns specific error messages.

As an example, you can take a look at the part where the tokenizer tries to parse a decimal number. When trying to parse the number 5_6 everything should be OK, but when trying to parse the number 5__6 the tokenizer should return a SyntaxError with the message "invalid decimal literal":

static int
tok_decimal_tail(struct tok_state *tok)
{
    int c;

    while (1) {
        do {
            c = tok_nextc(tok);
        } while (isdigit(c));
        if (c != '_') {
            break;
        }
        c = tok_nextc(tok);
        if (!isdigit(c)) {
            tok_backup(tok, c);
            syntaxerror(tok, "invalid decimal literal");
            return 0;
        }
    }
    return c;
}

Using Python, I've tried to reach the tokenizer's SyntaxError message:

In [12]: try: 
    ...:     eval('5__6') 
    ...: except SyntaxError as e: 
    ...:     print(e.args, e.filename, e.lineno, e.msg, e.text) 

('invalid token', ('<string>', 1, 2, '5__6')) <string> 1 invalid token 5__6

Is there any way to extract the SyntaxError message from the tokenizer?


  • You are looking at source code that is only present in Python 3.8a1 and newer; see the pull request that introduced this message in July 2018:

    bpo-33305: Improve SyntaxError for invalid numerical literals. (GH-6517)

    and the attached Python issue #33305.

    When I run your code with Python 3.8b2 (the current beta) I see the message you expected:

    >>> import sys
    >>> sys.version_info
    sys.version_info(major=3, minor=8, micro=0, releaselevel='beta', serial=2)    
    >>> try:
    ...     eval('5__6')
    ... except SyntaxError as e:
    ...     print(e.args, e.filename, e.lineno, e.msg, e.text)
    ('invalid decimal literal',) <string> 1 invalid decimal literal None

    You tried this out on Python 3.7 or older, so you won't see the newer messages yet.
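
If you want to check which message a given interpreter produces, a minimal sketch along these lines might help. The helper name `syntax_error_message` is made up for this illustration and is not part of any library. It only relies on the built-in `compile()` and on the `msg` attribute of `SyntaxError`:

```python
import sys


def syntax_error_message(source):
    """Return the SyntaxError message for `source`, or None if it compiles.

    Hypothetical helper for illustration only.
    """
    try:
        # mode="eval" matches the eval() call used in the question
        compile(source, "<string>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    # The text printed for '5__6' depends on the interpreter version,
    # so print the version alongside it instead of assuming a message.
    print(sys.version_info[:2], syntax_error_message("5__6"))
    print(sys.version_info[:2], syntax_error_message("5_6"))
```

Running the same script under several interpreters should show the generic message on 3.7 and older and the more specific one on 3.8 and newer, as described above. Since the wording is an implementation detail, it is probably safer not to match on the exact string in real code.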