Python PLY syntax error: can't parse d[0-9]+


I'm trying to parse a list like this: d0,d1,d2,d3,...,d456,d457,...

To parse this with python-ply, I tried each of the following token definitions in turn:

t_DID                   =   r'[d][0-9]+'
t_DID                   =   r'd[0-9]+'
t_DID                   =   r'\d[0-9]+'

But each of them gives me an error.

When I enter 1, I get: DEBUG:root:Syntax error at '1'

And when I enter d, I get: DEBUG:root:Syntax error at 'd'

What is the correct token definition for this pattern, and how can I resolve the error?


Solution

  • None of those patterns match either d or 1.

    • r'[d][0-9]+' and r'd[0-9]+' are equivalent: both match a d followed by at least one digit. So they will match d1 or d234, but they won't match d because it is not followed by a digit, and they won't match 1 because it doesn't start with d.

    • r'\d[0-9]+' matches a digit (\d) followed by at least one more digit. So it won't match any string starting with d, and it won't match 1 because it requires at least two digits. But it will match 12, 274 and 29847502948375029384750293485702938750493875.

    You can read about Python regular expressions in the Python docs for the re module, which also describe the \ escape codes, including \d.
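
As a quick non-interactive check of the points above, the sketch below runs the three patterns from the question against a few sample inputs using only the standard re module. The helper name matched_text and the sample list are made up for illustration; they are not part of PLY.

```python
import re

# The three patterns tried in the question.
PATTERNS = [r'[d][0-9]+', r'd[0-9]+', r'\d[0-9]+']

# The two inputs that failed, plus a few strings that some pattern accepts.
SAMPLES = ['d', '1', 'd1', 'd234', '12']


def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None if no match."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


for pattern in PATTERNS:
    for text in SAMPLES:
        print(f"{pattern!r:14} {text!r:8} -> {matched_text(pattern, text)!r}")
```

Neither d nor 1 is matched by any of the three patterns, which is consistent with the two syntax errors reported in the question.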

    It's easy to build an interactive tool which lets you experiment with Python regular expressions. Here's a very simple example, which could be improved a lot:

    $ python3
    Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> import readline
    >>> def try_regex(regex):
    ...   r = re.compile(regex)
    ...   try:
    ...     while True:
    ...       match = r.match(input('--> '))
    ...       if match:
    ...         print(f"Matched {match.end()} characters: {match[0]}")
    ...       else:
    ...         print("No match")
    ...   except EOFError:
    ...     pass
    ... 
    >>> try_regex(r'd[0-9]+')
    --> d1
    Matched 2 characters: d1
    --> d123
    Matched 4 characters: d123
    --> 1
    No match
    --> d
    No match
    --> d123 abc
    Matched 4 characters: d123
    --> d123abc
    Matched 4 characters: d123
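
The last two inputs show that match only anchors at the start of the string and stops where the pattern stops; anything after that is left for the next token. A rough sketch of that idea applied to the list from the question is below. It uses only the standard re module as a stand-in for a lexer, not PLY itself, and the token names COMMA and WS as well as the function tokenize are assumptions made for illustration (only DID mirrors t_DID from the question).

```python
import re

# One alternative per token type; the group name is used as the token type.
# DID mirrors t_DID from the question; COMMA and WS are hypothetical names.
TOKEN_RE = re.compile(r'(?P<DID>d[0-9]+)|(?P<COMMA>,)|(?P<WS>[ \t]+)')


def tokenize(text):
    """Split `text` into (type, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try to match a token starting exactly at the current position.
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"Illegal character {text[pos]!r} at position {pos}")
        if m.lastgroup != 'WS':
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize('d0,d1, d456'))
```

With this sketch, a lone d or a lone 1 raises SyntaxError because no token pattern matches at that position, which parallels the errors seen in the question.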