How to return the shortest match for a token in Flex?


I am writing a lexical analyzer for Python 2.7 and I have a problem with the regex for the longstring items. This is the code I have for this kind of string:

ESCAPESEQ   \\\x
SHORTSTRINGITEM {SHORTSTRINGCHAR}|{ESCAPESEQ}
SHORTSTRING (\'{SHORTSTRINGITEM}*\')|(\"{SHORTSTRINGITEM}*\")
LONGSTRINGCHAR  [^\\(\'\'\')(\"\"\")]
LONGSTRINGITEM  {LONGSTRINGCHAR}|{ESCAPESEQ}
LONGSTRING  (\'\'\'{LONGSTRINGITEM}*\'\'\')|(\"\"\"{LONGSTRINGITEM}*\"\"\")
LONGSTRINGLITERAL   {STRINGPREFIX}?{LONGSTRING}

If I analyze Python code that has two longstrings separated by other tokens, my analyzer returns the two longstrings and the code between them as one token. That is because Flex tries to return the longest possible match. However, I want to return the shortest match, but only for this longstring token. Thank you for the answers.


Solution

  • Try to define it like this:

    DOCUMENTACION_D \"\"\"
    DOCUMENTACION   {DOCUMENTACION_D}([^\"]|\\\"|\n)*{DOCUMENTACION_D}
    

    The rule would be something like this:

    {DOCUMENTACION} {
      doSomething();
    }
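
Flex has no non-greedy operators, so the usual way to get the "shortest" match is to write the pattern so that it cannot run past the first closing delimiter. The definition above does that by forbidding any unescaped `"` inside the string. The following is a minimal sketch of a slightly more permissive variant, assuming you also want to accept one or two unescaped quotes inside the string, as Python does. The names `DQ_ITEM` and `DQ_LONGSTRING` are hypothetical, `ESCAPESEQ` is redefined here as a backslash followed by any character, and `STRINGPREFIX` is left out for brevity. The single-quoted form would follow the same shape with `\'` in place of `\"`.

```lex
%option noyywrap

%{
#include <stdio.h>
%}

/* Sketch only: DQ_ITEM and DQ_LONGSTRING are illustrative names. */
/* Any single character except a backslash or a double quote (newlines included). */
LONGSTRINGCHAR  [^\\\"]
/* Assumed definition: a backslash followed by any character, newline included. */
ESCAPESEQ       \\(.|\n)
/* At most two quotes, then something that is not a quote.
   Three quotes in a row can therefore never be part of the body. */
DQ_ITEM         \"?\"?({LONGSTRINGCHAR}|{ESCAPESEQ})
DQ_LONGSTRING   \"\"\"{DQ_ITEM}*\"\"\"

%%

{DQ_LONGSTRING}  { printf("LONGSTRING: %s\n", yytext); }
.|\n             { /* ignore everything else in this sketch */ }

%%

int main(void) {
    return yylex();
}
```

Because the body can never contain three consecutive unescaped quotes, the longest match Flex finds is also the shortest one, so two longstrings separated by other code should come out as two tokens. Assuming the file is saved as `scanner.l`, it can be built with `flex scanner.l && cc lex.yy.c -o scanner` and fed a Python source file on standard input.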