Search code examples
pythonregexpython-2.7bnf

A regular expression for splitting symbols of a BNF grammar


I need to iterate over symbols of production rules of form:

e.g: Input

<relational operator> ::= = | <> | < | <= | >= | > | in
<next constant definition> ::= <empty> | <next constant definition> ; <constant definition>

so I was in need to derive a regular expression to split the text. Here's what I have so far

(?:\s|^|\s<|^<)(?:.*?)(?:\s|$|\s>|>$)

the problem is re.findall() doesn't produce my desired output

Expected output is:

[<relational operator>, ::=, =, |, <>, |, <, |, <=, |, >=, |, >, |, in]
[<next constant definition>, ::=, <empty>, |, <next constant definition>, ;, <constant definition>]

Solution

  • How about using something simple like <\w+(?:\s+\w+)*>|\S+

       < \w+ 
       (?: \s+ \w+ )*
       >
    |  
       \S+