Search code examples
regexhaskellexpression-treespostfix-notation

Haskell - Reverse polish notation regular expression to expression tree


Im struggling with coming up with algorithm that converts a regular expression (in context of regular languages, there are only 3 operations '.' for concat, '+' for "or" and '*' for iteration) in reverse polish notation to the regular expression tree.

For example i have a regular expression

aa.bb.+

for which i need to construct following expression tree

      +
    /   \
   .     .
  / \   / \
 b   b a   a  

Would appreciate any help. Thanks!


Solution

  • This is not hard to do using an stack-based algorithm (which translates into a foldl in Haskell):

    data Tree
      = Symbol Char
      | Op Char Tree Tree
      deriving Show
    
    type Stack = [Tree]
    
    step :: Stack -> Char -> Stack
    step (r:l:s) '.' = (Op '.' l r):s
    step (r:l:s) '+' = (Op '+' l r):s
    step s c         = (Symbol c):s
    
    parse :: String -> Stack
    parse = foldl step []
    

    which would yield something like this:

    λ> parse "aa.bb.+"
    [Op '+' (Op '.' (Symbol 'a') (Symbol 'a')) (Op '.' (Symbol 'b') (Symbol 'b'))]
    

    you did not write anything about your data-structures so I took some liberties - it should be easy to adapt to your situation once you understood step and the foldl in parse


    if you want to change left/right as in your example you just have to do it in step:

    step :: Stack -> Char -> Stack
    step (l:r:s) '.' = (Op '.' l r):s
    step (l:r:s) '+' = (Op '+' l r):s
    step s c         = (Symbol c):s
    

    example:

    λ> parse "aa.bb.+"
    [Op '+' (Op '.' (Symbol 'b') (Symbol 'b')) (Op '.' (Symbol 'a') (Symbol 'a'))]
    

    but I would expect the first one if the order would matter (here you have bb before aa which is strange ...


    of course it's strange that the star operation is binary too so I would suggest:

    data Tree
      = Symbol Char
      | Concat Tree Tree
      | Star Tree
      deriving Show
    
    type Stack = [Tree]
    
    step :: Stack -> Char -> Stack
    step (r:l:s) '.' = (Concat r l):s
    step (t:s)   '+' = (Star t):s
    step s c         = (Symbol c):s
    
    parse :: String -> Stack
    parse = foldl step []
    

    which would change your syntax a bit:

    λ> parse "aa.bb..+"
    [Star (Concat (Concat (Symbol 'b') (Symbol 'b')) (Concat (Symbol 'a') (Symbol 'a')))]
    

    but which seems saner IMO


    convert list to Maybe

    of course you might not like the [Stack] return so you could do this:

    parse :: String -> Maybe Tree
    parse input =
      case result of
        [t] -> Just t
        []  -> Nothing
        _   -> error "syntax error"
      where result = foldl step [] input
    

    examples:

    λ> parse "aa.bb..+"
    Just (Star (Concat (Concat (Symbol 'b') (Symbol 'b')) (Concat (Symbol 'a') (Symbol 'a'))))
    λ> parse ""
    Nothing
    λ> parse "aa.bb.+"
    *** Exception: syntax error