Im struggling with coming up with algorithm that converts a regular expression (in context of regular languages, there are only 3 operations '.' for concat, '+' for "or" and '*' for iteration) in reverse polish notation to the regular expression tree.
For example i have a regular expression
aa.bb.+
for which i need to construct following expression tree
+
/ \
. .
/ \ / \
b b a a
Would appreciate any help. Thanks!
This is not hard to do using an stack-based algorithm (which translates into a foldl
in Haskell):
data Tree
= Symbol Char
| Op Char Tree Tree
deriving Show
type Stack = [Tree]
step :: Stack -> Char -> Stack
step (r:l:s) '.' = (Op '.' l r):s
step (r:l:s) '+' = (Op '+' l r):s
step s c = (Symbol c):s
parse :: String -> Stack
parse = foldl step []
which would yield something like this:
λ> parse "aa.bb.+"
[Op '+' (Op '.' (Symbol 'a') (Symbol 'a')) (Op '.' (Symbol 'b') (Symbol 'b'))]
you did not write anything about your data-structures so I took some liberties - it should be easy to adapt to your situation once you understood step
and the foldl
in parse
if you want to change left/right as in your example you just have to do it in step
:
step :: Stack -> Char -> Stack
step (l:r:s) '.' = (Op '.' l r):s
step (l:r:s) '+' = (Op '+' l r):s
step s c = (Symbol c):s
example:
λ> parse "aa.bb.+"
[Op '+' (Op '.' (Symbol 'b') (Symbol 'b')) (Op '.' (Symbol 'a') (Symbol 'a'))]
but I would expect the first one if the order would matter (here you have bb
before aa
which is strange ...
of course it's strange that the star operation is binary too so I would suggest:
data Tree
= Symbol Char
| Concat Tree Tree
| Star Tree
deriving Show
type Stack = [Tree]
step :: Stack -> Char -> Stack
step (r:l:s) '.' = (Concat r l):s
step (t:s) '+' = (Star t):s
step s c = (Symbol c):s
parse :: String -> Stack
parse = foldl step []
which would change your syntax a bit:
λ> parse "aa.bb..+"
[Star (Concat (Concat (Symbol 'b') (Symbol 'b')) (Concat (Symbol 'a') (Symbol 'a')))]
but which seems saner IMO
of course you might not like the [Stack]
return so you could do this:
parse :: String -> Maybe Tree
parse input =
case result of
[t] -> Just t
[] -> Nothing
_ -> error "syntax error"
where result = foldl step [] input
examples:
λ> parse "aa.bb..+"
Just (Star (Concat (Concat (Symbol 'b') (Symbol 'b')) (Concat (Symbol 'a') (Symbol 'a'))))
λ> parse ""
Nothing
λ> parse "aa.bb.+"
*** Exception: syntax error