I'm trying to formalise a regular expression based string search tool in Idris (current status here). But I'm fighting with the problem of parsing regular expressions. I've tried to build a small parsing library but gave up on this in favor to use Lightyear, a parsing combinator library for Idris.
Since I'm used to Haskell, I've tried to use a similar strategy that I would do using Parsec. My main problem is how to handle left recursion on Lightyear parsers? I've tried several encodings but pretty much all parsers end up looping and causing segmentation faults in generated code.
I don't know Lightyear, but I had some success porting Parsec to Idris:
module Parser
data Parser : Type -> Type where
P : (String -> List (a, String)) -> Parser a
unP : Parser a -> String -> List (a, String)
unP (P f) = f
total stripPrefix : (Eq a) => List a -> List a -> Maybe (List a)
stripPrefix [] ys = Just ys
stripPrefix (x::xs) (y::ys) = if (x == y) then stripPrefix xs ys else Nothing
stripPrefix _ _ = Nothing
total token : String -> Parser ()
token tk = P $ \s => case stripPrefix (unpack tk) (unpack s) of
Just s' => [((), pack s')]
Nothing => []
total skip : Parser ()
skip = P $ \s => case unpack s of
[] => []
(_::s') => [((), pack s')]
instance Functor Parser where
map f p = P $ \s => map (\(x, s') => (f x, s')) (unP p s)
instance Applicative Parser where
pure x = P $ \s => [(x, s)]
(P pf) <*> (P px) = P $ \s => concat (map (\(f, s') => map (\(x, s'') => (f x, s'')) (px s')) (pf s))
instance Alternative Parser where
empty = P $ \s => []
(P p1) <|> (P p2) = P $ \s => case p1 s of
[] => p2 s
results => results
instance Monad Parser where
px >>= f = P $ \s => concat (map (\(x, s') => unP (f x) s') (unP px s))
total runParser : Parser a -> String -> Maybe a
runParser (P p) s = case p s of
[(x, "")] => Just x
_ => Nothing
This allows a straight copy-paste implementation of chainl
:
chainl1 : Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
where
rest x = do { f <- op; y <- p; rest $ f x y } <|> return x
chainl : Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op x = chainl1 p op <|> return x
We can then take a straight transliteration of the expression parser from the chainl
docs (I'm too lazy to implement a proper integer
parser so we'll just use unary):
parens : Parser a -> Parser a
parens p = token "(" *> p <* token ")"
symbol : String -> Parser ()
symbol = token
integer : Parser Nat
integer = P $ \s => case unpack s of
('Z'::s') => [(Z, pack s')]
('S'::s') => map (\(n, s'') => (S n, s'')) $ unP integer (pack s')
_ => []
mutual
expr : Parser Nat
expr = term `chainl1` addop
term : Parser Nat
term = factor `chainl1` mulop
factor : Parser Nat
factor = parens expr <|> integer
mulop : Parser (Nat -> Nat -> Nat)
mulop = (symbol "*" *> pure (*)) <|>
(symbol "/" *> pure div)
addop : Parser (Nat -> Nat -> Nat)
addop = (symbol "+" *> pure (+)) <|>
(symbol "-" *> pure (-))
Now, if you try this:
main : IO ()
main = do
s <- getLine
printLn $ runParser expr s
then it will have the same divergant behaviour that you've observed. However, we can make two small changes:
Introduce a lazy alternative combinator:
orElse : Parser a -> Lazy (Parser a) -> Parser a
orElse p1 p2 = P $ \s => case unP p1 s of
[] => unP p2 s
results => results
Make sure the recursive part of factor
, i.e. the parens expr
part, is in this lazy position, by flipping the two alternatives:
factor = integer `orElse` parens expr
This then works as expected:
13:06:07 [cactus@galaxy brainfuck]$ idris Expr.idr -o Expr
13:06:27 [cactus@galaxy brainfuck]$ echo "SZ+(SSZ*SSSZ)" | ./Expr
Just 7