Is there a way to code a chainl function in idris lightyear library?

I'm trying to formalise a regular expression based string search tool in Idris (current status here). But I'm fighting with the problem of parsing regular expressions. I've tried to build a small parsing library but gave up on this in favor to use Lightyear, a parsing combinator library for Idris.

Since I'm used to Haskell, I've tried to use a similar strategy that I would do using Parsec. My main problem is how to handle left recursion on Lightyear parsers? I've tried several encodings but pretty much all parsers end up looping and causing segmentation faults in generated code.

Solution

I don't know Lightyear, but I had some success porting Parsec to Idris:

module Parser

data Parser : Type -> Type where
     P : (String -> List (a, String)) -> Parser a

unP : Parser a -> String -> List (a, String)
unP (P f) = f

total stripPrefix : (Eq a) => List a -> List a -> Maybe (List a)
stripPrefix [] ys = Just ys
stripPrefix (x::xs) (y::ys) = if (x == y) then stripPrefix xs ys else Nothing
stripPrefix _ _  = Nothing

total token : String -> Parser ()
token tk = P $ \s => case stripPrefix (unpack tk) (unpack s) of
      Just s' => [((), pack s')]
      Nothing => []

total skip : Parser ()
skip = P $ \s => case unpack s of
     [] => []
     (_::s') => [((), pack s')]

instance Functor Parser where
  map f p = P $ \s => map (\(x, s') => (f x, s')) (unP p s)

instance Applicative Parser where
  pure x = P $ \s => [(x, s)]
  (P pf) <*> (P px) = P $ \s => concat (map (\(f, s') => map (\(x, s'') => (f x, s'')) (px s')) (pf s))

instance Alternative Parser where
  empty = P $ \s => []
  (P p1) <|> (P p2) = P $ \s => case p1 s of
     [] => p2 s
     results => results

instance Monad Parser where
  px >>= f = P $ \s => concat (map (\(x, s') => unP (f x) s') (unP px s))

total runParser : Parser a -> String -> Maybe a
runParser (P p) s = case p s of
  [(x, "")] => Just x
  _         => Nothing

This allows a straight copy-paste implementation of chainl:

chainl1 : Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
  where
    rest x = do { f <- op; y <- p; rest $ f x y } <|> return x

chainl : Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op x = chainl1 p op <|> return x

We can then take a straight transliteration of the expression parser from the chainl docs (I'm too lazy to implement a proper integer parser so we'll just use unary):

parens : Parser a -> Parser a
parens p = token "(" *> p <* token ")"

symbol : String -> Parser ()
symbol = token

integer : Parser Nat
integer = P $ \s => case unpack s of
     ('Z'::s') => [(Z, pack s')]
     ('S'::s') => map (\(n, s'') => (S n, s'')) $ unP integer (pack s')
     _ => []

mutual
    expr : Parser Nat
    expr = term   `chainl1` addop

    term : Parser Nat
    term = factor `chainl1` mulop

    factor : Parser Nat
    factor  = parens expr <|> integer

    mulop : Parser (Nat -> Nat -> Nat)
    mulop = (symbol "*" *> pure (*)) <|>
            (symbol "/" *> pure div)

    addop : Parser (Nat -> Nat -> Nat)
    addop = (symbol "+" *> pure (+)) <|>
            (symbol "-" *> pure (-))

Now, if you try this:

main : IO ()
main = do
  s <- getLine
  printLn $ runParser expr s

then it will have the same divergant behaviour that you've observed. However, we can make two small changes:

Introduce a lazy alternative combinator:

orElse : Parser a -> Lazy (Parser a) -> Parser a
orElse p1 p2 = P $ \s => case unP p1 s of
   [] => unP p2 s
   results => results

Make sure the recursive part of factor, i.e. the parens expr part, is in this lazy position, by flipping the two alternatives:
```
factor = integer `orElse`  parens expr
```

This then works as expected:

13:06:07 [cactus@galaxy brainfuck]$ idris Expr.idr -o Expr
13:06:27 [cactus@galaxy brainfuck]$ echo "SZ+(SSZ*SSSZ)" | ./Expr
Just 7