Search code examples
regexhaskellrecursionidrislightyear

Is there a way to code a chainl function in idris lightyear library?


I'm trying to formalise a regular expression based string search tool in Idris (current status here). But I'm fighting with the problem of parsing regular expressions. I've tried to build a small parsing library but gave up on this in favor to use Lightyear, a parsing combinator library for Idris.

Since I'm used to Haskell, I've tried to use a similar strategy that I would do using Parsec. My main problem is how to handle left recursion on Lightyear parsers? I've tried several encodings but pretty much all parsers end up looping and causing segmentation faults in generated code.


Solution

  • I don't know Lightyear, but I had some success porting Parsec to Idris:

    module Parser
    
    data Parser : Type -> Type where
         P : (String -> List (a, String)) -> Parser a
    
    unP : Parser a -> String -> List (a, String)
    unP (P f) = f
    
    total stripPrefix : (Eq a) => List a -> List a -> Maybe (List a)
    stripPrefix [] ys = Just ys
    stripPrefix (x::xs) (y::ys) = if (x == y) then stripPrefix xs ys else Nothing
    stripPrefix _ _  = Nothing
    
    total token : String -> Parser ()
    token tk = P $ \s => case stripPrefix (unpack tk) (unpack s) of
          Just s' => [((), pack s')]
          Nothing => []
    
    total skip : Parser ()
    skip = P $ \s => case unpack s of
         [] => []
         (_::s') => [((), pack s')]
    
    instance Functor Parser where
      map f p = P $ \s => map (\(x, s') => (f x, s')) (unP p s)
    
    instance Applicative Parser where
      pure x = P $ \s => [(x, s)]
      (P pf) <*> (P px) = P $ \s => concat (map (\(f, s') => map (\(x, s'') => (f x, s'')) (px s')) (pf s))
    
    instance Alternative Parser where
      empty = P $ \s => []
      (P p1) <|> (P p2) = P $ \s => case p1 s of
         [] => p2 s
         results => results
    
    instance Monad Parser where
      px >>= f = P $ \s => concat (map (\(x, s') => unP (f x) s') (unP px s))
    
    total runParser : Parser a -> String -> Maybe a
    runParser (P p) s = case p s of
      [(x, "")] => Just x
      _         => Nothing
    

    This allows a straight copy-paste implementation of chainl:

    chainl1 : Parser a -> Parser (a -> a -> a) -> Parser a
    chainl1 p op = p >>= rest
      where
        rest x = do { f <- op; y <- p; rest $ f x y } <|> return x
    
    chainl : Parser a -> Parser (a -> a -> a) -> a -> Parser a
    chainl p op x = chainl1 p op <|> return x
    

    We can then take a straight transliteration of the expression parser from the chainl docs (I'm too lazy to implement a proper integer parser so we'll just use unary):

    parens : Parser a -> Parser a
    parens p = token "(" *> p <* token ")"
    
    symbol : String -> Parser ()
    symbol = token
    
    integer : Parser Nat
    integer = P $ \s => case unpack s of
         ('Z'::s') => [(Z, pack s')]
         ('S'::s') => map (\(n, s'') => (S n, s'')) $ unP integer (pack s')
         _ => []
    
    mutual
        expr : Parser Nat
        expr = term   `chainl1` addop
    
        term : Parser Nat
        term = factor `chainl1` mulop
    
        factor : Parser Nat
        factor  = parens expr <|> integer
    
        mulop : Parser (Nat -> Nat -> Nat)
        mulop = (symbol "*" *> pure (*)) <|>
                (symbol "/" *> pure div)
    
        addop : Parser (Nat -> Nat -> Nat)
        addop = (symbol "+" *> pure (+)) <|>
                (symbol "-" *> pure (-))
    

    Now, if you try this:

    main : IO ()
    main = do
      s <- getLine
      printLn $ runParser expr s
    

    then it will have the same divergant behaviour that you've observed. However, we can make two small changes:

    1. Introduce a lazy alternative combinator:

      orElse : Parser a -> Lazy (Parser a) -> Parser a
      orElse p1 p2 = P $ \s => case unP p1 s of
         [] => unP p2 s
         results => results
      
    2. Make sure the recursive part of factor, i.e. the parens expr part, is in this lazy position, by flipping the two alternatives:

      factor = integer `orElse`  parens expr
      

    This then works as expected:

    13:06:07 [cactus@galaxy brainfuck]$ idris Expr.idr -o Expr
    13:06:27 [cactus@galaxy brainfuck]$ echo "SZ+(SSZ*SSSZ)" | ./Expr
    Just 7