Search code examples
parsinghaskellparsec

Parsec-Parser works alright, but could it be done better?


I try to do this:

Parse a Text in the form:

Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}

into a list of some data structure:

[Inside "Some Text ",Outside (0,0,0),Inside " some Text ",Outside (0,0,0),Outside (0,0,0),Inside " more Text ",Outside (0,0,0)]

So these #{a,b,c}-bits should turn into different things as the rest of the text.

I have this code:

module ParsecTest where

import Text.ParserCombinators.Parsec
import Monad

type Reference = (Int, Int, Int)

data Transc = Inside String | Outside Reference
              deriving (Show)

text :: Parser Transc
text =  do
         x <- manyTill anyChar ((lookAhead reference) <|> (eof >> return (Inside "")));
         return (Inside x)

transc = reference <|> text

alot :: Parser [Transc]
alot = do
        manyTill transc eof

reference :: Parser Transc
reference = try (do{ char '#';
                  char '{';
                  a <- number;
                char ',';
                b <- number;
                char ',';
                c <- number;
                char '}';
                return (Outside (a,b,c)) })

number :: Parser Int
number = do{ x <- many1 digit;
             return (read x) }

This works as expected. You can test this in ghci by typing

parseTest alot "Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}"

But I think it's not nice.

1) Is the use of lookAhead really necessary for my problem?

2) Is the return (Inside "") an ugly hack?

3) Is there generally a more concise/smarter way to archieve the same?


Solution

  • 1) I think you do need lookAhead as you need the result of that parse. It would be nice to avoid running that parser twice by having a Parser (Transc,Maybe Transc) to indicate an Inside with an optional following Outside. If performance is an issue, then this is worth doing.

    2) Yes.

    3) Applicatives

    number2 :: Parser Int
    number2 = read <$> many1 digit
    
    text2 :: Parser Transc
    text2 = (Inside .) . (:) 
         <$> anyChar 
         <*> manyTill anyChar (try (lookAhead reference2) *> pure () <|> eof)
    
    
    reference2 :: Parser Transc
    reference2 = ((Outside .) .) . (,,) 
              <$> (string "#{" *> number2 <* char ',') 
              <*> number2 
              <*> (char ',' *> number2 <* char '}')
    
    transc2 = reference2 <|> text2
    
    alot2 = many transc2
    

    You may want to rewrite the beginning of reference2 using a helper like aux x y z = Outside (x,y,z).

    EDIT: Changed text to deal with inputs that don't end with an Outside.