I try to do this:
Parse a Text in the form:
Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}
into a list of some data structure:
[Inside "Some Text ",Outside (0,0,0),Inside " some Text ",Outside (0,0,0),Outside (0,0,0),Inside " more Text ",Outside (0,0,0)]
So these #{a,b,c}-bits should turn into different things as the rest of the text.
I have this code:
module ParsecTest where
import Text.ParserCombinators.Parsec
import Monad
type Reference = (Int, Int, Int)
data Transc = Inside String | Outside Reference
deriving (Show)
text :: Parser Transc
text = do
x <- manyTill anyChar ((lookAhead reference) <|> (eof >> return (Inside "")));
return (Inside x)
transc = reference <|> text
alot :: Parser [Transc]
alot = do
manyTill transc eof
reference :: Parser Transc
reference = try (do{ char '#';
char '{';
a <- number;
char ',';
b <- number;
char ',';
c <- number;
char '}';
return (Outside (a,b,c)) })
number :: Parser Int
number = do{ x <- many1 digit;
return (read x) }
This works as expected. You can test this in ghci by typing
parseTest alot "Some Text #{0,0,0} some Text #{0,0,0}#{0,0,0} more Text #{0,0,0}"
But I think it's not nice.
1) Is the use of lookAhead
really necessary for my problem?
2) Is the return (Inside "")
an ugly hack?
3) Is there generally a more concise/smarter way to archieve the same?
1) I think you do need lookAhead
as you need the result of that parse. It would be nice to avoid running that parser twice by having a Parser (Transc,Maybe Transc)
to indicate an Inside
with an optional following Outside
. If performance is an issue, then this is worth doing.
2) Yes.
3) Applicative
s
number2 :: Parser Int
number2 = read <$> many1 digit
text2 :: Parser Transc
text2 = (Inside .) . (:)
<$> anyChar
<*> manyTill anyChar (try (lookAhead reference2) *> pure () <|> eof)
reference2 :: Parser Transc
reference2 = ((Outside .) .) . (,,)
<$> (string "#{" *> number2 <* char ',')
<*> number2
<*> (char ',' *> number2 <* char '}')
transc2 = reference2 <|> text2
alot2 = many transc2
You may want to rewrite the beginning of reference2
using a helper like aux x y z = Outside (x,y,z)
.
EDIT: Changed text
to deal with inputs that don't end with an Outside
.