I am trying to parse mediawiki text using Parsec. Some of the constructs in mediawiki markup can only occur at the start of rows (such as the header markup ==header level 2==
). In regexp I would use an anchor (such as ^
) to find the start of a line.
One attempt in GHCi is
Prelude Text.Parsec> parse (char '\n' *> string "==" *> many1 letter <* string "==") "" "\n==hej=="
Right "hej"
but this is not too good since it will fail on the first line of a file. I feel like this should be a solved problem...
What is the most idiomatic "Start of line" parsing in Parsec?
You can use getPosition
and sourceColumn
in order to find out the column number that the parser is currently looking at. The column number will be 1
if the current position is at the start of a line (such as at the start of input or after a \n
or \r
character).
There isn't a built-in combinator for this, but you can easily make it:
import Text.Parsec
import Control.Monad (guard)
startOfLine :: Monad m => ParsecT s u m ()
startOfLine = do
pos <- getPosition
guard (sourceColumn pos == 1)
Now you can write your header parser as:
header = startOfLine *> string "==" *> many1 letter <* string "=="