Parsec start-of-row pattern?


I am trying to parse MediaWiki text using Parsec. Some constructs in MediaWiki markup can only occur at the start of a line (such as the header markup ==header level 2==). In a regular expression I would use an anchor (such as ^) to match the start of a line.

One attempt in GHCi is

Prelude Text.Parsec> parse (char '\n' *> string "==" *> many1 letter <* string "==") "" "\n==hej=="
Right "hej"

but this is not good enough, since it fails on the first line of a file, where no \n precedes the header. I feel like this should be a solved problem...

What is the most idiomatic "Start of line" parsing in Parsec?


Solution

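  • You can use getPosition and sourceColumn to find out which column the parser is currently looking at. The column number is 1 when the current position is at the start of a line (that is, at the start of input or right after a \n character). The Parsec documentation also mentions \r, but the implementation of updatePosChar appears to reset the column only on \n, so a bare \r may not count as a line start.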

    There isn't a built-in combinator for this, but you can easily make it:

    import Text.Parsec
    import Control.Monad (guard)
    
    startOfLine :: Monad m => ParsecT s u m ()
    startOfLine = do
        pos <- getPosition
        guard (sourceColumn pos == 1)
    

    Now you can write your header parser as:

    header :: Parsec String () String
    header = startOfLine *> string "==" *> many1 letter <* string "=="
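
To check that this behaves as described, here is a small self-contained sketch. It repeats the definitions above with concrete types (String input, no user state) and adds a label via <?> so a failed column check produces a readable message. The helper headerAfter is hypothetical and exists only to move the parser away from the start of input before trying header.

```haskell
import Text.Parsec
import Control.Monad (guard)

-- Succeeds, consuming no input, only when the parser is in column 1.
-- The <?> label is an addition for nicer error messages.
startOfLine :: Parsec String () ()
startOfLine = (getPosition >>= \pos -> guard (sourceColumn pos == 1))
          <?> "start of line"

-- A level-2 header such as ==hej==, accepted only at the start of a line.
header :: Parsec String () String
header = startOfLine *> string "==" *> many1 letter <* string "=="

-- Hypothetical helper for experimenting: skip n arbitrary characters,
-- then try to parse a header at whatever position that leaves us in.
headerAfter :: Int -> Parsec String () String
headerAfter n = count n anyChar *> header

-- parse header "" "==hej=="              gives Right "hej" (first line of input)
-- parse (headerAfter 1) "" "\n==hej=="   gives Right "hej" (right after a newline)
-- parse (headerAfter 1) "" "x==hej=="    gives a Left, since the header starts in column 2
```

Because the check is a guard on the current position, it consumes nothing when it fails, so it can sit in front of any line-anchored construct and be combined with <|> without needing try for the column check itself.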