Data.Attoparsec.Text
exports takeWhile
and takeWhile1
:
takeWhile :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns
True
, and return the consumed input.This parser does not fail. It will return an empty string if the predicate returns
False
on the first character of input.[...]
takeWhile1 :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns
True
, and return the consumed input.This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns
True
or if there is no input left.
attoparsec
's documentation encourages the user to
Use the
Text
-oriented parsers whenever possible, e.g.takeWhile1
instead ofmany1 anyChar
. There is about a factor of 100 difference in performance between the two kinds of parser.
Those two parsers are very useful, but I keep feeling the need for a more general version of takeWhile1
, more specifically, some hypothetical parser
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = undefined
that would parse at least lo
characters satisfying predicate f
, where lo
is an arbitrary nonnegative integer.
I had a look at takeWhile1
's implementation, but it uses a bunch of functions private to Data.Attoparsec.Text.Internal
and doesn't seem easily generalizable.
I came up with the following applicative implementation:
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding ( takeWhile )
import Control.Applicative ( (<*>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo =
T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f
It works as advertised,
λ> parseOnly (takeWhileLo (== 'a') 4) "aaa"
Left "not enough input"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaa"
Right "aaaa"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
Right "aaaaaaaaaaaaa"
but the need for packing the intermediate list of results returned by count
worries me, especially for cases where lo
is large... It seems to go against the recommendation to
use the
Text
-oriented parsers whenever possible [...]
Am I missing something? Is there a more efficient/idiomatic way of implementing such a takeWhileLo
combinator?
Parser
is a monad, so you can just inspect the return value and fail if the length's not right:
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
text <- takeWhile f
case T.compareLength text lo of
LT -> empty
_ -> return text
compareLength
is from the text
package. It's more efficient than comparing text
's length, because compareLength
may short-circuit.