I am stuck writing an attoparsec parser to parse what the Uniform Code for Units of Measure calls a <ATOM-SYMBOL>
. It's defined to be the longest sequence of characters in a certain class (that class includes all the digits 0-9) which doesn't end with a digit.
So given the input foo27
I want to consume and return foo
, for 237bar26
I want to consume and return 237bar
, for 19
I want to fail without consuming anything.
I can't figure out how to build this out of takeWhile1
or takeTill
or scan
but I am probably missing something obvious.
Update: My best attempt so far was that I managed to exclude sequences that are entirely digits
atomSymbol :: Parser Text
atomSymbol = do
r <- core
if (P.all (inClass "0-9") . T.unpack $ r)
then fail "Expected an atom symbol but all characters were digits."
else return r
where
core = A.takeWhile1 $ inClass "!#-'*,0-<>-Z\\^-z|~"
I tried changing that to test if the last character was a digit instead of if they all were, but it doesn't seem to backtrack one character at a time.
Update 2:
The whole file is at https://github.com/dmcclean/dimensional-attoparsec/blob/master/src/Numeric/Units/Dimensional/Parsing/Attoparsec.hs. This only builds against the prefixes
branch from https://github.com/dmcclean/dimensional.
You should reformulate the problem and treat spans of digits (0-9
) and spans of non-digit characters (!#-'*,:-<>-Z\\^-z|~
) separately. The syntactic element of interest can then be described as
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative ((<|>), many)
import Data.Char (isDigit)
import Data.Attoparsec.Combinator (option)
import Data.Attoparsec.Text (Parser)
import qualified Data.Attoparsec.Text as A
import Data.Text (Text)
import qualified Data.Text as T
atomSymbol :: Parser Text
atomSymbol = f <$> (option "" digitSpan)
<*> (nonDigitSpan <|> fail errorMsg)
<*> many (g <$> digitSpan <*> nonDigitSpan)
where
nonDigitSpan = A.takeWhile1 $ A.inClass "!#-'*,:-<>-Z\\^-z|~"
digitSpan = A.takeWhile1 isDigit
f x y xss = T.concat $ x : y : concat xss
g x y = [x,y]
errorMsg = "Expected an atom symbol but all characters (if any) were digits."
[...] given the input
foo27
I want to consume and return foo, for237bar26
I want to consume and return237bar
, for19
I want to fail without consuming anything.
λ> A.parseOnly atomSymbol "foo26"
Right "foo"
λ> A.parseOnly atomSymbol "237bar26"
Right "237bar"
λ> A.parseOnly atomSymbol "19"
Left "Failed reading: Expected an atom symbol but all characters (if any) were digits."