I have a little Parsec parser that can parse tab-separated values (TSV) into strings. I want to extend it to recognize numbers and boolean values (listed as "Y" or "N") in the source file.
Here's the old TSV version (returns [[String]]):
tsvFile = endBy line newline
line = sepBy cell tab
cell = many (noneOf "\t\n")
I would like to change it to support these types:
data Cell = CellString String
          | CellNumber Int
          | CellBool Bool
          deriving (Show)
Here are the functions I've defined for number and bool. Are these incorrect?
cellBool = do
    b <- oneOf "YN"
    return $ CellBool (b == 'Y')
cellNumber = do
    d <- many digit
    return $ CellNumber (read d)
cellString = do
    s <- many (noneOf "\t\n")
    return $ CellString s
And here's what I thought I needed to do to get it to work:
cell = cellBool <|> cellNumber <|> cellString
But it doesn't work. Running cellNumber before cellString returns Right []. If I put cellString first in the list, it parses the whole file as strings.
I'm sure I'm missing something basic. For example, I think only cellString is dealing with the tab separator, but I'm really new to Parsec and confused. I appreciate your help!
I was able to get it working by simply changing the definition of cellNumber:
cellNumber = do
    d <- many1 digit
    return $ CellNumber (read d)
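The problem was that cellNumber was reading an empty string due to the use of many. Because many digit succeeds on zero digits without consuming any input, cellNumber always succeeded and <|> never went on to try cellString. Using many1 means the parser fails when there is no digit, allowing cellString to execute.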
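To make the difference concrete, here is a small sketch you can run. The names zeroOrMore and oneOrMore are just made up for this illustration; everything else comes from Text.Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Zero or more digits: succeeds even when the input has no digits at all.
zeroOrMore :: Parser String
zeroOrMore = many digit

-- One or more digits: fails, without consuming input, when no digit is present.
oneOrMore :: Parser String
oneOrMore = many1 digit

main :: IO ()
main = do
    print (parse zeroOrMore "" "abc")                    -- Right ""
    print (parse oneOrMore "" "42\t")                    -- Right "42"
    -- The left alternative succeeds with "", so the right one is never tried.
    print (parse (zeroOrMore <|> many1 letter) "" "abc") -- Right ""
    -- The left alternative fails, so the right one gets its turn.
    print (parse (oneOrMore <|> many1 letter) "" "abc")  -- Right "abc"
```

The third line is essentially what was happening with cellNumber <|> cellString in the question.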
However, at this point your parser would fail on an input like "123a\n", so you'll need to figure out the backtracking to get that working.
Using the definition
cellNumber = do
    d <- many1 digit
    lookAhead $ oneOf "\t\n"
    return $ CellNumber (read d)
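probably isn't ideal. Instead, I would consider something like
cellNumber = do
    d <- many1 digit
    -- Require that no further cell character follows the digits.
    -- (notFollowedBy cellString would always fail here, because
    -- cellString also matches the empty string.)
    notFollowedBy (noneOf "\t\n")
    return $ CellNumber (read d)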
Then change your cell function to be
cell = try cellBool <|> try cellNumber <|> cellString
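Putting it all together, a minimal sketch of the whole parser might look like the following. A few things here are my own assumptions rather than anything from the question: the helper cellEnd is a hypothetical name, the end-of-cell check is written as notFollowedBy (noneOf "\t\n") (cellString itself succeeds on empty input, so it can't be used to detect leftover characters), the same check is applied to cellBool so that a cell like "Yes" falls through to cellString, and Eq is derived on Cell only to make results easy to compare.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Cell = CellString String
          | CellNumber Int
          | CellBool Bool
          deriving (Show, Eq)

-- Hypothetical helper: succeeds, consuming nothing, only at the end of a cell
-- (the next character is a tab or newline, or the input has ended).
cellEnd :: Parser ()
cellEnd = notFollowedBy (noneOf "\t\n")

cellBool :: Parser Cell
cellBool = do
    b <- oneOf "YN"
    cellEnd                      -- so "Yes" is not read as a boolean
    return $ CellBool (b == 'Y')

cellNumber :: Parser Cell
cellNumber = do
    d <- many1 digit
    cellEnd                      -- so "123a" is not read as a number
    return $ CellNumber (read d)

cellString :: Parser Cell
cellString = CellString <$> many (noneOf "\t\n")

-- try undoes any input consumed by an alternative that fails part-way.
cell :: Parser Cell
cell = try cellBool <|> try cellNumber <|> cellString

line :: Parser [Cell]
line = sepBy cell tab

tsvFile :: Parser [[Cell]]
tsvFile = endBy line newline

main :: IO ()
main = print (parse tsvFile "" "Y\t123\t123a\tYes\n")
-- Right [[CellBool True,CellNumber 123,CellString "123a",CellString "Yes"]]
```

Note that read d into an Int will silently wrap for very long digit strings, so you may want Integer if your data can contain large numbers.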