I'm learning to use the Parsec library as part of a homework assignment.
EDIT: Suggestions to use other libraries are welcome; the point is the parsing.
What I want is to extract all words that start with a capital letter, plus the four compass directions, from any sentence. Example: "Belgium totally lies south of Holland." should find and return "Belgium south Holland".
What I can't figure out is how to ignore (eat) any input that is -not- a compass direction. I was looking for something along the lines of
'many (not compassDirection >> space)'
but neither Google nor Hoogle is helping me.
The following code is obviously stuck on the 'many' function.
    readExpr :: String -> String
    readExpr input = case parse (parseLine) "" input of
        Left err  -> "No match: " ++ show err
        Right val -> "Found: " ++ showVal val

    parseLine :: Parser GraphValue
    parseLine = do
        x <- parseCountry
        space
        many ( some (noneOf " ") >> space )
        y <- parseCompass
        space
        many ( some (noneOf " ") >> space )
        z <- parseCountry
        return $ Direction [x,y,z]

    compassDirection :: Parser String
    compassDirection = string "north" <|>
                       string "south" <|>
                       string "east"  <|>
                       string "west"

    parseCountry :: Parser GraphValue
    parseCountry = do
        c <- upper
        x <- many (lower)
        return $ Country (c:x)

    parseCompass :: Parser GraphValue
    parseCompass = do
        x <- compassDirection
        return $ Compass x
I won't go into specifics since this is homework and the OP said the "important thing is the parsing".
The way I'd solve this problem:
1. Tokenize the input. Break it into words; this frees the real parsing step from having to worry about token definitions (e.g. "is %#@[ part of a word?") or whitespace. This could be as simple as `words`, or you could use Parsec for the tokenization. Then you'll have `[Token]` (or `[String]` if you prefer).
2. A parser for compass directions. You already have this (good job), but it'll have to be modified a bit if the input is `[String]` instead of `String`.
3. A parser for words that start with a capital letter.
4. A parser for everything else, which succeeds whenever it sees a token that is neither a compass direction nor a word starting with a capital letter.
5. A parser that works on any token but distinguishes between good stuff and bad stuff, perhaps using an algebraic data type.
6. A parser that works on lots of tokens.
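To make the shape of those steps a bit more concrete, here is a minimal sketch that uses plain functions instead of Parsec parsers, so it only illustrates the decomposition and is not the Parsec solution itself. The names `Token`, `tokenize`, `classify`, `keep` and `extract` are made up for illustration, and stripping every non-letter character from a word is an assumption about what counts as part of a token.

```haskell
import Data.Char (isAlpha, isUpper)
import Data.Maybe (mapMaybe)

-- Hypothetical token type: "good stuff" (capitalized words, compass
-- directions) versus "bad stuff" (everything else).
data Token
  = Capitalized String
  | Compass String
  | Junk String
  deriving (Eq, Show)

-- Step 1: split on whitespace, then drop non-letter characters
-- (an assumption: punctuation is never part of a word).
tokenize :: String -> [String]
tokenize = filter (not . null) . map (filter isAlpha) . words

-- Steps 2-5: decide what kind of token a single word is.
classify :: String -> Token
classify w
  | w `elem` ["north", "south", "east", "west"] = Compass w
  | startsUpper w                               = Capitalized w
  | otherwise                                   = Junk w
  where
    startsUpper (c:_) = isUpper c
    startsUpper []    = False

-- Keep the good tokens, discard the junk.
keep :: Token -> Maybe String
keep (Capitalized w) = Just w
keep (Compass w)     = Just w
keep (Junk _)        = Nothing

-- Step 6: run the classification over all tokens of a sentence.
extract :: String -> String
extract = unwords . mapMaybe (keep . classify) . tokenize
```

With these definitions, `extract "Belgium totally lies south of Holland."` evaluates to `"Belgium south Holland"`. Each function here corresponds roughly to one of the small parsers above, which is the part you would still rewrite in Parsec.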
Hopefully that's clear without being too clear; you'll still have to worry about when to discard the junk, for example. The basic idea is to break the problem down into lots of little sub-problems, solve the sub-problems, then glue those solutions together.
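If you would rather stay inside Parsec and skip the separate tokenization step, the same gluing idea could look roughly like the sketch below. It is one possible arrangement, not the only one: `capitalized`, `item` and `extractWords` are illustrative names, `compassDirection` is a modified version of the one in the question, and treating any run of letters as a word is an assumption. Junk is discarded by having the per-item parser return `Nothing` for it.

```haskell
import Data.Maybe (catMaybes)
import Text.Parsec
import Text.Parsec.String (Parser)

-- One of the four compass directions; 'try' lets a partial match
-- such as "eats" backtrack instead of failing after consuming input.
compassDirection :: Parser String
compassDirection = choice (map (try . string) ["north", "south", "east", "west"])

-- A word that starts with a capital letter.
capitalized :: Parser String
capitalized = (:) <$> upper <*> many letter

-- Any single item: Just for the good stuff, Nothing for junk.
-- 'notFollowedBy letter' stops "southern" from counting as "south".
item :: Parser (Maybe String)
item =  Just <$> try (compassDirection <* notFollowedBy letter)
    <|> Just <$> capitalized
    <|> Nothing <$ many1 letter   -- any other word
    <|> Nothing <$ anyChar        -- whitespace, punctuation, digits

-- Lots of items: keep the good ones and join them with spaces.
extractWords :: Parser String
extractWords = unwords . catMaybes <$> (many item <* eof)
```

Running `parse extractWords "" "Belgium totally lies south of Holland."` gives `Right "Belgium south Holland"`. The order of the alternatives in `item` matters: the junk cases come last so that they only fire when nothing interesting matches.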