I have a file containing many records in the following format:
Dan Clark’s Profile Photo
Member Name
Dan Clark 2nd degree connection 2nd
Member Occupation
Founder and Headmaster at Some Company, LLC
Nina blabla’s Profile Photo
Member Name
Nina blabla 2nd degree connection 2nd
Member Occupation
Consultant - GAmes executive search
My parser for the above file:
module Main where

import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec hiding (many, (<|>))

data Contact = Contact {
    name :: String,
    occupation :: String,
    company :: String
    } deriving Show

matchContact :: Parser Contact
matchContact = do
    name <- many anyChar
    char '\''
    string "s Profile Photo"
    char '\n'
    string "Member Name"
    char '\n'
    string name
    many anyChar
    char '\n'
    string "Member Occupation"
    char '\n'
    job <- many anyChar
    try $ string " at "
    company <- many anyChar
    try (char '\n')
    return $ Contact name job company

main = do
    c <- parseFromFile (many matchContact <* eof) "contacts.txt"
    print c
There are several issues, for example the data is not regular. But the most urgent one is that I always get this error at the last line of the input file:
Left "contacts.txt" (line 8670, column 12):
unexpected end of input
expecting "'"
How can I fix this?
The first time you attempt `many anyChar`, the parser will happily parse all the rest of the file into the string `name`, since everything that follows clearly fulfills the criterion "any character" (including the newline characters). That's clearly not what you want: nothing is left for the following `char '\''` to match, which is why the error is reported at the very end of the input.
Use `manyTill`, or restrict the choice of permitted characters so the `name` will end at the appropriate place.
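
As a rough illustration of both suggestions combined, here is a minimal sketch rather than a drop-in fix. The names `contact`, `restOfLine`, `splitOccupation` and the in-memory `sample` are made up for this example and are not from the question. It assumes every record has exactly the five lines shown, that the apostrophe may be either straight or curly, and that an occupation without " at " simply has an empty company.

```haskell
module Main where

import Control.Monad (void)
import Data.List (stripPrefix)
import Text.Parsec
import Text.Parsec.String (Parser)

data Contact = Contact
    { name       :: String
    , occupation :: String
    , company    :: String
    } deriving (Show, Eq)

-- Consume the rest of the current line, up to and including the
-- newline (or up to the end of input on the last line).
restOfLine :: Parser String
restOfLine = manyTill anyChar (void newline <|> eof)

-- Split "job at company" at the first " at "; the company is empty
-- if the separator is missing (an assumption made for this sketch).
splitOccupation :: String -> (String, String)
splitOccupation = go ""
  where
    go acc rest
        | Just c <- stripPrefix " at " rest = (reverse acc, c)
    go acc (x:xs) = go (x : acc) xs
    go acc []     = (reverse acc, "")

contact :: Parser Contact
contact = do
    -- manyTill stops at the marker instead of eating the whole file;
    -- noneOf "\n" additionally keeps the name on a single line.
    n <- manyTill (noneOf "\n")
                  (try (oneOf "'’" *> string "s Profile Photo"))
    _ <- newline
    _ <- string "Member Name" *> newline
    _ <- string n *> restOfLine          -- skips " 2nd degree connection 2nd"
    _ <- string "Member Occupation" *> newline
    (job, comp) <- splitOccupation <$> restOfLine
    return (Contact n job comp)

-- Hypothetical sample data mirroring the format in the question.
sample :: String
sample = unlines
    [ "Dan Clark’s Profile Photo"
    , "Member Name"
    , "Dan Clark 2nd degree connection 2nd"
    , "Member Occupation"
    , "Founder and Headmaster at Some Company, LLC"
    , "Nina Blabla’s Profile Photo"
    , "Member Name"
    , "Nina Blabla 2nd degree connection 2nd"
    , "Member Occupation"
    , "Consultant - GAmes executive search"
    ]

main :: IO ()
main = print (parse (many contact <* eof) "sample" sample)
```

Splitting the occupation line in a pure helper keeps the parser itself small; it could equally be done with another `manyTill` if you prefer to stay inside Parsec.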