I'm writing my first Haskell program. The program parses ordinary tab-separated (CSV-style) files, but I'm running into many issues that no doubt stem from my inexperience with the syntax.
Currently, the code parses one record successfully, but on the final field of that record the parser also consumes the newline, so it never processes the records on subsequent lines.
My proposed solution is to change fieldData so that it takes characters until it reaches a tab or a newline, but I don't know how to express that condition.
Current code:
fieldData = takeTill (== '\t')
Attempts:
fieldData = takeTill (== '\t' || '\n') -- wrong, something about infix precedence
fieldData = takeTill (== ('\t' || '\n')) -- wrong, type error
fieldData = takeTill ((== '\t') || (== '\n')) -- wrong, type error
fieldData x = takeTill ((x == '\t') || (x == '\n')) -- wrong, type error
fieldData x = takeTill x ((x == '\t') || (x == '\n')) -- wrong, not enough arguments
I feel that I have a fundamental misunderstanding of how to construct Boolean conditions in Haskell and would like help. For example, in GHCi I can write
let fun x = (x == 'a' || x == 'b')
and it matches either character fine, so I'm clearly missing something when it comes to passing such a condition to a function.
Alternatively, is this even the right approach? If not, I would appreciate pointers to the "correct" way.
Complete code below:
{- Parsing a tab-separated file using Attoparsec.
   A record contains:
   number\tname\tgenre\tabilities\tweapon\n
-}
import System.FilePath.Posix
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

data AbilitiesList = AbilitiesList String deriving Show

data PlayerCharacter = PlayerCharacter {
    id :: Integer,
    name :: String,
    genre :: String,
    abilities :: AbilitiesList,
    weapon :: String
  } deriving Show

type Players = [PlayerCharacter]

-- Reads one field: everything up to (not including) the next tab
fieldData :: Parser C.ByteString
fieldData = takeTill (== '\t')

tab :: Parser Char
tab = char '\t'

parseCharacter :: Parser PlayerCharacter
parseCharacter = do
  id <- decimal
  tab
  name <- fieldData
  tab
  genre <- fieldData
  tab
  abilities <- fieldData
  tab
  weapon <- fieldData
  return $ PlayerCharacter id (C.unpack name) (C.unpack genre) (AbilitiesList (C.unpack abilities)) (C.unpack weapon)

abilitiesFile :: FilePath
abilitiesFile = joinPath ["data", "ff_abilities.txt"]

-- Zero or more records, each followed by a line ending
playerParser :: Parser Players
playerParser = many $ parseCharacter <* endOfLine

main :: IO ()
main = B.readFile abilitiesFile >>= print . parseOnly playerParser
For this you probably want to use a lambda:
takeTill (\x -> x == '\t' || x == '\n')
A lambda is an anonymous function written inline. You can use it anywhere you would use a named function; it just isn't bound to a name.
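Plugged into the question's program, the lambda goes straight into fieldData. Here's a minimal sketch of the parsing half with that change, under a few assumptions: it uses the module name Data.Attoparsec.ByteString.Char8, renames the id field to a hypothetical playerId so it can't clash with Prelude's id, adds endOfInput and a hypothetical parsePlayers helper, and expects Unix line endings with a newline after every record, including the last.

```haskell
import Control.Applicative (many)
import Data.Attoparsec.ByteString.Char8
  (Parser, char, decimal, endOfInput, endOfLine, parseOnly, takeTill)
import qualified Data.ByteString.Char8 as C

data AbilitiesList = AbilitiesList String deriving Show

-- playerId is a hypothetical rename of the question's `id` field,
-- so the record selector doesn't clash with Prelude's id.
data PlayerCharacter = PlayerCharacter
  { playerId  :: Integer
  , name      :: String
  , genre     :: String
  , abilities :: AbilitiesList
  , weapon    :: String
  } deriving Show

-- Stop at a tab OR a newline, so the last field of a record leaves
-- the '\n' in the input for endOfLine to consume.
fieldData :: Parser C.ByteString
fieldData = takeTill (\x -> x == '\t' || x == '\n')

tab :: Parser Char
tab = char '\t'

-- One field, converted from ByteString to String
field :: Parser String
field = C.unpack <$> fieldData

parseCharacter :: Parser PlayerCharacter
parseCharacter = do
  n <- decimal
  _ <- tab
  nm <- field
  _ <- tab
  g <- field
  _ <- tab
  ab <- field
  _ <- tab
  w <- field
  return (PlayerCharacter n nm g (AbilitiesList ab) w)

-- Every record, including the last, must end in a newline. endOfInput
-- (not in the question's version) turns leftover, unparsed input into
-- a Left instead of silently ending the list early.
playerParser :: Parser [PlayerCharacter]
playerParser = many (parseCharacter <* endOfLine) <* endOfInput

-- Hypothetical convenience wrapper around parseOnly
parsePlayers :: C.ByteString -> Either String [PlayerCharacter]
parsePlayers = parseOnly playerParser
```

With the question's main changed to B.readFile abilitiesFile >>= print . parsePlayers, this should print Right followed by the full list, provided the file matches the format above. With \r\n line endings, endOfLine would still succeed, but each weapon field would keep a trailing \r.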
You could also define a named function:
tabOrNL :: Char -> Bool
tabOrNL '\t' = True
tabOrNL '\n' = True
tabOrNL _ = False
-- Or equivalently
tabOrNL :: Char -> Bool
tabOrNL x = x == '\t' || x == '\n'
Then you could just do
takeTill tabOrNL
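To check that the two definitions really are interchangeable, here is a small sketch that puts them in one module. Since a module can define tabOrNL only once, the names tabOrNLPat, tabOrNLOr and tabOrNLElem are hypothetical; the third is an extra variant using elem that the answer doesn't mention.

```haskell
-- Pattern-matching version
tabOrNLPat :: Char -> Bool
tabOrNLPat '\t' = True
tabOrNLPat '\n' = True
tabOrNLPat _    = False

-- Boolean-expression version
tabOrNLOr :: Char -> Bool
tabOrNLOr x = x == '\t' || x == '\n'

-- Membership version: a section of elem over the delimiter characters
tabOrNLElem :: Char -> Bool
tabOrNLElem = (`elem` "\t\n")

-- True when all three predicates give the same answer on every ASCII character
allAgree :: Bool
allAgree = all agree ['\0' .. '\127']
  where
    agree c = tabOrNLPat c == tabOrNLOr c && tabOrNLOr c == tabOrNLElem c
```

On plain Strings, Prelude's break plays a similar role to takeTill: in GHCi, break tabOrNLOr "Staff\n2\tBram" gives ("Staff","\n2\tBram"), the field plus the unconsumed rest.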
If you wanted to get really fancy, the Applicative instance for functions can come in handy here:
import Control.Applicative (liftA2)  -- liftA2 is also in Prelude in recent versions of base

(<||>) :: Applicative f => f Bool -> f Bool -> f Bool
(<||>) = liftA2 (||)
infixr 2 <||>
Then you can just do
takeTill ((== '\t') <||> (== '\n'))
Or even
takeTill ((== '\t') <||> (== '\n') <||> (== ','))
That way you avoid the lambda or helper function entirely: <||> lets you "or together" several predicates as if they were values. You can do the same with (<&&>) = liftA2 (&&), but it's probably not as useful for you here.
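To see why this works: in the Applicative instance for functions, liftA2 op p q is \x -> op (p x) (q x), so p <||> q is just \x -> p x || q x. The sketch below defines both combinators in one module; the infixr 3 fixity for <&&> is an assumption here, mirroring (&&), and isDelim, isFieldChar and fieldPrefix are hypothetical names. It also shows that describing a field character positively ("not a tab and not a newline") with takeWhile selects the same prefix as break with the delimiter test, by De Morgan's law.

```haskell
import Control.Applicative (liftA2)

-- For functions, liftA2 (||) p q behaves like \x -> p x || q x
(<||>) :: Applicative f => f Bool -> f Bool -> f Bool
(<||>) = liftA2 (||)
infixr 2 <||>

-- For functions, liftA2 (&&) p q behaves like \x -> p x && q x
(<&&>) :: Applicative f => f Bool -> f Bool -> f Bool
(<&&>) = liftA2 (&&)
infixr 3 <&&>

-- True for the characters that end a field
isDelim :: Char -> Bool
isDelim = (== '\t') <||> (== '\n')

-- True for the characters that belong to a field: not a tab and not a newline
isFieldChar :: Char -> Bool
isFieldChar = (/= '\t') <&&> (/= '\n')

-- The leading field of a String, using the positive predicate
fieldPrefix :: String -> String
fieldPrefix = takeWhile isFieldChar
```

So with attoparsec, takeWhile isFieldChar and takeTill isDelim should read the same field; which one to use is mostly a matter of which condition reads more naturally.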