How do I use takeTill until a tab or newline in Haskell Attoparsec? (Problems with Boolean expressions)


I'm writing my first Haskell program. It parses ordinary tab-separated (CSV-style) files, but I'm running into many issues that no doubt stem from my inexperience with the syntax.

Currently, the code parses one record successfully, but in the final field of a record the parser also consumes the newline, so records on subsequent lines are never processed.

My proposed solution is to change my fieldData definition so that it takes input until a tab or a newline (i.e. 'takeTill tab or newline'), but I don't know how to write this.

Current code:

fieldData = takeTill (== '\t')

Attempts:

fieldData = takeTill (== '\t' || '\n') -- wrong, something about infix precedence
fieldData = takeTill (== ('\t' || '\n')) -- wrong, type error
fieldData = takeTill ((== '\t') || (== '\n')) -- wrong, type error
fieldData x = takeTill ((x == '\t') || (x == '\n')) -- wrong, type error
fieldData x = takeTill x ((x == '\t') || (x == '\n')) -- wrong, not enough arguments

I feel that I have some fundamental misunderstanding of how to construct Boolean conditions in Haskell and would like help. For example, in ghci I can do let fun x = (x == 'a' || x == 'b') and it'll match different characters fine, so I'm clearly missing something when it comes to using it with a function.

Alternatively, is this even the correct approach? If this is not the right way to approach the problem I would appreciate pointers to the "correct" way.

Complete code below:

{- Parsing a tab-separated file using Attoparsec.
A record contains:
number\tname\tgenre\tabilities\tweapon\n

-}
import System.FilePath.Posix
import Data.Attoparsec.Char8
import Control.Applicative
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

data AbilitiesList = AbilitiesList String deriving Show

data PlayerCharacter = PlayerCharacter {
    id :: Integer,
    name :: String,
    genre :: String,
    abilities :: AbilitiesList,
    weapon :: String
} deriving Show

type Players = [PlayerCharacter]

fieldData = takeTill (== '\t')
tab = char '\t'

parseCharacter :: Parser PlayerCharacter
parseCharacter = do
    id <- decimal
    tab
    name <- fieldData
    tab
    genre <- fieldData
    tab
    abilities <- fieldData
    tab
    weapon <- fieldData
    return $ PlayerCharacter id (C.unpack name) (C.unpack genre) (AbilitiesList (C.unpack abilities)) (C.unpack weapon)

abilitiesFile :: FilePath
abilitiesFile = joinPath ["data", "ff_abilities.txt"]

playerParser :: Parser Players
playerParser = many $ parseCharacter <* endOfLine

main :: IO ()
main = B.readFile abilitiesFile >>= print . parseOnly playerParser

Solution

  • For this you probably want to use a lambda:

    takeTill (\x -> x == '\t' || x == '\n')
    

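    A lambda is an anonymous, inline function. You can use one just like a normal function, except that it isn't bound to a name. It is needed here because (||) has type Bool -> Bool -> Bool: both operands must be Bool values, so each comparison has to be applied to the character x before the results are or-ed together. Your attempts passed Chars or functions such as (== '\t') to (||), hence the type errors.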
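
    To see the predicate on its own, here is a small sketch that needs only the Prelude. It uses break as a stand-in for takeTill (both stop at the first character that satisfies the predicate); the name isFieldEnd and the sample string are made up for illustration.

    ```haskell
    -- Hypothetical name: the same lambda as above, bound so it can be tested.
    isFieldEnd :: Char -> Bool
    isFieldEnd = \x -> x == '\t' || x == '\n'

    main :: IO ()
    main = do
      print (isFieldEnd '\t')  -- True
      print (isFieldEnd 'a')   -- False
      -- break splits a String at the first character matching the predicate,
      -- much like takeTill stops consuming input there.
      print (break isFieldEnd "Cloud\tSoldier\n")  -- ("Cloud","\tSoldier\n")
    ```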

    You could also define a named function:

    tabOrNL :: Char -> Bool
    tabOrNL '\t' = True
    tabOrNL '\n' = True
    tabOrNL _    = False
    
    -- Or equivalently
    
    tabOrNL :: Char -> Bool
    tabOrNL x = x == '\t' || x == '\n'
    

    Then you could just do:

    takeTill tabOrNL
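
    Putting that into a whole parser might look like the sketch below. It is not your exact program: the record is trimmed to three fields, the Player type, its field names and the sample input are invented, and it imports from Data.Attoparsec.ByteString.Char8 (an assumption about your attoparsec version; your code imports Data.Attoparsec.Char8). It needs the attoparsec and bytestring packages.

    ```haskell
    import Control.Applicative (many)
    import Data.Attoparsec.ByteString.Char8
      (Parser, char, decimal, endOfLine, parseOnly, takeTill)
    import qualified Data.ByteString.Char8 as C

    -- Hypothetical, trimmed-down record: number\tname\tweapon\n
    data Player = Player
      { playerNumber :: Integer
      , playerName   :: String
      , playerWeapon :: String
      } deriving Show

    -- True for the two characters that end a field.
    tabOrNL :: Char -> Bool
    tabOrNL x = x == '\t' || x == '\n'

    -- A field stops at a tab or a newline, so the last field of a record
    -- leaves the newline in the input for endOfLine to consume.
    fieldData :: Parser C.ByteString
    fieldData = takeTill tabOrNL

    parsePlayer :: Parser Player
    parsePlayer = do
      n <- decimal
      _ <- char '\t'
      name <- fieldData
      _ <- char '\t'
      weapon <- fieldData
      return (Player n (C.unpack name) (C.unpack weapon))

    players :: Parser [Player]
    players = many (parsePlayer <* endOfLine)

    main :: IO ()
    main = do
      -- Made-up sample input with two records.
      let input = C.pack "1\tCloud\tBuster Sword\n2\tSquall\tGunblade\n"
      print (fmap (map playerName) (parseOnly players input))
      -- Right ["Cloud","Squall"]
    ```

    Because fieldData no longer swallows the newline, endOfLine succeeds after each record and many can move on to the next line.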
    

    If you want to get fancy, the Applicative instance for functions comes in handy here (liftA2 is exported by Control.Applicative, which your code already imports):

    (<||>) :: Applicative f => f Bool -> f Bool -> f Bool
    (<||>) = liftA2 (||)
    infixr 2 <||>
    

    Then you can just do:

    takeTill ((== '\t') <||> (== '\n'))
    

    Or even:

    takeTill ((== '\t') <||> (== '\n') <||> (== ','))
    

    That way you avoid the lambda or helper function entirely; <||> lets you "or together" several predicates as if they were values. You can do the same with (<&&>) = liftA2 (&&), but it's probably not as useful for you here.
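
    As a quick check that <||> behaves this way, here is a sketch that needs only base. The name isDelimiter and the sample string are invented for the example.

    ```haskell
    import Control.Applicative (liftA2)

    -- For functions, liftA2 f g h = \x -> f (g x) (h x), so both predicates
    -- receive the same character and their results are combined with (||).
    (<||>) :: Applicative f => f Bool -> f Bool -> f Bool
    (<||>) = liftA2 (||)
    infixr 2 <||>

    -- Hypothetical predicate built from three simple ones.
    isDelimiter :: Char -> Bool
    isDelimiter = (== '\t') <||> (== '\n') <||> (== ',')

    main :: IO ()
    main = print (map isDelimiter "a\t\n,")  -- [False,True,True,True]
    ```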