Parsing columns of data with parsec


I'm writing a parser to scan columns of numbers, like this:

T   LIST2   LIST3   LIST4
1   235 623 684
2   871 699 557
3   918 686 49
4   53  564 906
5   246 344 501
6   929 138 474

The first line contains the names of the lists, and I would like my program to parse exactly as many values per row as there are titles (to reject tables whose number of titles and number of columns disagree).

I wrote this program:

title = do
  tit <- many1 alphaNum
  return tit

digits = do
  dig <- many1 digit
  return dig

parseSeries = do 
    spaces
    titles <- title `sepBy` spaces
    let nb = length titles
    dat <- endBy (count (nb-1) (digits `sepBy` spaces)) endOfLine
    spaces
    return (titles,concat dat)

main = do
    fichier <- readFile ("test_list3.txt")
    putStrLn $ fichier
    case parse parseSeries "(stdin)" fichier of
            Left error -> do putStrLn "!!! Error !!!"
                             print error
            Right (tit,resu) -> do  
                                mapM_ putStrLn  tit
                                mapM_ putStrLn  (concat  resu)

But when I try to parse a file with this kind of data, I get the following error:

!!! Error !!!
"(stdin)" (line 26, column 1):
unexpected end of input
expecting space or letter or digit

I'm new to parsing and I don't understand why it fails.

Do you have any idea what is wrong with my parser?


Solution

  • Your program is doing something different from what you expect. The key part is right here:

    parseSeries = do 
        spaces
        titles <- title `sepBy` spaces
        let nb = length titles
    
        -- The following is the incorrect part: it repeats a whole `sepBy`
        -- list (nb-1) times per row instead of reading nb numbers
        dat <- endBy (count (nb-1) (digits `sepBy` spaces)) endOfLine
        spaces
        return (titles,concat dat)
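
To see why the error is reported at the very end of the file, it may help to run the header parser on its own. The sketch below is only an illustration: `greedyTitles`, `swallowed` and `failsAtEof` are names made up here, and the two short input strings are stand-ins, not the asker's file. It relies on two Parsec behaviours: `spaces` skips any whitespace, newlines included, and `alphaNum` accepts digits as well as letters.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same title parser as in the question.
title :: Parser String
title = many1 alphaNum

-- Header parser as written in the question: the separator may cross newlines.
greedyTitles :: Parser [String]
greedyTitles = title `sepBy` spaces

-- No trailing newline: the "titles" run on through the data row.
swallowed :: Either ParseError [String]
swallowed = parse greedyTitles "" "T LIST2\n1 235"

-- Trailing newline: `spaces` consumes it, then `title` fails at end of
-- input. Because input was consumed, the whole parser fails.
failsAtEof :: Either ParseError [String]
failsAtEof = parse greedyTitles "" "T LIST2\n1 235\n"
```

If the real file behaves the same way, the title list runs through every data row and the failure only shows up after the final newline, which would be consistent with the reported position (line 26, column 1) and the message "expecting space or letter or digit".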
    

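    I believe what you actually wanted was:

    parseSeries = do 
        spaces
        -- Separate cells with spaces or tabs only: `spaces` also skips
        -- newlines and `alphaNum` accepts digits, so with `spaces` as the
        -- separator the title list would run on into the data rows.
        let hspaces = many1 (oneOf " \t")
        titles <- title `sepBy` hspaces
        newline
        let nb = length titles
    
        -- A row is exactly nb numbers followed by a newline.
        let parseRow = do
                column  <- digits
                columns <- count (nb - 1) (hspaces *> digits)
                newline
                return (column:columns)
        dat <- many parseRow
        return (titles, dat)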
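
For reference, here is a self-contained sketch of the same idea that can be loaded into GHCi. It is one possible arrangement, not the only one. The names `hspaces`, `rowOf`, `table`, `parseTable` and `sample` are made up for this example, and it assumes cells are separated by spaces or tabs only, that lines have no trailing whitespace, and that every line ends with a line break.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more spaces or tabs; unlike `spaces`, this never crosses a newline.
hspaces :: Parser ()
hspaces = skipMany1 (oneOf " \t")

title :: Parser String
title = many1 alphaNum

digits :: Parser String
digits = many1 digit

-- A line of exactly n cells (n is assumed to be at least 1).
rowOf :: Int -> Parser String -> Parser [String]
rowOf n cell = do
  first <- cell
  rest  <- count (n - 1) (hspaces *> cell)
  _ <- endOfLine
  return (first : rest)

-- Header line first, then rows with as many numbers as there are titles.
table :: Parser ([String], [[String]])
table = do
  titles <- title `sepBy1` hspaces <* endOfLine
  rows   <- many (rowOf (length titles) digits)
  eof
  return (titles, rows)

parseTable :: String -> Either ParseError ([String], [[String]])
parseTable = parse table "(input)"

-- Two rows taken from the question's example data.
sample :: String
sample = unlines
  [ "T   LIST2   LIST3   LIST4"
  , "1   235 623 684"
  , "4   53  564 906"
  ]
```

In GHCi, `parseTable sample` should return a `Right` holding the four titles and two rows of four strings each, while an input whose rows have more or fewer cells than the header, such as `"A B\n1 2 3\n"`, should return a `Left`. The final `eof` is there so that a malformed last row is reported instead of being silently dropped.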