Parse arrays of numbers separated by empty lines

I'm trying to make a parser to scan arrays of numbers separated by empty lines in a text file.

1   235 623 684
2   871 699 557
3   918 686 49
4   53  564 906


1   154
2   321
3   519

1   235 623 684
2   871 699 557
3   918 686 49

Here is the full text file

I wrote the following parser with Parsec:

import Text.ParserCombinators.Parsec

emptyLine = do
  spaces
  newline

emptyLines = many1 emptyLine

data1 = do
  dat <- many1 digit
  return (dat)

datan = do
  many1 (oneOf " \t")
  dat <- many1 digit
  return (dat)

dataline  = do
  dat1 <- data1
  dat2 <- many datan
  many (oneOf " \t")
  newline
  return (dat1:dat2)

parseSeries = do 
    dat <- many1 dataline  
    return dat

parseParag =  try parseSeries

parseListing = do 
    --cont <- parseSeries `sepBy` emptyLines
    cont <- between emptyLines emptyLines parseSeries
    eof
    return cont

main = do
    fichier <- readFile ("test_listtst.txt")
    case parse parseListing "(test)" fichier of
            Left error -> do putStrLn "!!! Error !!!"
                             print error
            Right serie -> do  
                                mapM_ print serie

but it fails with the following error:

!!! Error !!!
"(test)" (line 6, column 1):
unexpected "1"
expecting space or new-line

and I don't understand why.

Do you have any idea what's wrong with my parser?

Do you have an example of how to parse a structured set of data separated by empty lines?

Solution

Do you have any idea what's wrong with my parser?

A few things:

As other answerers have already pointed out, the spaces parser consumes every character that satisfies Data.Char.isSpace, and the newline ('\n') is such a character. Your emptyLine parser therefore always fails: by the time newline runs, spaces has already consumed every newline in front of it. That is exactly what the error message reports: spaces skipped the leading blank lines, and newline then met the "1" at line 6, column 1.
You probably shouldn't use the newline parser in your "line" parsers anyway, because those parsers will fail on the last line of the file if the latter doesn't end with a newline.
Why not use the parsec 3 modules (Text.Parsec.*) rather than the parsec 2 compatibility modules (Text.ParserCombinators.Parsec.*)?
Why not parse the numbers as Integers or Ints as you go, rather than keep them as Strings?
Personal preference, but you rely too much on the do notation for my taste, to the detriment of readability. For instance,
```
data1 = do
  dat <- many1 digit
  return (dat)
```
can be simplified to
```
data1 = many1 digit
```
You would do well to add a type signature to all your top-level bindings.
Be consistent in how you name your parsers: why "parseListing" instead of simply "listing"?
Have you considered using a different type of input stream (e.g. Text) for better performance?
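
To make the first point concrete, here is a minimal sketch that runs the question's emptyLine next to a variant that skips only spaces and tabs. The names emptyLineOrig, emptyLineFixed and succeeds are hypothetical helpers introduced for this illustration; they are not part of parsec.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

-- The emptyLine parser from the question: `spaces` also eats '\n',
-- so `newline` never finds one left to match.
emptyLineOrig :: Parser Char
emptyLineOrig = spaces >> newline

-- Hypothetical fix: skip only spaces and tabs, then require the newline.
emptyLineFixed :: Parser Char
emptyLineFixed = skipMany (oneOf " \t") >> newline

-- Helper (not part of parsec) that reports only success or failure.
succeeds :: Parser a -> String -> Bool
succeeds p = either (const False) (const True) . parse p "(demo)"

main :: IO ()
main = do
  print (succeeds emptyLineOrig  "  \n")      -- False
  print (succeeds emptyLineFixed "  \n")      -- True
  print (succeeds emptyLineFixed "  \n1 2\n") -- True
```

The only change is the set of characters skipped before the newline; any parser that excludes '\n' from its notion of whitespace would work the same way.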

Do you have an example of how to parse a structured set of data separated by empty lines?

Below is a much simplified version of the kind of parser you want. Note that the input is not supposed to begin with empty lines (though it may end with them), and "data lines" are not supposed to contain leading whitespace, but may contain trailing whitespace (here, any character that satisfies isSpace except the newline).

module Main where

import Data.Char ( isSpace )
import Text.Parsec
import Text.Parsec.String ( Parser )

eolChar :: Char
eolChar = '\n'

eol :: Parser Char
eol = char eolChar

whitespace :: Parser String
whitespace = many $ satisfy $ \c -> isSpace c && c /= eolChar

emptyLine :: Parser String
emptyLine = whitespace

emptyLines :: Parser [String]
emptyLines = sepEndBy1 emptyLine eol

cell :: Parser Integer
cell = read <$> many1 digit

dataLine :: Parser [Integer]
dataLine = sepEndBy1 cell whitespace
--             ^
-- replace by sepBy1 if no trailing whitespace is allowed in a "data line"

dataLines :: Parser [[Integer]]
dataLines = sepEndBy1 dataLine eol

listing :: Parser [[[Integer]]]
listing = sepEndBy dataLines emptyLines

main :: IO ()
main = do
    fichier <- readFile "test_listtst.txt"
    case parse listing "(test)" fichier of
        Left err    -> putStrLn "!!! Error !!!" >> print err
        Right serie -> mapM_ print serie

Test:

λ> main
[[1,235,623,684],[2,871,699,557],[3,918,686,49],[4,53,564,906]]
[[1,154],[2,321],[3,519]]
[[1,235,623,684],[2,871,699,557],[3,918,686,49]]
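
The parser above assumes the input does not start with empty lines and does not check that all of the input was consumed. If your file does begin with blank lines (the error at line 6 suggests it might), one possible variation is sketched below. It is only a sketch under those assumptions: hspace, lineEnd, blankLine, block and parseListing are names made up for this example, and data lines still may not start with whitespace.

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

-- Horizontal whitespace only: spaces and tabs, never a newline.
hspace :: Parser ()
hspace = skipMany (oneOf " \t")

-- A line break, or the end of input for an unterminated final line.
lineEnd :: Parser ()
lineEnd = (newline >> return ()) <|> eof

-- A blank line: optional horizontal whitespace followed by a newline.
-- `try` undoes the consumed whitespace if no newline follows.
blankLine :: Parser ()
blankLine = try (hspace >> newline >> return ())

cell :: Parser Integer
cell = read <$> many1 digit

-- One or more numbers; trailing spaces or tabs are tolerated.
dataLine :: Parser [Integer]
dataLine = many1 (cell <* hspace) <* lineEnd

-- A block is a run of consecutive data lines.
block :: Parser [[Integer]]
block = many1 dataLine

-- Blank lines may appear before, between, and after blocks;
-- the whole input must be consumed.
listing :: Parser [[[Integer]]]
listing =
  skipMany blankLine *> many (block <* skipMany blankLine) <* hspace <* eof

-- Helper that turns the ParseError into a String for easy comparison.
parseListing :: String -> Either String [[[Integer]]]
parseListing = either (Left . show) Right . parse listing "(demo)"

main :: IO ()
main = print (parseListing "\n \n1 2 3\n4 5 \n\n\n7\n8")
-- prints: Right [[[1,2,3],[4,5]],[[7],[8]]]
```

Ending listing with eof means stray content such as a line containing letters is reported as an error rather than silently ignored.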