I want to parse following text:
keyword some more values
funcKeyw funcName1
funcKeyw funcName2
funcKeyw funcName3
keyword some more values
funcKeyw funcName2
keyword some more values
funcKeyw funcName4
Indentation is done by tabs. Each block is started by keyword
and some additional values in the same line. Everything indented belongs to the same block. After all the function calls (that start with the funcKeyw
keyword) there can be child keyword
blocks (separated by an "empty" line; "empty" means either nothing in it or whitespace characters).
type IndentLevel = Int
data Block = Block { blockFuncCalls :: [String]
, blockBlocks :: [Block]
}
block :: GenParser Char st Block
block = parseBlock 0
where
parseBlock lvl = do
count lvl tab
string "keyword"
-- [...] Parse other stuff in that line.
newline
-- Parse 'function calls'.
fs <- sepBy1 (blockFunc (lvl + 1)) emptyLines
-- Parse optional child blocks.
emptyLines
bs <- sepBy (parseBlock (lvl + 1)) emptyLines
return Block { blockFuncCalls=fs
, blockBlocks=bs
}
blockFunc :: IndentLevel -> GenParser Char st String
blockFunc lvl = do
count lvl tab
string "funcKeyw"
-- [...] Parse function name etc..
newline
return funcName -- Parsed func name.
emptyLine :: GenParser Char st ()
emptyLine = many (oneOf "\t ") >> newline >> return ()
emptyLines :: GenParser Char st ()
emptyLines = many emptyLine >> return ()
The problem is that the blockFunc
parser does not stop parsing when the child block starts, but returns an error unexpected 'keyword'
.
How can I avoid that? I think I could use try
or choice
to choose the correct parser for each line, but I want to require the function calls to be before the child blocks.
One thing I have noticed is that the sepBy
combinators have somewhat unexpected behaviour, namely, that if the separator is began to be parsed, and that fails, the entire sepBy
fails, instead of simple returning what was parsed so far. You can use the following variants, which differ by an additional try
inside of sepBy1Try
:
sepBy1Try :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a]
sepBy1Try p sep = do
x <- p
xs <- many (try $ sep *> p)
return (x:xs)
sepByTry p sep = sepBy1Try p sep <|> return []
Use these in place of sepBy
:
block :: GenParser Char st Block
block = parseBlock 0
where
parseBlock lvl = do
count lvl tab
string "keyword"
otherStuff <- many (noneOf "\r\n")
newline
-- Parse 'function calls'.
fs <- sepBy1Try (blockFunc (lvl + 1)) emptyLines
-- Parse optional child blocks.
emptyLines
bs <- sepByTry (try $ parseBlock (lvl + 1)) emptyLines
return Block { blockFuncCalls=fs
, blockBlocks=bs
, blockValues=words otherStuff
}
I also modified your datatype to also capture some more information (just for demonstration purposes). Also, note yet another try
in front of the recursive parseBlock
- this is because this parses must fail without consuming input, when it sees for example one tab, but is expecting two, this try
allows it to backtrack to the "next level".
Finally, change the following:
emptyLines :: GenParser Char st ()
emptyLines = many (try emptyLine) >> return ()
Same reasoning as sepBy here...
A test with a simple pretty printer for clarity:
data Block = Block { blockValues :: [String]
, blockFuncCalls :: [String]
, blockBlocks :: [Block]
} deriving (Show, Eq)
pprBlock :: Block -> String
pprBlock = unlines . go id where
go ii (Block vals funcs subblocks) =
let ii' = ii . ('\t':) in
(ii $ unwords $ "keyword":vals) :
map (\f -> ii' $ "function " ++ f) funcs ++
concatMap (go ii') subblocks
test0_run = either (error.show) (putStrLn.pprBlock) $ parse block "" $ test0
test0 = unlines $
[ "keyword some more values"
, "\tfuncKeyw funcName1"
, "\tfuncKeyw funcName2"
, "\t"
, "\tfuncKeyw funcName3"
, "\t"
, "\tkeyword some more values"
, "\t\tfuncKeyw funcName2"
, ""
, "\tkeyword some more values"
, "\t\tfuncKeyw funcName4"
]
and
>test0_run
keyword some more values
function funcName1
function funcName2
function funcName3
keyword some more values
function funcName2
keyword some more values
function funcName4
>putStrLn test0
keyword some more values
funcKeyw funcName1
funcKeyw funcName2
funcKeyw funcName3
keyword some more values
funcKeyw funcName2
keyword some more values
funcKeyw funcName4
>