Haskell: Parsec: Pipeline of transformers of the whole file


I'm trying to use Parsec to read a C/C++/Java source file and do a series of transformations on the entire file. The first phase removes strings and the second phase removes comments. (Strings have to go first because you might get a /* inside a string.)

So each phase transforms a String into an Either ParseError String, and I want to bind them together (in the sense of Either) to make a pipeline of transformations of the whole file. This seems like a fairly general requirement.

import Text.ParserCombinators.Parsec

commentless, stringless :: Parser String

stringless = fmap concat ( (many (noneOf "\"")) `sepBy` quotedString ) 
quotedString = (char '"') >> (many quotedChar) >> (char '"')
quotedChar = try (string "\\\"" >> return '"' ) <|> (noneOf "\"")  

commentless = fmap concat $ notComment `sepBy` comment
notComment = manyTill anyChar (lookAhead (comment <|> eof))
comment = (string "//" >> manyTill anyChar newline >> spaces >> return ()) 
      <|> (string "/*" >> manyTill anyChar (string "*/") >>  spaces >> return ())


main =
    do c <- getContents
       case parse commentless "(stdin)" c of -- THIS WORKS
--     case parse stringless "(stdin)" c of -- THIS WORKS TOO    
--     case parse (stringless `THISISWHATIWANT` commentless) "(stdin)" c of 
            Left e -> do putStrLn "Error parsing input:"
                         print e
            Right r -> print r

So how can I do this? I tried parserBind but it didn't work.

(In case anybody cares why: I'm trying to do a kind of light parse that extracts just what I want, without parsing the entire grammar or even knowing whether the source is C++ or Java. All I need is the starting and ending line numbers of all classes and functions. So I envisage a bunch of preprocessing phases that scrub out comments, #defines/ifdefs, template preambles and the contents of parentheses (because of the semicolons in for clauses). Then I'll parse for the snippets preceding {s (or following }s, because of typedefs), stuff those snippets through yet another phase to get the type and name of whatever it is, and recurse just to the second level to get Java member functions.)


Solution

  • You need to bind in Either ParseError, not in Parser. Each call to parse returns an Either ParseError String, so move the bind outside the parse and run one parse per phase:

    parse stringless "(stdin)" input >>= parse commentless "(stdin)"
    

    There is probably a better approach than what you are using, but this will do what you want.
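
For a pipeline with more than two phases, the same idea can be written with Kleisli composition (`>=>` from Control.Monad). The following is only a sketch under some assumptions: `runPhase` and `pipeline` are names made up for illustration, and the two parsers are simplified variants of the ones in the question, with `try` added so that a lone `/` or `*` does not abort the parse.

```haskell
import Control.Monad ((>=>))
import Text.Parsec
import Text.Parsec.String (Parser)

-- Phase 1: drop double-quoted string literals (simplified from the question).
stringless :: Parser String
stringless = concat <$> (many (noneOf "\"") `sepBy` quotedString)

quotedString :: Parser ()
quotedString = char '"' >> many quotedChar >> char '"' >> return ()

quotedChar :: Parser Char
quotedChar = try (string "\\\"" >> return '"') <|> noneOf "\""

-- Phase 2: drop // and /* */ comments plus any whitespace that follows them.
-- Every other character is copied through unchanged.
commentless :: Parser String
commentless = concat <$> many (("" <$ comment) <|> ((: []) <$> anyChar))

comment :: Parser ()
comment = (try (string "//") >> manyTill anyChar newline >> spaces)
      <|> (try (string "/*") >> manyTill anyChar (try (string "*/")) >> spaces)

-- Hypothetical helper: turn a parser into a whole-file transformation.
runPhase :: Parser String -> String -> Either ParseError String
runPhase p = parse p "(stdin)"

-- Phases run left to right; the first Left short-circuits the rest.
pipeline :: String -> Either ParseError String
pipeline = runPhase stringless >=> runPhase commentless
```

With these definitions, `pipeline "x = \"a /* b\"; /* c */ y"` evaluates to `Right "x = ; y"`, and an unterminated string literal yields a `Left` from the first phase without the second phase ever running. A `main` in the style of the question would then be `getContents >>= either print putStr . pipeline`. Adding a further phase is just another `>=> runPhase somePhase` at the end.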