Tags: parsing, scala, combinators

Preprocessing Scala parser Reader input


I have a file containing a text representation of an object. I have written a combinator parser grammar that parses the text and returns the object. In the text, "#" is a comment delimiter: everything from that character to the end of the line is ignored. Blank lines are also ignored. I want to process text one line at a time, so that I can handle very large files.

I don't want to clutter up my parser grammar with generic comment and blank line logic. I'd like to remove these as a preprocessing step. If I convert the file to an iterator over its lines, I can do something like this:

Source.fromFile("file.txt").getLines.map(_.replaceAll("#.*", "").trim).filter(!_.isEmpty)

How can I pass the output of an expression like that into a combinator parser? I can't figure out how to create a Reader object out of a filtered expression like this. The Java FileReader interface doesn't work that way.

Is there a way to do this, or should I put my comment and blank line logic in the parser grammar? If the latter, is there some util.parsing package that already does this for me?


Solution

  • The simplest way to do this is to use the fromLines method on PagedSeq:

    import scala.collection.immutable.PagedSeq
    import scala.io.Source
    import scala.util.parsing.input.PagedSeqReader
    
    val lines = Source.fromFile("file.txt").getLines.map(
      _.replaceAll("#.*", "").trim
    ).filterNot(_.isEmpty)
    
    val reader = new PagedSeqReader(PagedSeq.fromLines(lines))
    

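    And now you've got a scala.util.parsing.input.Reader that you can plug into your parser. PagedSeq.fromLines pulls lines from the iterator lazily and puts a newline between consecutive lines, so the parser sees the filtered text as a single character stream. This is essentially what happens when you parse a java.io.Reader anyway: it immediately gets wrapped in a PagedSeqReader.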
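
To make the "plug into your parser" step concrete, here is a minimal sketch of the whole pipeline. The grammar is not the asker's: `KeyValueParser`, `Demo`, `stripComments`, and `parseLines` are hypothetical names, and the "key = value" format is an assumption chosen only to keep the example short. It also assumes the scala-parser-combinators module is on the classpath and that the grammar extends RegexParsers, whose default whitespace skipping absorbs the newlines that fromLines inserts.

```scala
// PagedSeq lives in scala.collection.immutable up to Scala 2.12 and in
// scala.util.parsing.input from 2.13 on; the two wildcard imports cover both.
import scala.collection.immutable._
import scala.util.parsing.input._
import scala.util.parsing.combinator.RegexParsers

// Hypothetical grammar (an assumption, not the asker's): "key = value" pairs.
// It contains no comment or blank-line logic at all.
object KeyValueParser extends RegexParsers {
  def word: Parser[String] = """\w+""".r
  def pair: Parser[(String, String)] =
    word ~ ("=" ~> word) ^^ { case k ~ v => (k, v) }
  def pairs: Parser[List[(String, String)]] = rep(pair)
}

object Demo {
  // Same preprocessing as in the question: drop "#" comments and blank lines.
  def stripComments(lines: Iterator[String]): Iterator[String] =
    lines.map(_.replaceAll("#.*", "").trim).filterNot(_.isEmpty)

  // Wrap the filtered lines in a Reader[Char] and hand it to the grammar.
  def parseLines(
      lines: Iterator[String]
  ): KeyValueParser.ParseResult[List[(String, String)]] = {
    val reader = new PagedSeqReader(PagedSeq.fromLines(stripComments(lines)))
    KeyValueParser.parseAll(KeyValueParser.pairs, reader)
  }
}
```

For a real file, the iterator would come from `Source.fromFile("file.txt").getLines()` as in the answer. A grammar that treats line breaks as significant would need to override `whiteSpace` or match the newline explicitly, since this sketch relies on RegexParsers skipping it.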