scala, parser-combinators

Scala rep separator for specific area of text


Imagine I've got the following:

--open
Client: enter
Nick
Age 28
Rosewell, USA

Client: enter
Maria
Age 19
Cleveland, USA
--open--

I need a result close to the following: List(List(Nick, Age 28, Rosewell), List(Maria, Age 19, Cleveland))

There can be any number of clients inside the open body, so the size of the list is not fixed.

I tried to do it with the following:

repsep(".*".r , "Client: enter" + lineSeparator)

In this case all I can parse is this one line: List((Client: enter)). How can I make sure the parser keeps working on the rest of the same block of text?


Solution

  • I guess you are using RegexParsers (note that it skips whitespace by default). I'm assuming that the input ends with "\n\n--open--" instead, that is, with an empty line before the closing marker (if you can change that; otherwise I'll show you how to modify the repsep parser). With this change, the text has the following structure:

    • each client is introduced by the text "Client: enter"
    • after that, you parse every non-empty line, with the lines separated by a newline
    • when you hit an empty line, parse the line separators and start over with the next client if there is one; otherwise you have reached the end of the input


    Then the implementation of the parser is straightforward:

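    import scala.util.parsing.combinator.RegexParsers

    object ClientParser extends RegexParsers {

      // Newlines are significant here, so don't let RegexParsers skip them
      override def skipWhitespace = false

      def lineSeparator = "\n"
      // The whole input: opening marker, any number of clients, closing marker
      def root = "--open" ~> lineSeparator ~> rep(client) <~ "--open--"
      // One client: header line, its non-empty lines, then the trailing blank line(s)
      def client = ("Client: enter" ~ lineSeparator) ~> repsep(".+".r, lineSeparator) <~ rep(lineSeparator)
    }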
    

    Running it with:

    --open
    Client: enter
    Nick
    Age 28
    Rosewell; USA
    
    Client: enter
    Maria
    Age 19
    Cleveland; USA
    
    --open--
    

    You get:

    [12.9] parsed: List(List(Nick, Age 28, Rosewell; USA), List(Maria, Age 19, Cleveland; USA))
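
The answer assumes a blank line before the closing "--open--". If the input cannot be changed, one possible adjustment (a sketch only, not the answerer's own follow-up) is to stop ".+" from swallowing the closing marker by giving the line regex a negative lookahead. The names OriginalInputParser, dataLine and Demo are hypothetical; the block also shows one way to invoke the parser with parseAll.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical variant of ClientParser for the question's original input,
// where "--open--" directly follows the last client (no blank line).
object OriginalInputParser extends RegexParsers {

  // Newlines are significant here, so whitespace must not be skipped.
  override def skipWhitespace = false

  def lineSeparator: Parser[String] = "\n"

  // A data line: non-empty and not starting with the closing marker.
  // The negative lookahead is an assumption of this sketch.
  def dataLine: Parser[String] = "(?!--open--).+".r

  // One client: header line, its data lines, then any trailing newlines.
  def client: Parser[List[String]] =
    ("Client: enter" ~ lineSeparator) ~> repsep(dataLine, lineSeparator) <~ rep(lineSeparator)

  // The whole input: opening marker, any number of clients, closing marker.
  def root: Parser[List[List[String]]] =
    "--open" ~> lineSeparator ~> rep(client) <~ "--open--"
}

object Demo {
  // The question's original text, joined with "\n" explicitly so the
  // line endings do not depend on how the source file was saved.
  val input: String = List(
    "--open",
    "Client: enter",
    "Nick",
    "Age 28",
    "Rosewell, USA",
    "",
    "Client: enter",
    "Maria",
    "Age 19",
    "Cleveland, USA",
    "--open--"
  ).mkString("\n")

  def main(args: Array[String]): Unit =
    OriginalInputParser.parseAll(OriginalInputParser.root, input) match {
      case OriginalInputParser.Success(clients, _) => println(clients)
      case failure                                 => println(s"Parse failed: $failure")
    }
}
```

With the question's original text this should print List(List(Nick, Age 28, Rosewell, USA), List(Maria, Age 19, Cleveland, USA)). The same parser also accepts the input with the extra blank line, because rep(lineSeparator) consumes however many newlines follow a client.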