
Combine delimiters to create stream of words from text file using Akka Streams


I have the following code that counts word frequencies in a text file:

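import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Framing, Sink}
import akka.util.ByteString

import scala.concurrent.ExecutionContextExecutor

implicit val system: ActorSystem = ActorSystem("words-count")
implicit val mat: ActorMaterializer = ActorMaterializer()
implicit val ec: ExecutionContextExecutor = system.dispatcher

// count occurrences of each word: add 1 to its entry, treating an unseen word as 0
val sink = Sink.fold[Map[String, Int], String](Map.empty)({
    case (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  })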

FileIO.fromPath(Paths.get("/file.txt"))
        .via(Framing.delimiter(ByteString(" "), 256, true).map(_.utf8String))
        .toMat(sink)((_, right) => right)
        .run()
        .map(println(_))
        .onComplete(_ => system.terminate())
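Sink.fold calls its function once per element, passing the previous accumulator (starting from the zero value) together with the element, and completes its Future with the final accumulator when the stream ends. That is the same contract as foldLeft on a collection, so the counting logic can be checked without running a stream at all. A minimal sketch; the names FoldCheck, step and countWords are hypothetical:

```scala
object FoldCheck {
  // Same step function as the question's Sink.fold: increment the word's
  // count, treating an unseen word as 0
  val step: (Map[String, Int], String) => Map[String, Int] =
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))

  // foldLeft threads the accumulator through the elements in order,
  // just as Sink.fold does with the elements of a stream
  def countWords(words: Seq[String]): Map[String, Int] =
    words.foldLeft(Map.empty[String, Int])(step)
}
```

For example, FoldCheck.countWords(Seq("to", "be", "or", "not", "to", "be")) yields a map with to -> 2, be -> 2, or -> 1 and not -> 1. The same function value can be handed to the stream as Sink.fold(Map.empty[String, Int])(FoldCheck.step).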

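Currently the stream uses a single space as the delimiter, so line breaks ("\n") are not treated as word boundaries: the last word of one line and the first word of the next end up in the same token. Can I use both spaces and line breaks as delimiters in the same stream, i.e. is there a way to combine them?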
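Outside of Akka, the problem and the usual way of combining delimiters can be seen with String.split, which takes a regular expression: a character class such as [ \n] accepts either character as a delimiter, and + merges runs of them. A small sketch on a made-up sample string (DelimiterDemo is a hypothetical name):

```scala
object DelimiterDemo {
  // Hypothetical sample text with line breaks between words and one blank line
  val text = "the quick\nbrown fox\n\njumps"

  // Splitting on a single space leaves the line breaks inside the tokens
  val spaceOnly: List[String] = text.split(" ").toList

  // The character class [ \n] accepts either delimiter, and "+" collapses
  // consecutive delimiters (such as the blank line) into one split point
  val combined: List[String] = text.split("[ \\n]+").toList
}
```

Here spaceOnly contains "quick\nbrown" and "fox\n\njumps" as single tokens, while combined contains the five separate words. If the text can start with a delimiter, split returns a leading empty string, so a .filter(_.nonEmpty) is worth adding; "\\s+" is a broader pattern that also covers tabs and "\r".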


Solution

  • You could set the delimiter to \n, so that each frame is one line, and then split each line into words with flatMapConcat:

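    // reuses system, mat, ec and sink from the question; Source comes from akka.stream.scaladsl
    FileIO
        .fromPath(Paths.get("file.txt"))
        // one frame per line; a line longer than maximumFrameLength fails the stream
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true).map(_.utf8String))
        // split on runs of whitespace (this also drops a trailing "\r") and skip the
        // empty strings produced by leading whitespace or blank lines
        .flatMapConcat(line => Source(line.split("\\s+").filter(_.nonEmpty).toList))
        .runWith(sink)
        .map(println(_))
        .onComplete(_ => system.terminate())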
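Another way to combine the delimiters, closer to the literal question, is to normalize them before framing: rewrite every line-break byte in each chunk as a space, then frame on the space alone. This is safe on raw chunks because in UTF-8 the bytes 0x0A and 0x0D only ever encode \n and \r, never part of a multi-byte character. A minimal sketch, assuming Akka 2.6 or later (where an implicit ActorSystem supplies the Materializer) and a UTF-8 input file; CombinedDelimiters and countWords are hypothetical names:

```scala
import java.nio.file.Path

import scala.concurrent.Future

import akka.actor.ActorSystem
import akka.stream.scaladsl.{FileIO, Framing, Sink}
import akka.util.ByteString

object CombinedDelimiters {
  private val Space: Byte = ' '.toByte
  private val LineFeed: Byte = '\n'.toByte
  private val CarriageReturn: Byte = '\r'.toByte

  // Rewrite every line-break byte in a chunk as a space; this works byte by
  // byte, so it does not matter where FileIO split the chunks
  private def normalize(chunk: ByteString): ByteString =
    ByteString.fromArray(chunk.toArray.map { b =>
      if (b == LineFeed || b == CarriageReturn) Space else b
    })

  // Same counting fold as in the question
  private val countSink: Sink[String, Future[Map[String, Int]]] =
    Sink.fold[Map[String, Int], String](Map.empty) { (counts, word) =>
      counts + (word -> (counts.getOrElse(word, 0) + 1))
    }

  // Assumes Akka 2.6+: the implicit ActorSystem provides the Materializer for runWith
  def countWords(path: Path)(implicit system: ActorSystem): Future[Map[String, Int]] =
    FileIO
      .fromPath(path)
      .map(normalize)
      // with only one delimiter left, a single Framing stage is enough
      .via(Framing.delimiter(ByteString(" "), maximumFrameLength = 256, allowTruncation = true))
      .map(_.utf8String)
      // consecutive delimiters (e.g. "\r\n" or blank lines) can yield empty frames
      .filter(_.nonEmpty)
      .runWith(countSink)
}
```

One trade-off: here maximumFrameLength limits the length of a single word rather than of a line. In the flatMapConcat version above, 256 bytes is the maximum line length, which ordinary text can exceed, so a larger value may be needed there.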