scala akka akka-stream reactive-streams

akka stream design pattern for files consumption


I have a problem where I am asked to use akka streams to design a search API that looks up data across several related .tsv files.
For example, say you have 2 files:
movies.tsv (id, title)
actors.tsv (name, movieIds)
Say you want to create an endpoint that lists all the actors who played in a movie, given only the movie's name:
def principalsForMovieName(name: String): Source[Actor, _]
You would have to read the first file to get the IDs of all movies whose title contains the input name, and then process the second file to list the related actors.
I thought I could do that by piping 2 Sources (first movies, then actors) together, but that does not appear to be a common pattern with akka reactive streams.
I might have missed something in the whole stream concept I guess. Could you point me in the right direction please?


Solution

  • This is workable, albeit inefficient if multiple movies happen to share a title:

    • read a stream of lines from movies.tsv
    • filter the stream for titles matching the name of the movie and map to the movie IDs
    • for each movie ID, emit a stream of lines from actors.tsv (flatMapConcat is probably the stream operator of interest here)
    • filter the stream for records matching that movie ID
    • map each record to the actor name

    The inefficiency arises from re-reading and scanning actors.tsv once for every matching movie ID.
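
    The steps above could be sketched roughly as follows. This is only a sketch under several assumptions that are not stated in the question: the files have no header row, fields are tab-separated, the movieIds column is a comma-separated list, no line exceeds 8192 bytes, and "matching" means the title contains the input name. The Actor case class, the MovieSearch object, the extra path parameters and the principalsSinglePass variant are hypothetical names introduced for illustration. The second function is not part of the steps above; it is one possible way to avoid the repeated scan by folding the matching IDs into a Set first.

```scala
import java.nio.file.Path

import akka.stream.IOResult
import akka.stream.scaladsl.{FileIO, Framing, Source}
import akka.util.ByteString

import scala.concurrent.Future

// Hypothetical record type; the question only names `Actor`.
final case class Actor(name: String)

object MovieSearch {

  // Stream a file as UTF-8 lines.
  // Assumption: no line is longer than 8192 bytes.
  private def lines(path: Path): Source[String, Future[IOResult]] =
    FileIO
      .fromPath(path)
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 8192, allowTruncation = true))
      .map(_.utf8String.stripSuffix("\r"))

  // Steps 1-2: read movies.tsv and keep the IDs of movies whose title contains `name`.
  private def matchingMovieIds(name: String, moviesTsv: Path): Source[String, Future[IOResult]] =
    lines(moviesTsv)
      .map(_.split("\t", -1))
      .collect { case Array(id, title) if title.contains(name) => id }

  // Steps 3-5: for each matching movie ID, re-read actors.tsv and emit the actors
  // whose movieIds column (assumed comma-separated) lists that ID.
  def principalsForMovieName(name: String, moviesTsv: Path, actorsTsv: Path): Source[Actor, _] =
    matchingMovieIds(name, moviesTsv).flatMapConcat { movieId =>
      lines(actorsTsv)
        .map(_.split("\t", -1))
        .collect {
          case Array(actorName, movieIds) if movieIds.split(",").contains(movieId) =>
            Actor(actorName)
        }
    }

  // Hypothetical variant: gather all matching IDs into a Set, then scan actors.tsv once.
  // `fold` emits a single element (the complete Set) when the movies stream finishes.
  def principalsSinglePass(name: String, moviesTsv: Path, actorsTsv: Path): Source[Actor, _] =
    matchingMovieIds(name, moviesTsv)
      .fold(Set.empty[String])(_ + _)
      .flatMapConcat { ids =>
        lines(actorsTsv)
          .map(_.split("\t", -1))
          .collect {
            case Array(actorName, movieIds) if movieIds.split(",").exists(ids.contains) =>
              Actor(actorName)
          }
      }
}
```

    Running either Source requires an ActorSystem in implicit scope, e.g. MovieSearch.principalsForMovieName("Alien", moviesPath, actorsPath).runWith(Sink.seq). Note that the two versions differ when an actor appears in more than one matching movie: the flatMapConcat version emits that actor once per matching movie, while the single-pass version emits each actor at most once, at the cost of holding all matching movie IDs in memory.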