
Pattern matching json lines using Circe and filtering based upon decoded case class value


I have a very large file of JSON lines, which I intend to read into a list of case classes. Because of the size of the file, I would like to filter during the JSON decoding pattern match, rather than reading the entire file into a variable first and then filtering. Currently the code looks like this:

import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode
import scala.io.Source

case class Person(name: String, age: Int, country: String)

val personList: List[Person] =
    Source.fromResource("Persons.json").getLines().toList.map { line =>
      implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]
      val decoded = decode[Person](line)
      decoded match {
        case Right(decodedJson) =>
          Person(
            decodedJson.name,
            decodedJson.age,
            decodedJson.country
          )
        case Left(ex) => throw new RuntimeException(ex)
      }
    }

However, if I wanted to include only Person instances with a country of "us", what would be the best way to accomplish this? Should I use nested pattern matching that specifically looks for Person(_, _, "us") (I'm not sure how I would accomplish this), or is there some way I can implement Option handling?


Solution

  • You could do something like this:

    import io.circe.Decoder
    import io.circe.generic.semiauto.deriveDecoder
    import io.circe.parser.decode
    import scala.io.Source
    
    case class Person(name: String, age: Int, country: String)
    
    implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]
    
    val personList: List[Person] =
      Source
        .fromResource("Persons.json")
        .getLines()
        .flatMap { line =>
          val decoded = decode[Person](line)
          decoded match {
            case Right(person @ Person(_, _, "us")) => Some(person)
            case Right(_)                           => None
            case Left(ex) =>
              println(s"couldn't decode: $line, will skip (error: ${ex.getMessage})")
              None
          }
        }
        .toList
    
    println(s"US people: $personList")
    

    A few things to note:

    • I moved the .toList to the end. In your implementation, you called it right after .getLines, which loses the laziness of the Iterator: every line is loaded into memory before any decoding or filtering happens. Assuming there are only a few US people among a huge number of people in the JSON file, keeping the pipeline lazy can be beneficial for performance and memory use.
    • Wrapping each iteration's result in an Option and using flatMap over the original Iterator is a handy way to get this kind of collection filtering: Some values are kept and None values are dropped.
    • I didn't throw an exception on a decoding error, but rather logged it and moved on with a None. You could also accumulate errors and do whatever you want with them after all iterations are done, if that's helpful to you.
    • The @ in person @ Person(_, _, "us") is a pattern binder: it matches the value against the pattern and binds the whole matched object to person.
    • As the comment on the original question noted, there is no need to re-instantiate the implicit Decoder on each iteration. You can just pull it one level up, as I did in my example.
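
If accumulating errors instead of logging them sounds useful, one possible shape is a single foldLeft over the line iterator that collects both the matching people and the failed lines. This is only a sketch: the in-memory lines iterator and its sample records are made up to stand in for Source.fromResource("Persons.json").getLines(), and the CirceError alias is just a local rename of io.circe.Error to avoid confusion with java.lang.Error.

```scala
import io.circe.{Decoder, Error => CirceError}
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode

case class Person(name: String, age: Int, country: String)

implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]

// Hypothetical sample input standing in for the lines of Persons.json.
val lines: Iterator[String] = Iterator(
  """{"name":"Ann","age":30,"country":"us"}""",
  """{"name":"Bob","age":41,"country":"de"}""",
  """not json""",
  """{"name":"Cy","age":25,"country":"us"}"""
)

// Single pass over the iterator: keep US people, keep (line, error) pairs
// for lines that failed to decode, and drop successfully decoded non-US people.
val (errors, usPeople) =
  lines.foldLeft((Vector.empty[(String, CirceError)], Vector.empty[Person])) {
    case ((errs, people), line) =>
      decode[Person](line) match {
        case Right(person @ Person(_, _, "us")) => (errs, people :+ person)
        case Right(_)                           => (errs, people)
        case Left(ex)                           => (errs :+ (line -> ex), people)
      }
  }

println(s"US people: $usPeople")
errors.foreach { case (line, ex) =>
  println(s"couldn't decode: $line (error: ${ex.getMessage})")
}
```

Only the matching people and the failed lines are retained, so memory use still scales with the size of the result rather than the size of the file. Whether to fail, retry, or just report once the pass is finished is then a decision you can make with the full list of errors in hand.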