I have a very large file of JSON lines, which I intend to read into a list of case classes. Because of the size of the file, rather than reading the entire file into a variable first and then filtering, I would like to filter within the pattern match on the JSON decoding result. Currently the code looks like this:
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode
import scala.io.Source

case class Person(name: String, age: Int, country: String)

val personList: List[Person] =
  Source.fromResource("Persons.json").getLines().toList.map { line =>
    implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]
    val decoded = decode[Person](line)
    decoded match {
      case Right(decodedJson) =>
        Person(
          decodedJson.name,
          decodedJson.age,
          decodedJson.country
        )
      case Left(ex) => throw new RuntimeException(ex)
    }
  }
However, if I wanted to include only Person instances with a country of "us", what would be the best way to accomplish this? Should I use nested pattern matching that specifically looks for Person(_, _, "us") (I'm not sure how I would accomplish this), or is there some way I can implement Option handling?
You could do something like this:
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode
import scala.io.Source

case class Person(name: String, age: Int, country: String)

// Derive the decoder once, outside the per-line loop.
implicit val jsonDecoder: Decoder[Person] = deriveDecoder[Person]

val personList: List[Person] =
  Source
    .fromResource("Persons.json")
    .getLines()
    .flatMap { line =>
      val decoded = decode[Person](line)
      decoded match {
        // Keep only people whose country is "us".
        case Right(person @ Person(_, _, "us")) => Some(person)
        case Right(_) => None
        // Log and skip lines that fail to decode.
        case Left(ex) =>
          println(s"couldn't decode: $line, will skip (error: ${ex.getMessage})")
          None
      }
    }
    .toList

println(s"US people: $personList")
A few things to note:
- I moved .toList to the end. In your implementation, you called it right after .getLines, which loses the laziness of the whole thing. Assuming there are only a few US people among a huge number of people in the JSON file, keeping the Iterator lazy until after the filtering can be beneficial for performance and efficiency.
- Option along with flatMap over the original Iterator we're running upon is very helpful for this kind of collection filtering.
- Lines that fail to decode are logged and mapped to None, so they are skipped. You could also accumulate errors and do whatever you want with them after all iterations are done, if that's helpful to you.
- The @ in person @ Person(_, _, "us") can be used for something like "match & bind" upon the whole object in question.
- There is no need to create the Decoder upon each iteration. You can just pull it one layer up, as I did in my example.
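If accumulating the errors sounds useful, here is a minimal sketch of one way it might look. The object name AccumulateErrors, the helper readUsPeople, and the sample lines are made up for illustration and are not part of the code above; the sketch folds over the Iterator once and collects the failed lines next to the matching people instead of printing them as they occur.

```scala
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode

case class Person(name: String, age: Int, country: String)

object AccumulateErrors {
  implicit val personDecoder: Decoder[Person] = deriveDecoder[Person]

  // Hypothetical helper: walks the lines once and returns
  // (failed lines with their errors, people whose country is "us").
  def readUsPeople(
      lines: Iterator[String]
  ): (List[(String, io.circe.Error)], List[Person]) = {
    val start = (List.empty[(String, io.circe.Error)], List.empty[Person])
    val (errors, people) = lines.foldLeft(start) {
      case ((errs, ppl), line) =>
        decode[Person](line) match {
          case Right(p @ Person(_, _, "us")) => (errs, p :: ppl)    // keep
          case Right(_)                      => (errs, ppl)         // other country
          case Left(e)                       => ((line, e) :: errs, ppl) // remember failure
        }
    }
    // Both lists were built by prepending, so restore input order.
    (errors.reverse, people.reverse)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample input standing in for the real file.
    val sample = Iterator(
      """{"name":"Ann","age":30,"country":"us"}""",
      """{"name":"Bob","age":41,"country":"de"}""",
      "not json"
    )
    val (errors, people) = readUsPeople(sample)
    println(s"US people: $people")
    errors.foreach { case (line, e) =>
      println(s"skipped '$line': ${e.getMessage}")
    }
  }
}
```

With the real file you would presumably pass Source.fromResource("Persons.json").getLines() instead of the sample Iterator. Note that the error list itself lives in memory, so if a large share of the file fails to decode you may prefer to count or log failures rather than keep them all.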