
Benefit of using case class in Spark DataFrame


What is the advantage of using a case class in a Spark DataFrame? I can define the schema using the "inferSchema" option or by defining StructType fields. I referred to "https://docs.scala-lang.org/tour/case-classes.html" but could not understand what the advantages of using a case class are, apart from generating the schema via reflection.
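
For concreteness, here is a minimal sketch of the three approaches I mean (the file name, columns and the pre-existing spark session are made up, as in spark-shell):

    import org.apache.spark.sql.Encoders
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
    import spark.implicits._

    // 1. Let Spark infer the schema by scanning the data
    val inferred = spark.read.option("header", "true").option("inferSchema", "true").csv("people.csv")

    // 2. Spell the schema out as StructType fields
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))
    val explicit = spark.read.option("header", "true").schema(schema).csv("people.csv")

    // 3. Derive the schema from a case class via reflection and get a typed Dataset
    case class Person(name: String, age: Int)
    val typed = spark.read.option("header", "true").schema(Encoders.product[Person].schema).csv("people.csv").as[Person]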


Solution

  • inferSchema can be an expensive operation (it requires an extra pass over the data) and it defers error detection unnecessarily. Consider the following pseudocode:

    val df = loadDFWithSchemaInference
    //doing things that take time
    df.map(row => row.getAs[String]("fieldName")) //more stuff
    

    Now, in this code you already have the assumption baked in that fieldName is of type String, but it is only expressed and checked late in your processing, leading to unfortunate runtime errors if it wasn't actually a String.
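
    A fleshed-out version of that pseudocode (a sketch; the file and column names are made up, and a spark session is assumed as in spark-shell) shows where the failure actually surfaces:

    import spark.implicits._

    // extra pass over the data just to guess the column types
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("input.csv")

    // ...expensive transformations happen here...

    // Compiles and builds the plan just fine, but if "fieldName" happened to contain
    // only digits and was inferred as an integer column, this blows up with a
    // ClassCastException only when the action finally runs.
    val names = df.map(row => row.getAs[String]("fieldName")).collect()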

    Now, if you did this instead:

    val df = load.as[CaseClass]
    

    or

    val df = load.schema(predefinedSchema)
    

    the fact that fieldName is a String becomes a precondition, and thus your code is more robust and less error-prone.
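
    A sketch of what those two alternatives look like in full (the class, field and file names are made up):

    import org.apache.spark.sql.Encoders
    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
    import spark.implicits._

    case class Record(fieldName: String, count: Long)

    // Case class: the schema comes from reflection and the result is a typed Dataset,
    // so downstream code reads fieldName as a plain String field, checked at compile time.
    val ds = spark.read
      .option("header", "true")
      .schema(Encoders.product[Record].schema)
      .csv("input.csv")
      .as[Record]
    ds.map(_.fieldName.toUpperCase) // no stringly-typed getAs, no late ClassCastException

    // Predefined StructType: type mismatches surface at load time as nulls or parse errors
    // (depending on the reader's "mode" option), not deep inside the job.
    val withSchema = spark.read
      .option("header", "true")
      .schema(StructType(Seq(
        StructField("fieldName", StringType, nullable = true),
        StructField("count", LongType, nullable = true))))
      .csv("input.csv")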

    Schema inference is very handy for exploratory work in the REPL or e.g. Zeppelin, but it should not be used in operational code.

    Edit / Addendum: I personally prefer case classes over schemas because I prefer the Dataset API to the DataFrame API (which is just Dataset[Row]) for similar robustness reasons.
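
    A small illustration of that difference (with a made-up Person type; a spark session is assumed):

    import spark.implicits._

    case class Person(name: String, age: Int)

    val people = Seq(Person("Ada", 36), Person("Linus", 28)).toDS() // Dataset[Person]
    val rows = people.toDF()                                        // DataFrame, i.e. Dataset[Row]

    // Dataset API: field names and types are checked by the compiler.
    people.filter(_.age > 30).map(_.name)

    // DataFrame API: the same logic is stringly typed; a wrong column name
    // or type only fails at runtime.
    rows.filter($"age" > 30).select($"name")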