I have a case class like below:
case class Person(id: Int, name: String)
Now, I wrote the method below to write a Parquet file from a Seq[T] using AvroParquetWriter.
def writeToFile[T](data: Iterable[T], schema: Schema, path: String, accessKey: String, secretKey: String): Unit = {
  val conf = new Configuration
  conf.set("fs.s3.awsAccessKeyId", accessKey)
  conf.set("fs.s3.awsSecretAccessKey", secretKey)
  val s3Path = new Path(path)
  val writer = AvroParquetWriter.builder[T](s3Path)
    .withConf(conf)
    .withSchema(schema)
    .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
    .build()
    .asInstanceOf[ParquetWriter[T]]
  data.foreach(writer.write)
  writer.close()
}
The schema is:
val schema = SchemaBuilder
  .record("Person")
  .fields()
  .requiredInt("id")
  .requiredString("name")
  .endRecord()
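For reference, that builder should produce the equivalent of this Avro schema JSON:

```json
{
  "type": "record",
  "name": "Person",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" }
  ]
}
```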
Now, when I call writeToFile with the code below, I get an exception:
val personData = Seq(Person(1, "A"), Person(2, "B"))
ParquetService.writeToFile[Person](
  data = personData,
  schema = schema,
  path = s3Path,
  accessKey = accessKey,
  secretKey = secretKey
)
java.lang.ClassCastException: com.entities.Person cannot be cast to org.apache.avro.generic.IndexedRecord
Why can't Person be cast to IndexedRecord? Is there anything extra I need to do to get rid of this exception?
I had a similar issue, and according to this example, you are missing one method call on the writer builder. Without withDataModel, the writer assumes each record implements org.apache.avro.generic.IndexedRecord (a generic or specific Avro record), which is exactly the cast that fails; ReflectData instead uses reflection to read the fields of plain classes like your case class.
val writer = AvroParquetWriter.builder[T](s3Path)
  .withConf(conf)
  .withSchema(schema)
  .withDataModel(ReflectData.get) // This one
  .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
  .build()
Also, if you wish to support nulls in your data, you can use ReflectData.AllowNull.get() as the data model instead.
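Putting it together, here is a minimal sketch of the fixed method, assuming parquet-avro and the Hadoop client are on the classpath (the S3 credential settings from your version are dropped for brevity, and the output path is a placeholder):

```scala
import org.apache.avro.Schema
import org.apache.avro.reflect.ReflectData
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter
import org.apache.parquet.hadoop.{ParquetFileWriter, ParquetWriter}

case class Person(id: Int, name: String)

def writeToFile[T](data: Iterable[T], schema: Schema, path: String): Unit = {
  val writer: ParquetWriter[T] = AvroParquetWriter
    .builder[T](new Path(path))
    .withConf(new Configuration)
    .withSchema(schema)
    .withDataModel(ReflectData.AllowNull.get) // reflection-based model; reference fields become nullable unions
    .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
    .build()
  try data.foreach(writer.write)
  finally writer.close() // close even if a write fails
}

// ReflectData can also derive the schema from the class itself,
// instead of hand-building it with SchemaBuilder:
val schema: Schema = ReflectData.AllowNull.get.getSchema(classOf[Person])
writeToFile[Person](Seq(Person(1, "A"), Person(2, "B")), schema, "/tmp/person.parquet")
```

Note that with ReflectData.AllowNull the derived schema makes the String field nullable, so it will differ slightly from your required-field SchemaBuilder version.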