Tags: apache-spark, apache-spark-sql, spark-avro

Spark DataFrame: How to specify schema when writing as Avro


I want to write a DataFrame in Avro format using a provided Avro schema rather than Spark's auto-generated schema. How can I tell Spark to use my custom schema on write?


Solution

  • After applying the patch in https://github.com/databricks/spark-avro/pull/222/, I was able to specify a schema on write as follows:

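    import com.databricks.spark.avro._ // adds the implicit .avro(path) method to DataFrameWriter

    // "forceSchema" is the option introduced by the patch above;
    // myCustomSchemaString holds the Avro schema as a JSON string
    df.write.option("forceSchema", myCustomSchemaString).avro("/path/to/outputDir")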
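Later Spark releases (2.4 and up) ship a built-in Avro data source, packaged as the separate `org.apache.spark:spark-avro` module. Its writer accepts an `avroSchema` option that holds the schema as a JSON string, so no patch should be needed there. The following is a minimal sketch under stated assumptions, not a definitive recipe. The object `AvroCustomSchemaWrite`, the helper `writeAvro`, the `User` schema and the sample rows are all hypothetical. It assumes the spark-avro module matching your Spark and Scala versions is on the classpath. It writes to a temporary directory instead of the placeholder path, then reads the files back. The nullable `name` column is declared as a union with `null` in the schema, which is the safe way to map a nullable Spark column to Avro.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object AvroCustomSchemaWrite {
  // Hypothetical user-supplied Avro schema (JSON). Field names match the
  // DataFrame columns; "name" is a union with null because Spark marks the
  // String column built from a Seq of tuples as nullable.
  val userSchemaJson: String =
    """{
      |  "type": "record",
      |  "name": "User",
      |  "namespace": "example.avro",
      |  "fields": [
      |    {"name": "id", "type": "long"},
      |    {"name": "name", "type": ["null", "string"], "default": null}
      |  ]
      |}""".stripMargin

  // Writes df as Avro, asking the built-in Avro source (Spark 2.4+) to use the
  // given schema instead of the one it would derive from df's Spark schema.
  def writeAvro(df: DataFrame, avroSchemaJson: String, path: String): Unit =
    df.write
      .format("avro")
      .option("avroSchema", avroSchemaJson)
      .mode("overwrite")
      .save(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-custom-schema")
      .master("local[*]")
      .getOrCreate()

    // Sample rows: "id" becomes a non-nullable LongType, "name" a nullable StringType.
    val df = spark.createDataFrame(Seq((1L, "alice"), (2L, "bob"))).toDF("id", "name")

    // Temporary output directory instead of a real path, so the sketch runs anywhere.
    val out = Files.createTempDirectory("avro-out").resolve("users").toString
    writeAvro(df, userSchemaJson, out)

    // Read the files back; Spark derives the schema from the Avro files just written.
    spark.read.format("avro").load(out).printSchema()

    spark.stop()
  }
}
```

The module is typically added with something like `spark-submit --packages org.apache.spark:spark-avro_<scala version>:<spark version>`. Field names and types in the supplied schema must be compatible with the DataFrame's columns, otherwise the write fails.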