Tags: azure, apache-spark, avro, spark-avro

Read Avro in Azure HDInsight 4.0


I'm trying to read an Avro file from a Jupyter notebook in Azure HDInsight 4.0 with Spark 2.4. I'm not able to properly provide the .jar file to

I've tried the approaches suggested in "How to use Avro on HDInsight Spark/Jupyter?" and in https://learn.microsoft.com/en-in/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages, but I guess they apply to Spark 2.3:

%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}

This produces the error message:

pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'


Solution

  • The solution that seems to work is:

    %%configure -f 
    { "conf": {"spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.0" }}
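
The coordinate above follows the pattern org.apache.spark:spark-avro_<scala version>:<spark version>, which is how the Avro module is published from Spark 2.4 onward. As a rough sketch, the JSON body for the %%configure cell could be generated like this. The helper name avro_configure_body is hypothetical, and the Scala default of 2.11 is an assumption taken from the coordinate in the answer; check your cluster before relying on it.

```python
import json


def avro_configure_body(spark_version: str, scala_version: str = "2.11") -> str:
    """Build the JSON body of a %%configure cell that loads spark-avro.

    Hypothetical helper: assumes Spark 2.4 or later, where the Avro module
    is published as org.apache.spark:spark-avro_<scala>:<spark>.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) < (2, 4):
        # Before 2.4 there is no org.apache.spark:spark-avro artifact.
        raise ValueError("org.apache.spark:spark-avro requires Spark 2.4+")
    coordinate = f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"
    return json.dumps({"conf": {"spark.jars.packages": coordinate}})


# Paste the printed JSON under a "%%configure -f" line in the notebook.
print(avro_configure_body("2.4.0"))
```

If the cluster runs a different 2.4.x patch release, printing spark.version in a notebook cell shows the exact value to pass in. Once the session has restarted with the package loaded, spark.read.format("avro").load(path) should resolve the avro data source, where path stands for your own file location.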