
Spark from_avro function with schema registry support


I am trying to use the Confluent Schema Registry with Spark's from_avro function, as per this doc.

I have the below dependencies:

"io.confluent" % "kafka-schema-registry-client" % "5.4.1",
"io.confluent" % "kafka-avro-serializer" % "5.4.1",
"org.apache.spark" %% "spark-avro" % "2.4.5",

However, I only see the below method signature available.

import org.apache.spark.sql.avro._
from_avro(data: Column, jsonFormatSchema : String)

and not the one I expect with schema registry support.

from_avro($"value", "topic-value", schemaRegistryAddr)

Am I missing something? I understood that 2.4.5 is the latest stable version of spark-avro, but it does not seem to support the signature mentioned in the Databricks docs. Any input is appreciated.


Solution

  • The signature below is not available in open-source Spark 2.4.5 (spark-avro) as of now.

    from_avro($"value", "topic-value", schemaRegistryAddr)
    

    It is only available in the Databricks environment, for example in a Databricks notebook.
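
On open-source Spark, one possible workaround is to fetch the schema from the registry yourself and pass it to the two-argument from_avro. The sketch below is not the Databricks implementation; it assumes the dependencies listed in the question, a DataFrame with a binary "value" column read from Kafka, and records written by the Confluent Avro serializer (whose wire format prefixes each payload with 1 magic byte and a 4-byte schema id). The names ConfluentAvro, stripHeader, latestSchema and decodeValue are hypothetical helpers made up for illustration.

```scala
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.avro._
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper object, not part of Spark or Confluent.
object ConfluentAvro {
  // Confluent wire format: 1 magic byte + 4-byte schema id, then the Avro payload.
  val HeaderSize = 5

  // Drop the Confluent header so that only the plain Avro bytes remain.
  def stripHeader(bytes: Array[Byte]): Array[Byte] =
    if (bytes == null) null else bytes.drop(HeaderSize)

  private val stripHeaderUdf = udf((bytes: Array[Byte]) => stripHeader(bytes))

  // Fetch the latest registered schema for a subject as a JSON string.
  def latestSchema(schemaRegistryAddr: String, subject: String): String = {
    val client = new CachedSchemaRegistryClient(schemaRegistryAddr, 128)
    client.getLatestSchemaMetadata(subject).getSchema
  }

  // Decode the "value" column using the two-argument from_avro from spark-avro 2.4.x.
  // Assumption: every record can be read with the latest schema of the subject.
  def decodeValue(df: DataFrame, schemaRegistryAddr: String, subject: String): DataFrame = {
    val jsonSchema = latestSchema(schemaRegistryAddr, subject)
    df.select(from_avro(stripHeaderUdf(col("value")), jsonSchema).as("value"))
  }
}
```

With this in place, something like ConfluentAvro.decodeValue(kafkaDf, schemaRegistryAddr, "topic-value") would play the role of the three-argument call. Note that this sketch ignores the schema id embedded in each record and decodes everything with the latest schema, so it may fail or misread data if the topic contains records written with older, incompatible schema versions.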