Tags: scala, apache-spark, sparkcore, spark-redshift

Check whether a Spark format exists or not


Context

The Spark reader has a format function that specifies the data source type, for example JSON, CSV, or a third-party source such as com.databricks.spark.redshift.

Help

How can I check whether a third-party format exists? Here is my case:

  • To connect to Redshift from local Spark, two open-source libraries are available: 1. com.databricks.spark.redshift 2. io.github.spark_redshift_community.spark.redshift. How can I determine which of them the user has put on the classpath?

What I tried

  • Class.forName("com.databricks.spark.redshift"), which did not work
  • I checked the Spark source to see how it throws the error (here is the line), but unfortunately Utils is not publicly available
  • Instead of targeting the format option, I tried to target the JAR file through System.getProperty("java.class.path")
  • spark.read.format("..").load() in a try/catch

I am looking for a proper and reliable solution.


Solution

  • I hope this answer helps you.

    To check only whether a Spark format exists,

    spark.read.format("..").load() in try/catch

    is enough.
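One caveat with the try/catch approach: load() can also fail for reasons unrelated to the format (missing options, no connection), so the catch should look at the exception type. Below is a minimal sketch, assuming that Spark signals an unknown format with a ClassNotFoundException; verify this against your Spark version. FormatCheck and formatAvailable are hypothetical names, and the attempt is passed by name so the helper itself needs nothing from Spark:

```scala
import scala.util.control.NonFatal

object FormatCheck {
  // Hypothetical helper: runs `attempt` and classifies the outcome.
  // Assumption: an unknown format surfaces as ClassNotFoundException.
  def formatAvailable(attempt: => Any): Boolean =
    try {
      attempt
      true
    } catch {
      // Spark could not resolve the format name to a data source class
      case _: ClassNotFoundException => false
      // The format resolved, but loading failed for another reason
      // (missing options, unreachable database, and so on)
      case NonFatal(_) => true
    }
}
```

With a SparkSession named spark in scope, a call would look like FormatCheck.formatAvailable(spark.read.format("com.databricks.spark.redshift").load()). Missing built-in modules such as avro may be reported through a different exception type, so this sketch is not exhaustive.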

    Also, data sources usually register themselves through the DataSourceRegister interface (and use shortName to provide their alias), so you can use Java's ServiceLoader.load method to find all registered implementations of that interface:

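    import java.util.ServiceLoader
    import scala.collection.JavaConverters._
    import org.apache.spark.sql.sources.DataSourceRegister

    // Discover every DataSourceRegister implementation visible on the classpath
    val formats = ServiceLoader.load(classOf[DataSourceRegister])

    // Print the alias (shortName) of each registered data source
    formats.asScala.map(_.shortName).foreach(println)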
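A connector that does not register a short name will not show up in a ServiceLoader listing, so a direct classpath probe can complement it. Class.forName("com.databricks.spark.redshift") fails because that string is a package name, not a class. As far as I know, when a format is given as a package name Spark also tries that name with ".DefaultSource" appended, so the sketch below probes for that class. It assumes both Redshift connectors ship a DefaultSource class in the packages named in the question; RedshiftConnectorProbe and its members are hypothetical names.

```scala
import scala.util.Try

object RedshiftConnectorProbe {
  // Format names from the question; assumed to contain a `DefaultSource` class
  val candidates: Seq[String] = Seq(
    "com.databricks.spark.redshift",
    "io.github.spark_redshift_community.spark.redshift"
  )

  // Prefer the thread's context class loader, fall back to this object's loader
  private def loader: ClassLoader =
    Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)

  // True if the class can be located; `false` skips static initialization
  def classOnClasspath(className: String): Boolean =
    Try(Class.forName(className, false, loader)).isSuccess

  // Keep the names that resolve either directly or as `<name>.DefaultSource`
  def availableFormats(names: Seq[String] = candidates): Seq[String] =
    names.filter(n => classOnClasspath(n) || classOnClasspath(s"$n.DefaultSource"))
}
```

Calling RedshiftConnectorProbe.availableFormats() returns the subset of the two format names that resolve on the current classpath, which can then be passed to spark.read.format.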