Tags: apache-spark, apache-spark-sql, azure-databricks

Spark 2.4.5 from_csv function not found in databricks runtime 6.4


I get an error for the `from_csv` function in the `org.apache.spark.sql.functions` package on Spark 2.4.5 in a Databricks Runtime 6.4 cluster. I see the function was added a long time ago. Can anyone tell me whether I'm importing the wrong package or doing something else wrong?


Solution

  • It was introduced in Spark 3.0.0; you can see it in the repo:

    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala

    /**
     * Parses a column containing a CSV string into a `StructType` with the specified schema.
     * Returns `null`, in the case of an unparseable string.
     *
     * @param e a string column containing CSV data.
     * @param schema the schema to use when parsing the CSV string
     * @param options options to control how the CSV is parsed. accepts the same options and the
     *                CSV data source.
     *
     * @group collection_funcs
     * @since 3.0.0
     */
    def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column = withExpr {
      CsvToStructs(schema, options, e.expr)
    }
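    On Spark 3.0.0 or later (Databricks Runtime 7.0+), the function can be used as in this minimal sketch; the column names and sample data here are illustrative, not from the question:

    ```scala
    import org.apache.spark.sql.functions.from_csv
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
    import spark.implicits._

    val schema = new StructType()
      .add("id", IntegerType)
      .add("name", StringType)

    val df = Seq("1,abc", "2,def").toDF("value")

    // Parse the CSV string column into a struct column (available since Spark 3.0.0)
    val parsed = df.select(from_csv($"value", schema, Map.empty[String, String]).as("csv"))
    parsed.select("csv.id", "csv.name").show()
    ```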

    And the JIRA ticket where it was added:

    https://issues.apache.org/jira/browse/SPARK-25393
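
    If you need to stay on Spark 2.4.5, one possible workaround (a sketch, not from the original answer) is to parse a `Dataset[String]` of CSV lines with the `DataFrameReader`, which has accepted a `Dataset[String]` since Spark 2.2.0:

    ```scala
    import org.apache.spark.sql.Dataset
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
    import spark.implicits._

    val schema = new StructType()
      .add("id", IntegerType)
      .add("name", StringType)

    // Hypothetical sample data standing in for a CSV string column
    val csvLines: Dataset[String] = Seq("1,abc", "2,def").toDS()

    // spark.read.csv(Dataset[String]) parses the lines using the given schema
    val parsed = spark.read.schema(schema).csv(csvLines)
    parsed.show()
    ```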