Tags: eclipse, scala, sbt, databricks-connect

How to fix spark.read.format("parquet") error


I'm running Scala code on Azure Databricks without problems. Now I want to move this code from the Azure notebook to Eclipse.

  1. I installed Databricks Connect following the Microsoft documentation and successfully passed the Databricks connection test.
  2. I also installed sbt and imported it into my project in Eclipse (a minimal build.sbt is sketched after this list).
  3. I created a Scala object in Eclipse and imported all the PySpark jar files as external files.
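
For reference, if you manage the Spark dependency through sbt rather than importing jars by hand, a minimal build.sbt might look like the sketch below (the Scala and Spark versions are assumptions; match them to what your Databricks cluster and databricks-connect release use):

    name := "Student"
    version := "0.1"
    // Assumption: a Scala version matching the Spark build on the cluster
    scalaVersion := "2.12.15"
    // Assumption: a Spark version matching the Databricks runtime
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.2"

The sbt-managed approach avoids tracking external jar files in Eclipse by hand. The Scala object from step 3: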

package Student

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.SparkSession
import java.util.Properties
//import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

object Test {
  
  def isTypeSame(df: DataFrame, name: String, coltype: String) = (df.schema(name).dataType.toString == coltype)
  def main(args: Array[String]){
    var Result = true
    val Borrowers = List(("col1", "StringType"),("col2", "StringType"),("col3", "DecimalType(38,18)"))
    val dfPcllcus22 = spark.read.format("parquet").load("/mnt/slraw/ServiceCenter=*******.parquet")
    
    if (Result == false) println("Test Fail, Please check") else println("Test Pass")  
  }
}

When I run this code in Eclipse, it says it cannot find the main class. But if I comment out the line "val dfPcllcus22 = spark.read.format("parquet").load("/mnt/slraw/ServiceCenter=*******.parquet")", the test passes. So it seems spark.read.format cannot be recognized.

I'm new to Scala and Databricks. I have been researching this for several days but still cannot solve it. If anyone can help, I would really appreciate it. The environment is a bit complicated to me; if more information is required, please let me know.


Solution

  • A SparkSession is needed to run your code in Eclipse. The provided code never creates one, so the reference to spark leads to an error:

    val spark = SparkSession.builder.appName("SparkDBFSParquet").master("local[*]").getOrCreate()
    

    Please add this line before the spark.read call and run the code again; it should work.
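
    For reference, here is a minimal sketch of the complete object with the session creation in place. Everything besides the SparkSession block is unchanged from the question (the load path is the question's redacted path):

    package Student

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.SparkSession

    object Test {

      // Compares a column's declared type against an expected type name
      def isTypeSame(df: DataFrame, name: String, coltype: String) =
        (df.schema(name).dataType.toString == coltype)

      def main(args: Array[String]): Unit = {
        // Create the SparkSession that spark.read below relies on
        val spark = SparkSession.builder
          .appName("SparkDBFSParquet")
          .master("local[*]")
          .getOrCreate()

        var Result = true
        val Borrowers = List(("col1", "StringType"), ("col2", "StringType"), ("col3", "DecimalType(38,18)"))
        val dfPcllcus22 = spark.read.format("parquet").load("/mnt/slraw/ServiceCenter=*******.parquet")

        if (Result == false) println("Test Fail, Please check") else println("Test Pass")
      }
    }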