Tags: scala, apache-spark, pyspark, databricks, azure-databricks

Error: bulkCopyToSqlDB is not a member of org.apache.spark.sql.DataFrameWriter


Following the sample in the link below to load data into SQL DB from Databricks, I'm getting this error:

command-3227900948916301:23: error: value bulkCopyToSqlDB is not a member of org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row]
df.write.mode(SaveMode.Overwrite).bulkCopyToSqlDB(bulkCopyConfig)

https://github.com/Azure/azure-sqldb-spark/blob/fa1cf19ed797648a20d9b7f474d7c2cd88829ada/samples/scripts/BulkCopySample.scala

My code is as follows:

val bulkCopyConfig = Config(Map(
  "url"          -> url,
  "databaseName" -> databaseName,
  "dbTable"      -> "dbo.xxxx",
  "user"         -> user,
  "password"     -> password,
  "connectTimeout" -> "120",
  "bulkCopyBatchSize" -> "100000",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "0",
  "truncate" -> "true"
//  "queryTimeout"   -> "5"
))


df.write.mode(SaveMode.Overwrite).bulkCopyToSqlDB(bulkCopyConfig)

Any thoughts on why I'm getting the error?

Solution

  • You need the correct imports to extend the DataFrame with the additional functions:

    import com.microsoft.azure.sqldb.spark.connect._
    

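    The import matters because bulkCopyToSqlDB is not defined on Spark's own classes; the connector adds it through an implicit class (you can see its name, DataFrameFunctions, in the error log below) that only applies where the import is in scope. Here is a minimal sketch of that Scala pattern, using hypothetical names (connect, RichDataFrame) rather than the library's actual source:

    import org.apache.spark.sql.DataFrame

    // Illustration only: stand-in names for the pattern the connector uses,
    // not its real source.
    object connect {
      implicit class RichDataFrame(df: DataFrame) {
        // Extension method, visible only where `import connect._` is in scope
        def bulkCopyToSqlDB(config: Map[String, String]): Unit =
          println(s"would bulk-copy ${df.count()} rows using $config")
      }
    }

    // Without `import connect._`, df.bulkCopyToSqlDB(...) fails to compile
    // with the same "value ... is not a member of" error as in the question.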
    Here is what I get (it fails because I don't have an active SQL DB, but the function is found):

    scala> import com.microsoft.azure.sqldb.spark.config.Config
    import com.microsoft.azure.sqldb.spark.config.Config
    
    scala> import com.microsoft.azure.sqldb.spark.connect._
    import com.microsoft.azure.sqldb.spark.connect._
    
    scala> val bulkCopyConfig = Config(Map(
         |   "url"               -> "mysqlserver.database.windows.net",
         |   "databaseName"      -> "MyDatabase",
         |   "user"              -> "username",
         |   "password"          -> "*********",
         |   "databaseName"      -> "MyDatabase",
         |   "dbTable"           -> "dbo.Clients",
         |   "bulkCopyBatchSize" -> "2500",
         |   "bulkCopyTableLock" -> "true",
         |   "bulkCopyTimeout"   -> "600"
         | ))
    bulkCopyConfig: com.microsoft.azure.sqldb.spark.config.Config = com.microsoft.azure.sqldb.spark.config.ConfigBuilder$$anon$1@754443e1
    
    scala> val df = spark.range(1,10)
    df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
    
    scala> df.bulkCopyToSqlDB(bulkCopyConfig)
    20/10/06 16:24:38 ERROR DataFrameFunctions: Connection cannot be established to the database
    com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'username'. ClientConnectionId:b35aa6bb-24ec-4727-98b6-074cae4e6a32
    ...
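
    Note that in the session above (and in the linked BulkCopySample) bulkCopyToSqlDB is called on the DataFrame itself, not on the DataFrameWriter that df.write.mode(...) returns, which is exactly the receiver the compiler complains about. With the imports in place, the question's code would look roughly like this (an untested sketch reusing the question's own config values):

    import com.microsoft.azure.sqldb.spark.config.Config
    import com.microsoft.azure.sqldb.spark.connect._

    val bulkCopyConfig = Config(Map(
      "url"               -> url,
      "databaseName"      -> databaseName,
      "dbTable"           -> "dbo.xxxx",
      "user"              -> user,
      "password"          -> password,
      "connectTimeout"    -> "120",
      "bulkCopyBatchSize" -> "100000",
      "bulkCopyTableLock" -> "true",
      "bulkCopyTimeout"   -> "0",
      "truncate"          -> "true"
    ))

    // Call the method on the DataFrame directly, as the sample does,
    // rather than on df.write.mode(SaveMode.Overwrite)
    df.bulkCopyToSqlDB(bulkCopyConfig)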