azure-databricks

Azure Databricks - Failed to find data source: couldFiles


I am new to Auto Loader and am trying to run the Auto Loader code below in a notebook.

spark.readStream.format("couldFiles")\
  .option("cloudFiles.format","csv")\
  .load("dbfs:/FileStore/tables/test*.csv") \
  .writeStream

But I got the error below.

java.lang.ClassNotFoundException: Failed to find data source: couldFiles. Please find packages at http://spark.apache.org/third-party-projects.html

Can anyone please advise?


Solution

  • java.lang.ClassNotFoundException: Failed to find data source: couldFiles.

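    This error occurs because the data source name is misspelled: format("couldFiles") should be format("cloudFiles"). Spark cannot find a source called couldFiles, so it throws ClassNotFoundException. After fixing the name, configure the cloudFiles options as needed, for example for file notification mode as shown in the code below: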

    cloudFiles ={
        "cloudFiles.subscriptionId" :"<subscription_Id>",
        "cloudFiles.connectionString" :"<connectionString_Storage_account>",
        "cloudFiles.format":"csv",
        "cloudFiles.tenantId":"<tenantId>",
        "cloudFiles.clientId":"<client_ID>",
        "cloudFiles.clientSecret":"<Client_Secret>",
        "cloudFiles.resourceGroup":"<Resource_group_name>",
        "cloudFiles.useNotifications":"true"
    }
    

    For more information, see the Azure Databricks documentation on configuring Auto Loader, which explains in detail how to read and write streaming data on Azure Databricks.
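
To make the fix concrete, here is a minimal sketch of the corrected read plus a completed write. It assumes a Databricks notebook where `spark` is already defined. The helper names (`read_csv_with_auto_loader`, `write_stream`), the `dbfs:/tmp/autoloader/...` paths, and the choice of a Delta sink are illustrative assumptions and not part of the original post. `cloudFiles.schemaLocation` is included because Auto Loader generally needs somewhere to store an inferred schema when none is supplied.

```python
# Sketch only: in a Databricks notebook `spark` already exists; it is passed
# in explicitly here so the helpers stay self-contained.

AUTO_LOADER_FORMAT = "cloudFiles"  # spelled "cloud", not "could"


def read_csv_with_auto_loader(spark, source_path, schema_location, extra_options=None):
    """Return a streaming DataFrame that reads CSV files with Auto Loader."""
    options = {
        "cloudFiles.format": "csv",
        # Location where Auto Loader keeps the inferred schema (caller-supplied path).
        "cloudFiles.schemaLocation": schema_location,
    }
    # Merge optional settings, e.g. the notification-mode dict from the answer.
    options.update(extra_options or {})
    return (
        spark.readStream.format(AUTO_LOADER_FORMAT)
        .options(**options)
        .load(source_path)
    )


def write_stream(df, checkpoint_location, target_path):
    """Start the streaming write; the checkpoint location tracks progress."""
    return (
        df.writeStream
        .format("delta")  # assumption: a Delta sink; use whatever sink you need
        .option("checkpointLocation", checkpoint_location)
        .start(target_path)
    )


# Hypothetical usage in a notebook (paths are placeholders):
# df = read_csv_with_auto_loader(
#     spark, "dbfs:/FileStore/tables/test*.csv", "dbfs:/tmp/autoloader/schema")
# query = write_stream(
#     df, "dbfs:/tmp/autoloader/checkpoint", "dbfs:/tmp/autoloader/output")
```

Keeping the format name in one constant means a typo like the one in the question only has to be fixed in one place.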