
How To Migrate Spark Scala Azure Mounting Code into Pyspark Code


I would like to migrate the Spark Scala code below into Databricks-supported PySpark. The code checks whether my container is already mounted and, if not, mounts it using a service principal; we plan to keep it centralized for all team members to use. Kindly help us.

val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> dbutils.secrets.get(scope = "sample-scope", key = "sample--client-id"),
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "sample-scope", key = "sample-client-secret"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/33333df5c2-953a-444444444/oauth2/token"
)

val adlsPath = "abfss://[email protected]/"
val mountPoint = "/mnt/containername"

if (dbutils.fs.mounts.map(mnt => mnt.mountPoint).contains(mountPoint)) {
  println(mountPoint + " already mounted")
}
else {
  println(mountPoint + " not mounted, mounting now")
  try {
    dbutils.fs.mount(
      source = adlsPath,
      mountPoint = mountPoint,
      extraConfigs = configs)
  }
  catch {
    case e: java.rmi.RemoteException => {
      println("exception encountered while mounting " + adlsPath)
    }
  }
}

Solution

  • You can use the Python code below.

    # configs1 = {
    #     "fs.azure.account.auth.type": "OAuth",
    #     "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    #     "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope="sample-scope", key="sample-client-id"),
    #     "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="sample-scope", key="sample-client-secret"),
    #     "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/33333df5c2-953a-444444444/oauth2/token"
    # }
    
    configs2 = {
        "fs.azure.account.key.jadls2.blob.core.windows.net": "acc_key"
    }
    
    adls_path = "wasbs://[email protected]/"
    mount_point = "/mnt/jadls2"
    
    if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
        print(mount_point + " already mounted")
    else:
        print(mount_point + " not mounted, mounting now")
        try:
            dbutils.fs.mount(
                source=adls_path,
                mount_point=mount_point,
                extra_configs=configs2
            )
        except Exception as e:
            if "RemoteException" in str(e):
                print("Exception encountered while mounting " + adls_path)
                
    dbutils.fs.ls(mount_point)
    

    Here, I tried with an account key, but that's not recommended; you should use a service principal.


    Refer to this article for more information.

    Mount ADLS Gen2 or Blob Storage in Azure Databricks (microsoft.com)
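Since the question mentions keeping this code centralized for the whole team, the check-then-mount pattern from the answer can also be wrapped in a small reusable helper. The sketch below is illustrative: `mount_if_missing` and the `_FakeFs` stub are hypothetical names, and inside a real Databricks notebook you would pass the built-in `dbutils` object rather than the stub.

```python
from types import SimpleNamespace

def mount_if_missing(dbutils, source, mount_point, configs):
    # Mount `source` at `mount_point` unless something is already mounted there.
    # Returns True if a new mount was created, False if it already existed.
    # `dbutils` is passed in so every team notebook can reuse this helper
    # (and so it can be exercised with a stub outside Databricks).
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        print(mount_point + " already mounted")
        return False
    print(mount_point + " not mounted, mounting now")
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True

# Minimal stand-in for dbutils, only so the helper can be demonstrated locally;
# in a Databricks notebook you would pass the built-in `dbutils` instead.
class _FakeFs:
    def __init__(self):
        self._mounted = []
    def mounts(self):
        return list(self._mounted)
    def mount(self, source, mount_point, extra_configs):
        self._mounted.append(SimpleNamespace(mountPoint=mount_point))

fake_dbutils = SimpleNamespace(fs=_FakeFs())
created = mount_if_missing(fake_dbutils, "wasbs://c@acct/", "/mnt/demo", {})
again = mount_if_missing(fake_dbutils, "wasbs://c@acct/", "/mnt/demo", {})
```

Passing `dbutils` as a parameter (instead of referencing the notebook global) is what makes the helper importable from a shared module and testable outside the workspace.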