Tags: scala, azure, apache-spark, azure-databricks, azure-data-lake-gen2

StatusDescription=This request is not authorized to perform this operation using this permission


I'm using Azure Databricks to create a simple batch job that copies data from a Databricks file system to another location.

In a notebook cell, I ran this command:

val df = spark.read.text("abfss://" + fileSystemName + "@" + storageAccountName + ".dfs.core.windows.net/fec78263-b86d-4531-ad9d-3139bf3aea31.txt")

where the source file name is: fec78263-b86d-4531-ad9d-3139bf3aea31.txt

But when I run the command, I receive this error message:

HEAD https://bassamsacc01.dfs.core.windows.net/bassamdatabricksfs01/fec78263-b86d-4531-ad9d-3139bf3aea31.txt?timeout=90    
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=
ErrorMessage=
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134)
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathProperties(AbfsClient.java:353)
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:498)
        at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:405)
        at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:386)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:366)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:355)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:355)
        at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:927)
        at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:893)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3220206233807239:1)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3220206233807239:46)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw$$iw.<init>(command-3220206233807239:48)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw$$iw.<init>(command-3220206233807239:50)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw$$iw.<init>(command-3220206233807239:52)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$$iw.<init>(command-3220206233807239:54)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read.<init>(command-3220206233807239:56)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$.<init>(command-3220206233807239:60)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$read$.<clinit>(command-3220206233807239)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$eval$.$print$lzycompute(<notebook>:7)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$eval$.$print(<notebook>:6)
        at lineb130a1e5c98d4e8d87dcdb2af6c5332443.$eval.$print(<notebook>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
        at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
        at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:202)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
        at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
        at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
        at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:396)
        at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:238)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:233)
        at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:230)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
        at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:275)
        at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:268)
        at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
        at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
        at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653)
        at scala.util.Try$.apply(Try.scala:213)
        at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645)
        at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
        at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
        at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
        at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
        at java.lang.Thread.run(Thread.java:748)

At first glance, it seems there is an authorization issue when accessing the file system hosted in the Azure storage account, but I don't know how to add the appropriate configuration strings.
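
For reference, the storage account access is configured with a service principal, roughly along these lines (a sketch of the standard ABFS OAuth settings; <application-id>, <client-secret> and <directory-id> are placeholders for the Azure AD app registration values):

// Sketch of the ABFS OAuth configuration for a service principal.
// The placeholder values come from the Azure AD app registration.
spark.conf.set("fs.azure.account.auth.type." + storageAccountName + ".dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type." + storageAccountName + ".dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id." + storageAccountName + ".dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret." + storageAccountName + ".dfs.core.windows.net", "<client-secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint." + storageAccountName + ".dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")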


If the question helped, up-vote it. Thanks in advance.


Solution

  • From the error message, it looks like you didn't grant the correct role to your service principal at the Data Lake Storage Gen2 scope.

    To fix the issue, navigate to the storage account in the portal -> Access control (IAM) -> Add role assignment, and grant your service principal a role such as Storage Blob Data Contributor, as shown below.

    (Screenshot: the Add role assignment pane with the Storage Blob Data Contributor role selected.)

    For more details, refer to this doc - Create and grant permissions to service principal.
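
    If you prefer the CLI, the same role assignment can be created with a command along these lines (a sketch; <application-id>, <subscription-id> and <resource-group> are placeholders, and bassamsacc01 is the storage account name from the error above):

    az role assignment create \
        --assignee "<application-id>" \
        --role "Storage Blob Data Contributor" \
        --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/bassamsacc01"

    Note that a new role assignment can take a few minutes to propagate before the 403 disappears.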