Tags: r, azure, azure-storage, sparkr

Access Azure blob storage from R notebook


In Python, this is how I would access a CSV file from Azure Blob Storage:

storage_account_name = "testname"
storage_account_access_key = "..."
file_location = "wasb://<container>@testname.blob.core.windows.net/testfile.csv"

spark.conf.set(
  "fs.azure.account.key."+storage_account_name+".blob.core.windows.net",
  storage_account_access_key)

df = spark.read.format('csv').load(file_location, header = True, inferSchema = True)

How can I do this in R? I cannot find any documentation...


Solution

  • The AzureStor package provides an R interface to Azure storage, including files, blobs, and ADLS Gen2.

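    library(AzureStor)
    # Connect to the storage account's blob endpoint using its access key
    endp <- storage_endpoint("https://acctname.blob.core.windows.net", key="access_key")
    # Get a handle to the container, then download the blob to a local file
    cont <- storage_container(endp, "mycontainer")
    storage_download(cont, "myblob.csv", "local_filename.csv")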
    

    Note that this downloads the blob to a file in local storage. From there, you can ingest it into Spark using standard sparklyr methods.

    Disclaimer: I'm the author of AzureStor.
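
    As a rough sketch of the ingestion step, the following downloads the blob and then reads the local file into Spark with sparklyr. Several things here are assumptions rather than part of the answer above: the `spark_connect(method = "databricks")` call assumes a Databricks R notebook (use something like `spark_connect(master = "local")` elsewhere), the table name `testfile` is only illustrative, and the `file://` prefix assumes that Spark's default filesystem is not the local disk. On a multi-node cluster the executors may not be able to see a file that exists only on the driver, so treat this as a starting point, not a definitive recipe.

    ```r
    library(AzureStor)
    library(sparklyr)

    # Download the blob to a local file (account, key, container and blob
    # names are the placeholders used in the answer above).
    endp <- storage_endpoint("https://acctname.blob.core.windows.net", key = "access_key")
    cont <- storage_container(endp, "mycontainer")
    local_path <- file.path(tempdir(), "local_filename.csv")
    storage_download(cont, "myblob.csv", local_path)

    # Assumption: running in a Databricks R notebook.
    sc <- spark_connect(method = "databricks")

    # Read the local CSV into a Spark DataFrame, mirroring the Python call's
    # header = True and inferSchema = True options.
    # Assumption: "file://" is needed so Spark looks at the local filesystem.
    df <- spark_read_csv(
      sc,
      name = "testfile",  # hypothetical Spark table name
      path = paste0("file://", local_path),
      header = TRUE,
      infer_schema = TRUE
    )
    ```

    The resulting `df` is a sparklyr table reference, so it can be used with dplyr verbs such as `head(df)` or `dplyr::count(df)`.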