I downloaded a .xpt format file from a URL to the blob container in a Databricks Python notebook.
In the code below, 'example.xpt' is a local file. How do I read the .xpt format file from the blob container instead?
import xport.v56

with open('example.xpt', 'rb') as f:
    library = xport.v56.load(f)
Appreciate any inputs. Thanks!
Considering you have already installed the xport library on your cluster and mounted your ADLS blob container, follow the steps below to read the .xpt file present in your blob container:

import xport.v56

with open('/dbfs/mnt/repro/ALQY_F.XPT', 'rb') as f:
    # '/dbfs/mnt/repro/' is the mount point, i.e., the ADLS blob container.
    library = xport.v56.load(f)
The library object is of type xport.v56.Library, a mapping of dataset names to datasets. Its values() method returns an iterable of the datasets it contains. Use the following code to write the required data to CSV format in the specified destination:
for data in library.values():
    print(type(data))  # <class 'xport.v56.Member'>
    print(dir(data))   # check all the attributes available on this object
    data.to_csv("/dbfs/mnt/repro/op.csv")  # writes as CSV to your blob container
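One caveat with the loop above: a fixed output path means each iteration overwrites op.csv, so only the last member survives if the library holds several datasets. A minimal sketch of writing one CSV per member, using plain pandas DataFrames as stand-ins for xport.v56.Member objects (which behave like DataFrames); the dataset names, sample data, and output directory here are all hypothetical:

```python
import pandas as pd

# Stand-in for xport.v56.Library: a mapping of dataset names to
# DataFrame-like members (hypothetical sample data).
library = {
    "ALQY_F": pd.DataFrame({"SEQN": [1, 2], "ALQ101": [1, 2]}),
    "DEMO_F": pd.DataFrame({"SEQN": [1, 2], "RIAGENDR": [1, 2]}),
}

out_dir = "/tmp"  # e.g. '/dbfs/mnt/repro' on Databricks
for name, data in library.items():
    # One file per member avoids overwriting a single op.csv.
    data.to_csv(f"{out_dir}/{name}.csv", index=False)
```

Iterating with items() instead of values() gives you the member name for free, which makes a natural per-dataset filename.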
Without Mounting:
spark.conf.set("fs.azure.account.auth.type.<adls_name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<adls_name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<adls_name>.dfs.core.windows.net", client_id)
spark.conf.set("fs.azure.account.oauth2.client.secret.<adls_name>.dfs.core.windows.net", client_secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<adls_name>.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")
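Since all five settings share the same storage-account suffix, they can also be built programmatically. A sketch under the assumption that adls_name, client_id, client_secret, and tenant_id are placeholders you fill in with your own values; the spark object itself is only available inside a Databricks/Spark session, so applying the settings is shown in comments:

```python
# Placeholders: substitute your storage account and service principal values.
adls_name = "<adls_name>"
client_id = "<client_id>"
client_secret = "<client_secret>"
tenant_id = "<tenant_id>"

suffix = f"{adls_name}.dfs.core.windows.net"
configs = {
    f"fs.azure.account.auth.type.{suffix}": "OAuth",
    f"fs.azure.account.oauth.provider.type.{suffix}":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
    f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
    f"fs.azure.account.oauth2.client.endpoint.{suffix}":
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}

# In a Databricks notebook, apply them with:
# for key, value in configs.items():
#     spark.conf.set(key, value)
```

Building a dict first also makes it easy to reuse the same block for several storage accounts.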
The configuration above lets Spark access the container through the abfss protocol. However, using abfss://data@<adls_name>.dfs.core.windows.net/example.xpt directly with open() does not work, because open() only accepts local (or DBFS fuse) paths. Use dbutils.fs.cp() to copy the file from abfss to DBFS first:

# copy from the data lake to DBFS
dbutils.fs.cp("abfss://data@<adls_name>.dfs.core.windows.net/<file_name>", "dbfs:/<required_path>")
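After the copy, the file can be opened through the /dbfs fuse mount. A sketch of the full flow; dbutils and xport are only available on a Databricks cluster (so those calls are commented out here), and to_local_path is a small helper of my own, not part of any library, for translating a dbfs:/ URI into the fuse path that open() expects:

```python
SOURCE = "abfss://data@<adls_name>.dfs.core.windows.net/example.xpt"
LOCAL = "dbfs:/tmp/example.xpt"

def to_local_path(dbfs_path: str) -> str:
    """Translate a 'dbfs:/...' URI into the '/dbfs/...' fuse path open() needs."""
    return "/dbfs/" + dbfs_path.removeprefix("dbfs:/")

# On a Databricks cluster:
# dbutils.fs.cp(SOURCE, LOCAL)              # copy from the data lake to DBFS
# with open(to_local_path(LOCAL), 'rb') as f:
#     library = xport.v56.load(f)
```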