In Azure Synapse with PySpark, I am profiling data with ProfileReport from ydata-profiling (https://github.com/ydataai/ydata-profiling):
from ydata_profiling import ProfileReport

report = ProfileReport(
    dataframe,
    title="Profiling_pyspark_DataFrame",
    infer_dtypes=False,
    interactions=None,
    missing_diagrams=None,
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
    },
)
When I evaluate report in a notebook cell, I see the rendered HTML, which is what I would like to save to ADLS.
I then tried to save the HTML to the data lake with:
report.to_file("abfss://<container>@<storage_account>.dfs.core.windows.net/profile.html")
But I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'abfss:/<container>@<storage_account>.dfs.core.windows.net/profile.html'
What am I doing wrong? (I have a linked service between Synapse and ADLS.)
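Yes, you are right: to_file writes through the local filesystem, so it cannot write to an abfss:// path directly. I am adding this as an answer, with a couple of other ways to do it, in case it helps the community.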
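The single slash in 'abfss:/' in the error message shows what happens. As far as I can tell from the ydata-profiling source, to_file wraps its argument in pathlib.Path and writes with write_text, and pathlib normalizes the URI as if it were an ordinary relative path. A standard-library sketch of that normalization (mycontainer and myaccount are placeholder names):

```python
from pathlib import PurePosixPath

# A remote ADLS Gen2 URI with placeholder container/account names.
remote = "abfss://mycontainer@myaccount.dfs.core.windows.net/profile.html"

# pathlib treats the string as a local path: the "//" after the scheme is
# collapsed and "abfss:" becomes the name of the first directory.
as_local = PurePosixPath(remote)
print(as_local)                # abfss:/mycontainer@myaccount.dfs.core.windows.net/profile.html
print(as_local.is_absolute())  # False: relative to the current working directory
print(as_local.parts[0])       # abfss:
```

Opening that relative path fails because there is no local directory named abfss:, hence the FileNotFoundError.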
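One option is to render the report to an HTML string with to_html() and write it directly with mssparkutils.fs.put, which does understand abfss:// paths (the last argument, True, overwrites an existing file):
from notebookutils import mssparkutils

mssparkutils.fs.put("abfss://<container>@<storage_account>.dfs.core.windows.net/synapse/report.html", report.to_html(), True)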
Another way is to save the file to the local disk of the Synapse Spark driver first and then copy or move it to ADLS:
report.to_file("/tmp/report2.html")
mssparkutils.fs.cp("file:/tmp/report2.html", "abfss://<container>@<storage_account>.dfs.core.windows.net/synapse/report2.html")
or
mssparkutils.fs.mv("file:/tmp/report2.html", "abfss://<container>@<storage_account>.dfs.core.windows.net/synapse/report3.html")
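When you access the local filesystem through mssparkutils in Synapse, prefix the path with file: (for example, file:/tmp/report2.html) so that it is read from the driver's local disk rather than from remote storage.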
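The local-staging approach can be wrapped in a small helper that applies the file: prefix for you. A minimal sketch: local_uri and stage_and_copy are hypothetical names, and transfer stands in for mssparkutils.fs.cp or mssparkutils.fs.mv, both of which take the source and destination as their first two arguments. It assumes a POSIX driver, as on Synapse Spark pools, where os.path.abspath("/tmp/report2.html") stays "/tmp/report2.html".

```python
import os
import tempfile
from typing import Callable


def local_uri(path: str) -> str:
    """Return 'file:' + absolute path, the form mssparkutils expects for driver-local files."""
    return "file:" + os.path.abspath(path)


def stage_and_copy(report, local_path: str, dest: str,
                   transfer: Callable[[str, str], object]) -> str:
    """Write the report to local disk with to_file, then copy/move it to dest.

    `transfer` is hypothetical glue: in Synapse pass mssparkutils.fs.cp or
    mssparkutils.fs.mv; here it is injected so the helper runs anywhere.
    """
    report.to_file(local_path)   # a plain local path, which to_file can handle
    src = local_uri(local_path)  # e.g. "file:/tmp/report2.html"
    transfer(src, dest)
    return src


# Dry run with a stand-in report and a transfer function that only records calls.
class _FakeReport:
    def to_file(self, output_file: str) -> None:
        with open(output_file, "w", encoding="utf-8") as fh:
            fh.write("<html></html>")


transfers = []
tmp_html = os.path.join(tempfile.gettempdir(), "report2.html")
src = stage_and_copy(
    _FakeReport(),
    tmp_html,
    "abfss://mycontainer@myaccount.dfs.core.windows.net/synapse/report2.html",
    lambda s, d: transfers.append((s, d)),
)
```

Choosing cp keeps the local copy on the driver; mv removes it once it has been moved to ADLS.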