I have a dataset I need to periodically import to my datalake, replacing the current dataset. After I produce a dataframe I currently do:
df.write.format("delta").save("dbfs:/mnt/defaultDatalake/datasets/datasources")
But if I run the job again I get the following error:
AnalysisException: dbfs:/mnt/defaultDatalake/datasets/insights/datasources already exists.;
While I know I can do a dbutils.fs.rm beforehand, I'd rather just "replace" the data there.
Is there a way to achieve this?
Use the overwrite mode:
df.write.format("delta").mode("overwrite").save(....)
If the new dataframe has a different schema, you may also need to add .option("overwriteSchema", "true") (see this blog post for more information).
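Putting both pieces together, here is a minimal sketch of a reusable helper. The function name overwrite_delta and its replace_schema flag are hypothetical names chosen for illustration; the only Spark calls used are the DataFrameWriter methods format, mode, option and save.

```python
def overwrite_delta(df, path, replace_schema=False):
    """Replace the Delta table at `path` with the contents of `df`.

    `overwrite_delta` and `replace_schema` are illustrative names,
    not part of Spark or Delta Lake.
    """
    # mode("overwrite") replaces the existing data instead of raising
    # "already exists" on the second run.
    writer = df.write.format("delta").mode("overwrite")
    if replace_schema:
        # Only needed when the new DataFrame's schema differs from
        # the schema of the table already stored at `path`.
        writer = writer.option("overwriteSchema", "true")
    writer.save(path)
```

In the job from the question this would be called as overwrite_delta(df, "dbfs:/mnt/defaultDatalake/datasets/datasources"), passing replace_schema=True only on runs where the schema is expected to change.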