apache-spark · databricks · delta-lake · aws-databricks

How do I overwrite an existing Delta Lake dataset in Databricks?


I have a dataset I need to periodically import to my datalake, replacing the current dataset. After I produce a dataframe, I currently do:

df.write.format("delta").save("dbfs:/mnt/defaultDatalake/datasets/datasources")

But if I run the job again I get the following error:

AnalysisException: dbfs:/mnt/defaultDatalake/datasets/insights/datasources already exists.;

While I know I can do a dbutils.fs.rm first, I'd rather just "replace" the data in place. Is there a way to achieve this?
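For reference, the rm-based workaround mentioned above would look roughly like this (a sketch only; the path is taken from the write call above, and deleting first is not atomic, so readers hitting the path mid-job may see missing data):

dbutils.fs.rm("dbfs:/mnt/defaultDatalake/datasets/datasources", True)  # recursive delete
df.write.format("delta").save("dbfs:/mnt/defaultDatalake/datasets/datasources")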


Solution

  • Use the overwrite mode:

    df.write.format("delta").mode("overwrite").save(....)
    

    If the new dataframe has a different schema, you may also need to add .option("overwriteSchema", "true") (see this blog post for more information).
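    Putting it together with the path from the question, a minimal sketch of the full write might look like this (the path and the schema option are assumptions based on the question; drop overwriteSchema if the schema is unchanged):

    # Overwrite the existing Delta table in place; readers see either the old
    # or the new snapshot, never a half-deleted directory.
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")  # only needed if the schema changed
       .save("dbfs:/mnt/defaultDatalake/datasets/datasources"))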