Tags: r, sparklyr

Overwriting an existing location when writing a Spark DataFrame


I want to save my Spark DataFrame to a directory using one of the spark_write_* functions, like this:

spark_write_csv(df, "file:///home/me/dir/")

but if the directory already exists, I get an error:

ERROR: org.apache.spark.sql.AnalysisException: path file:/home/me/dir/ already exists.;

When I'm working on the same data, I want to overwrite this directory. How can I achieve this? The documentation mentions one parameter:

mode  Specifies the behavior when data or table already exists.

but it doesn't say which value to use.


Solution

  • Set the mode parameter to "overwrite":

    spark_write_csv(df, "file:///home/me/dir/", mode = "overwrite")
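A minimal end-to-end sketch, assuming a local Spark installation that sparklyr can use (for example one set up with spark_install()) and a Unix-like file system. It writes the same DataFrame twice to one directory; with mode = "overwrite" the second write replaces the first instead of raising the "already exists" error. The table names (mtcars_tbl, mtcars_back) and the temporary output path are illustrative only.

```r
library(sparklyr)

# Assumes Spark is installed locally, e.g. via spark_install().
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark (row names are dropped).
df <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# Illustrative output directory under R's session temp dir (Unix-style path).
out_dir <- paste0("file://", file.path(tempdir(), "mtcars_csv"))

# First write creates the directory.
spark_write_csv(df, out_dir, mode = "overwrite")

# Second write to the same path: without mode = "overwrite" this is where
# the "path ... already exists" AnalysisException would be raised.
spark_write_csv(df, out_dir, mode = "overwrite")

# Other accepted values: "append" adds new files next to the existing ones,
# "ignore" silently skips the write, "error" fails (Spark's default).

# Read the directory back to confirm it holds one copy of the data, not two.
back <- spark_read_csv(sc, "mtcars_back", out_dir)
n_back <- sdf_nrow(back)
print(n_back)

spark_disconnect(sc)
```

Keep in mind that "overwrite" removes whatever was previously stored at the target path before writing the new files, so it should only point at a directory you are happy to replace.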