
Deleted Google Cloud Storage directory reported as "already exists" when calling Spark DataFrame.saveAsParquetFile()

After I deleted a Google Cloud Storage directory through the Google Cloud Console (the directory had been generated by an earlier Spark 1.3.1 job), re-running the job always fails. The directory still seems to exist as far as the job is concerned, even though I cannot find it with gsutil.

Is this a bug, or is there something I missed? Thanks!

The error I got:

java.lang.RuntimeException: path gs://<my_bucket>/job_dir1/output_1.parquet already exists.
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:112)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
at org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:995)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
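The "already exists" error appears to come from Spark asking the Hadoop FileSystem layer (here, the GCS connector) whether the output path exists, so one way to confirm the mismatch described above is to ask both the connector and GCS itself about the same path. The sketch below is illustrative and not part of the original post: it assumes both the hadoop CLI and gsutil are installed on a cluster node, and that the hadoop CLI there uses the same connector configuration as Spark. The function name compare_views is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: compare the connector's view of a gs:// path
# (what Spark sees) with what GCS itself reports (what gsutil sees).
# Assumes `hadoop` and `gsutil` are both on PATH on this node.

compare_views() {
  local path="$1"
  local connector gcs

  # `hadoop fs -test -e` exits 0 if the path exists according to the
  # Hadoop FileSystem implementation, i.e. the GCS connector.
  if hadoop fs -test -e "$path" 2>/dev/null; then
    connector="exists"
  else
    connector="missing"
  fi

  # `gsutil ls` exits non-zero when no objects match the URL.
  if gsutil ls "$path" >/dev/null 2>&1; then
    gcs="exists"
  else
    gcs="missing"
  fi

  echo "connector=$connector gcs=$gcs"
  # The symptom from the question: the connector still reports the path,
  # but GCS no longer has any objects under it.
  if [ "$connector" = "exists" ] && [ "$gcs" = "missing" ]; then
    echo "stale: the connector still reports a path that GCS no longer has"
  fi
}
```

Running it against the failing output path (for example, compare_views gs://<my_bucket>/job_dir1/output_1.parquet) and getting connector=exists together with gcs=missing would point at stale connector metadata rather than at the bucket contents.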


  • It appears you might be running into a known bug in the GCS connector's NFS-based list-consistency cache.

    It was fixed in the latest release: if you upgrade by deploying a new cluster with bdutil-1.3.1 (announced here: !topic/gcp-hadoop-announce/vstNuV0LpDc), the problem should go away. If you need to upgrade in place, you can try downloading the latest gcs-connector-1.4.1 jarfile onto your master and worker nodes, replacing /home/hadoop/hadoop-install/lib/gcs-connector-*.jar, and then restarting the Spark daemons:

    sudo sudo -u hadoop /home/hadoop/spark-install/sbin/
    sudo sudo -u hadoop /home/hadoop/spark-install/sbin/
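The in-place jar replacement can be sketched as a small per-node helper. This is only a sketch under stated assumptions: it assumes the new connector jar has already been downloaded to the machine you run it from, that you have passwordless ssh/scp and sudo rights on every node, and that the old jar should be moved aside so only one connector version stays on the classpath. The function name replace_connector_jar and the backup directory are hypothetical, and the Spark daemons still need to be restarted afterwards as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a locally downloaded connector jar on one node
# in place of /home/hadoop/hadoop-install/lib/gcs-connector-*.jar.
# Assumes passwordless ssh/scp to the node and sudo rights there.

LIB_DIR=/home/hadoop/hadoop-install/lib
BACKUP_DIR=/home/hadoop/gcs-connector-backup   # hypothetical backup location

replace_connector_jar() {
  local host="$1" jar="$2"
  local name
  name="$(basename "$jar")"

  # Stage the new jar in /tmp on the remote node; stop if the copy fails.
  scp "$jar" "$host:/tmp/$name" || return 1

  # On the remote node: move the old connector jar(s) aside, then move the
  # new jar into the lib directory. The glob is expanded by the remote shell.
  ssh "$host" "sudo mkdir -p $BACKUP_DIR && \
sudo mv $LIB_DIR/gcs-connector-*.jar $BACKUP_DIR/ && \
sudo mv /tmp/$name $LIB_DIR/"
}
```

It would be called once per node, master and workers alike (the node names depend on how the cluster was deployed), before restarting the Spark daemons. Moving the old jar to a backup directory rather than deleting it keeps a simple way to roll back.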