I am unable to write to a file that I create myself. On Windows it works fine, but on CentOS Spark reports that the file already exists and writes nothing.
File tempFile = new File("temp/tempfile.parquet");
tempFile.createNewFile();
parquetDataSet.write().parquet(tempFile.getAbsolutePath());
This is the error (the file already exists):
2020-02-29 07:01:18.007 ERROR 1 --- [nio-8090-exec-1] c.gehc.odp.util.JsonToParquetConverter : Stack Trace: {}org.apache.spark.sql.AnalysisException: path file:/temp/myfile.parquet already exists.;
2020-02-29 07:01:18.007 ERROR 1 --- [nio-8090-exec-1] c.gehc.odp.util.JsonToParquetConverter : sparkcontext close
The default save mode in Spark is ErrorIfExists. This means that if the path you intend to write to already exists, Spark throws an exception like the one you got above. That is happening in your case because you create the file yourself with createNewFile() rather than leaving that task to Spark. There are two ways to resolve the situation:
1) Set the save mode to "overwrite" or "append" in the write command (note that in Java, write() is a method call):
parquetDataSet.write().mode("overwrite").parquet(tempFile.getAbsolutePath());
2) Or remove the createNewFile() call and pass the destination path straight to the Spark write command:
parquetDataSet.write().parquet("temp/tempfile.parquet");
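To see why the createNewFile() call is the trigger, here is a minimal sketch using only the Java standard library. It does not call Spark at all: the class name OutputPathCheck and the helper wouldCollide are hypothetical, and Files.exists is just a stand-in for the existence check Spark performs on the target path under ErrorIfExists (Spark itself goes through the Hadoop file system API).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputPathCheck {

    // Hypothetical helper: true if anything already exists at the target path.
    // This is the condition under which the default ErrorIfExists mode fails.
    static boolean wouldCollide(Path target) {
        return Files.exists(target);
    }

    public static void main(String[] args) throws IOException {
        // Work in a fresh temporary directory so the demo is repeatable.
        Path dir = Files.createTempDirectory("parquet-demo");
        Path target = dir.resolve("tempfile.parquet");

        // Nothing has been created yet, so a write would be allowed.
        System.out.println(wouldCollide(target)); // prints "false"

        // Same call as in the question: leaves an empty file at the target.
        File tempFile = target.toFile();
        tempFile.createNewFile();
        System.out.println(wouldCollide(target)); // prints "true"

        // Removing the pre-created file clears the collision again.
        Files.delete(target);
        System.out.println(wouldCollide(target)); // prints "false"
    }
}
```

In other words, a File object on its own is only a path; it is createNewFile() that puts something on disk before Spark gets a chance to write.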