
Getting "Failed to rename" error while writing to an HDFS path

I am using spark-sql 2.4.1, which uses hadoop-2.6.5.jar. I need to save my data to HDFS first and move it to Cassandra later, so I am trying to save the data on HDFS as below:

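String hdfsPath = "/user/order_items/";

// Each item becomes its own Spark job; the jobs run in parallel and all write to hdfsPath
givenItemList.parallelStream().forEach(item -> {
    // "orders" is a placeholder: the source table is not shown in the original query
    String query = "select '" + item + "' as itemCol, avg(" + item + ") as mean from orders group by year";
    Dataset<Row> resultDs = sparkSession.sql(query);

    saveDsToHdfs(hdfsPath, resultDs);
});

public static void saveDsToHdfs(String parquetFile, Dataset<Row> df) {
    df.write()
      .mode(SaveMode.Append) // assumed: the original writer options were cut off
      .format("parquet")
      .save(parquetFile);
    System.out.println("Saved parquet file: " + parquetFile + " successfully");
}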

When I run the job on the cluster, it fails with this error:

Failed to rename FileStatus{path=hdfs:/user/order_items/_temporary/0/_temporary/attempt_20180626192453_0003_m_000007_59/part-00007.parquet; isDirectory=false; length=952309; replication=1; blocksize=67108864; modification_time=1530041098000; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false} to hdfs:/user/order_items/part-00007.parquet
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(

How can I fix this?
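The _temporary/0 segment in the failing path hints at the cause: every parallel job writes to /user/order_items/ and stages its files in the same _temporary/0 directory. Below is a simplified plain-Java model of that clash on the local filesystem. It is not Hadoop's FileOutputCommitter code, and the class and method names are hypothetical. Two jobs stage part files, the first one commits and deletes _temporary, and the second job's rename then fails because its staged file is already gone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class SharedTemporaryDemo {

    // Recursively delete a directory tree (models the committer removing _temporary).
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order deletes children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // A task stages its part file under <out>/_temporary/0/<attempt>/, like the path in the error.
    static Path stageTaskOutput(Path out, String attempt, String part) throws IOException {
        Path dir = out.resolve("_temporary").resolve("0").resolve(attempt);
        Files.createDirectories(dir);
        return Files.write(dir.resolve(part), new byte[] {1, 2, 3});
    }

    // Simplified "job commit": rename the staged file into <out>, then delete all of _temporary.
    static void commitJob(Path out, Path staged) throws IOException {
        Files.move(staged, out.resolve(staged.getFileName().toString()));
        deleteTree(out.resolve("_temporary"));
    }

    // Returns true if the second job's commit fails because its staged file was deleted.
    public static boolean runDemo() {
        Path out = null;
        try {
            out = Files.createTempDirectory("order_items");
            Path jobA = stageTaskOutput(out, "attempt_A", "part-00000.parquet");
            Path jobB = stageTaskOutput(out, "attempt_B", "part-00007.parquet");
            commitJob(out, jobA); // job A commits first and wipes job B's staged file as well
            try {
                commitJob(out, jobB);
                return false;
            } catch (NoSuchFileException e) {
                System.out.println("Failed to rename " + e.getFile());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (out != null) {
                try {
                    deleteTree(out);
                } catch (IOException ignored) {
                    // best-effort cleanup of the demo directory
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("second commit failed: " + runDemo());
    }
}
```

In this model the failure surfaces as a NoSuchFileException rather than Hadoop's "Failed to rename" message, but the sequence is the same: two writers share one staging directory, and the first cleanup removes the second writer's files.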


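  • Run all the selects as a single job: build every select, union them into one Dataset, and write it once. When several parallel jobs write into the same output path, they share the same _temporary/0 staging directory, so when one job commits and cleans it up, the other job's rename fails.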

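    // Build every per-item query first; no Spark job runs yet because Datasets are lazy.
    // "orders" is a placeholder for the source table, which the question does not show.
    Dataset<Row> resultDs = givenItemList.stream()
        .map(item -> sparkSession.sql(
            "select '" + item + "' as itemCol, avg(" + item + ") as mean from orders group by year"))
        .reduce((a, b) -> a.union(b)) // like UNION ALL: rows are not de-duplicated
        .orElseThrow(() -> new IllegalStateException("givenItemList is empty"));
    // One write means one job and one _temporary directory
    saveDsToHdfs(hdfsPath, resultDs);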
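If you prefer to keep everything in SQL, the same single-job idea can be written as one UNION ALL statement (Spark documents Dataset.union as equivalent to UNION ALL). The sketch below only builds the query string. The class name, the table name orders, and the item names price and quantity are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class UnionQueryBuilder {

    // One aggregate SELECT per item; the item name doubles as a literal label and a column.
    static String perItemQuery(String item, String table) {
        return "select '" + item + "' as itemCol, avg(" + item + ") as mean from " + table + " group by year";
    }

    // Join all per-item SELECTs into a single statement, so Spark plans one query and one write.
    static String buildUnionQuery(List<String> items, String table) {
        return items.stream()
                .map(item -> perItemQuery(item, table))
                .collect(Collectors.joining(" union all "));
    }

    public static void main(String[] args) {
        // Hypothetical item columns and table name, for illustration only.
        System.out.println(buildUnionQuery(Arrays.asList("price", "quantity"), "orders"));
    }
}
```

The resulting string would be passed to sparkSession.sql(...) and written once with saveDsToHdfs. Item names are concatenated straight into the SQL, so this assumes they are trusted column names.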