python apache-spark databricks azure-databricks

Problem with saving a Spark DataFrame as Parquet


I'm trying to save a DataFrame to a path as Parquet files. The issue is that the display() function shows many results in "Prop_0", but whenever I try to save them, only the first one is converted and written to the path.

The code I'm using is:

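# Clear the output location first (optional when writing with mode("overwrite"))
dbutils.fs.rm(Path_2, True)
# Read the Avro input from Path_1
avroFile = spark.read.format('com.databricks.spark.avro').load(Path_1)
# Write the DataFrame to Path_2 as Parquet
avroFile.write.mode("overwrite").save(Path_2, format="parquet")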

Solution

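  • This is expected behaviour. Spark writes its output through the Hadoop file format, which treats the target path as a directory and writes one part- file per partition of the DataFrame. That is why you see part- files under the path instead of a single Parquet file.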

    I'm able to run the above code without any issue.

    You can also use the method below to save a Spark DataFrame as Parquet files.

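As a minimal sketch of one common way to save a Spark DataFrame as Parquet (not necessarily the exact method from the original screenshot), the helpers below write the DataFrame and then read the whole output directory back to confirm that every row was saved. The names `save_as_parquet`, `row_counts_match`, and `num_files` are hypothetical and used only for illustration; the Spark calls themselves (`coalesce`, `write.mode(...).parquet(...)`, `spark.read.parquet(...)`, `count`) are standard PySpark API.

```python
def save_as_parquet(df, path, num_files=None):
    """Write a Spark DataFrame to `path` as Parquet.

    Spark creates a directory at `path` that holds one part- file
    per partition of the DataFrame.
    """
    if num_files is not None:
        # Fewer partitions means fewer part- files in the output directory.
        df = df.coalesce(num_files)
    df.write.mode("overwrite").parquet(path)


def row_counts_match(spark, df, path):
    """Return True if the Parquet directory at `path` holds as many rows as `df`.

    Reading the directory (not a single part- file) loads every part- file in it.
    """
    return spark.read.parquet(path).count() == df.count()
```

In a Databricks notebook, where `spark` is already defined, this could be used as `save_as_parquet(avroFile, Path_2)` followed by `row_counts_match(spark, avroFile, Path_2)`. If the counts match, all rows were written even though they are spread across several part- files.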