csv apache-spark apache-spark-sql spark-csv

Saving CSV file with partitionBy in Spark

I'm trying to save a DataFrame as a CSV file, partitioned by a column.

import org.apache.spark.sql.types._

// Explicit schema for the input CSV; all columns are nullable
val schema = new StructType(
      Array(
        StructField("ID",IntegerType,true),
        StructField("State",StringType,true),
        StructField("Age",IntegerType,true)
      )
)

val df = sqlContext.read.format("com.databricks.spark.csv")
        .options(Map("path" -> filePath)).schema(schema).load()

df.write.partitionBy("State").format("com.databricks.spark.csv").save(outputPath)

However, the output is written without any partition directories. It looks like partitionBy is ignored completely, and no errors are reported. The same call works with the Parquet format:

df.write.partitionBy("State").parquet(outputPath)

What am I missing here?

Solution

Support for partitionBy has to be implemented by each data source, and as of now (v1.3) Spark CSV does not support it. See: https://github.com/databricks/spark-csv/issues/123
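
Until the data source supports it, one possible workaround is to do the partitioning by hand: collect the distinct values of the column and write one filtered DataFrame per value into a Parquet-style column=value directory. The sketch below is only an illustration. The helper name saveCsvByColumn is hypothetical, it assumes the column has few distinct values, that those values are safe to use in a path, and that rows with a null in the column can be skipped.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not part of spark-csv): emulates partitionBy for a
// single column by writing one CSV output directory per distinct value.
def saveCsvByColumn(df: DataFrame, column: String, outputPath: String): Unit = {
  // Distinct values are collected to the driver, so this only suits
  // low-cardinality columns. Null values are skipped (assumption).
  val values = df.select(column).distinct().collect().map(_.get(0)).filter(_ != null)

  values.foreach { v =>
    df.filter(df(column) === v)
      .drop(column) // the value is encoded in the directory name instead
      .write
      .format("com.databricks.spark.csv")
      .save(s"$outputPath/$column=$v")
  }
}

// Usage, with df and outputPath as defined in the question:
// saveCsvByColumn(df, "State", outputPath)
```

This runs one Spark job per distinct value, so it is much slower than a native partitioned write. On Spark 2.0 or later the built-in CSV source should make the workaround unnecessary, since df.write.partitionBy("State").csv(outputPath) is handled there directly.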
