apache-spark, pyspark, parquet, apache-spark-sql, csv

PySpark - How can I convert a Parquet file to a delimited text file?


I have a parquet file with the following schema:

|DATE|ID|

I would like to convert it into a text file with tab delimiters as follows:

20170403 15284503

How can I do this in pyspark?


Solution

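  • In Spark 2.0+, use

    df = spark.read.parquet(input_path)

    to read the Parquet file into a DataFrame (see DataFrameReader), then

    df.write.csv(output_path, sep='\t')

    to write the DataFrame out as tab-delimited text (see DataFrameWriter). Note that write is called on the DataFrame, not on the SparkSession, and that output_path will be a directory of part files rather than a single file.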