
Write PySpark SQL query output to a CSV file


My code below executes a SQL query, converts the result to pandas, and then writes it to a CSV file, but I get an error when I run it.

src_query = """select * from table"""

df = spark.sql(src_query).toPandas()

df.write.csv('output.csv', index=False)

When I run it, I get an "invalid syntax" error on the last line. Can anyone share tips on how to easily write the Spark output to a CSV file?


Solution

  • This is PySpark's syntax for writing a Spark DataFrame:

    df.write.csv
    

    However, this line converts the Spark DataFrame into a pandas DataFrame:

    df = spark.sql(src_query).toPandas()
    

    So df is now a pandas object, and a pandas DataFrame has no write attribute. Use the pandas method instead:

    df.to_csv('output.csv', index=False)
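A minimal, self-contained sketch of the fix, assuming only pandas is installed. The small DataFrame built by hand is a hypothetical stand-in for the result of spark.sql(src_query).toPandas(), and output.csv is written to a temporary directory so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for spark.sql(src_query).toPandas(): after toPandas()
# the result is an ordinary pandas DataFrame like this one.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# A pandas DataFrame has no .write accessor (that belongs to Spark DataFrames),
# so df.write.csv(...) cannot work on it.
assert not hasattr(df, "write")

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "output.csv")

# pandas writes CSV with DataFrame.to_csv; index=False leaves out the row index,
# so the file contains only the query's columns.
df.to_csv(out_path, index=False)

# Read the file back to confirm the round trip.
round_trip = pd.read_csv(out_path)
print(round_trip)
```

If you don't need pandas at all, you can skip toPandas() and keep the Spark writer, for example spark.sql(src_query).write.csv("output_dir", header=True). Spark then writes a directory of part files rather than a single output.csv. The pandas route collects the whole result onto the driver, so it suits results small enough to fit in driver memory.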