My code to execute a SQL query, convert the result to pandas, and then write it to a CSV file is below, but I see an error when executing it.
src_query = """select * from table"""
df = spark.sql(src_query).toPandas()
df.write.csv('output.csv', index=False)
This is the error I see at the last line when executing it: "invalid syntax". Can anyone share any tips on how I can easily write the Spark output to a CSV file?
This is PySpark's syntax:
df.write.csv
However, you converted the DataFrame into a pandas DataFrame with this:
df = spark.sql(src_query).toPandas()
Thus, df is a pandas object, and you need to use pandas syntax:
df.to_csv('output.csv', index=False)
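To see the difference without a Spark session, here is a minimal sketch. It assumes a small hand-built pandas DataFrame as a stand-in for the result of spark.sql(src_query).toPandas(); the column names and the temporary output path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for spark.sql(src_query).toPandas();
# the columns and values are invented for this example.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# A pandas DataFrame has no .write attribute, which is why the
# PySpark-style df.write.csv(...) call cannot work on it.
has_spark_writer = hasattr(df, "write")

# pandas syntax: to_csv accepts index=False to omit the row index.
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
df.to_csv(out_path, index=False)

# Read the file back to confirm what was written.
with open(out_path) as f:
    csv_text = f.read()
```

If you would rather stay in Spark, you could skip toPandas() and call something like spark.sql(src_query).write.csv(path, header=True) instead. Note that Spark writes a directory of part files at that path rather than a single CSV file, so the pandas route is usually simpler when the result fits in driver memory.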