I have a text file similar to the example below.
I am using ISO-8859-1 encoding
and þ as the separator.
The raw data, in a file named "test.txt", looks like this:
idþnameþroleþ expþ task_descþ comp
1þJohn Doeþ"Senior Developerþ 4þ working on the PySpark project"þ Google
I need the parsed data to look like this:
id | name | role | exp | task_desc | comp |
---|---|---|---|---|---|
1 | John Doe | "Senior Developer | 4 | working on the PySpark project" | Google |
I am using the code below to read the raw "test.txt" file:
spark_df = spark.read.options( multiline='True', quote='"', escape='"', encoding='ISO-8859-1', mode='PERMISSIVE').csv('test.txt', header=True, sep='þ')
I have also tried specifying the quote and escape characters like this:
quote="\"", escape="\""
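To see why these options keep the sample row from splitting the way the table shows, it helps to look at the quoting rule on its own. The sketch below does not run Spark. It uses Python's standard csv module as a stand-in, because it applies the same basic CSV rule as Spark's quote option: a value that starts with the quote character runs to the closing quote, and any separator inside it is kept as text. The sample rows are the decoded lines of test.txt. The helper name split_rows is made up for illustration.

```python
import csv

# Decoded lines of test.txt (the file itself is ISO-8859-1 encoded).
SAMPLE = [
    "idþnameþroleþ expþ task_descþ comp",
    '1þJohn Doeþ"Senior Developerþ 4þ working on the PySpark project"þ Google',
]


def split_rows(lines, use_quotes):
    """Hypothetical helper: split each line on þ, with or without quote handling."""
    if use_quotes:
        # '"' opens a quoted value; every þ up to the closing '"' stays inside it.
        reader = csv.reader(lines, delimiter="þ", quotechar='"')
    else:
        # QUOTE_NONE: '"' is an ordinary character, so every þ starts a new field.
        reader = csv.reader(lines, delimiter="þ", quoting=csv.QUOTE_NONE)
    return list(reader)


quoted = split_rows(SAMPLE, use_quotes=True)
unquoted = split_rows(SAMPLE, use_quotes=False)

# With quoting on, the data row has only 4 fields: role swallows exp and task_desc.
print(len(quoted[1]), quoted[1])
# With quoting off, the data row has 6 fields, matching the 6 header columns.
print(len(unquoted[1]), unquoted[1])
```

With quoting on, the data row yields fewer fields than the header has columns, so the columns after role no longer line up. With quoting off, the row splits into six fields and the double quotes stay in the role and task_desc values, as in the desired table.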
Is there a way to solve this in PySpark?
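You need to set quote='' to make it work.
An empty string turns off quote handling in Spark's CSV reader, so the double quotes are read as ordinary characters and every þ, including the ones between the quotes, splits a column.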