python python-3.x csv apache-spark pyspark

Reading a text file in PySpark with delimiters present within double quotes

I have a text file similar to the example below. I am using ISO-8859-1 as the encoding and þ as the separator.

The raw data, in a file named "test.txt", looks something like this:

idþnameþroleþ expþ task_descþ comp

1þJohn Doeþ"Senior Developerþ 4þ working on the PySpark project"þ Google

I need the data to look like this, with six columns and the double quotes kept as ordinary characters:

id	name	role	exp	task_desc	comp
1	John Doe	"Senior Developer	4	working on the PySpark project"	Google

I am using the code below to read the raw "test.txt" file:

spark_df = spark.read.options( multiline='True', quote='"', escape='"', encoding='ISO-8859-1', mode='PERMISSIVE').csv('test.txt', header=True, sep='þ')

I have also tried the following quote and escape settings:

quote="\"", escape="\""

Is there a solution to this problem in PySpark?

Solution

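You need to set quote='' (an empty string) in the reader options to make it work. An empty quote value turns quoting off, so the double quotes are read as ordinary characters and every þ is treated as a separator, which gives the six columns you want.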
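
To see why the quote character matters, here is a minimal sketch using Python's standard csv module instead of Spark. It is only an analogy: Spark's CSV reader is a different parser, and the variable names are made up for illustration. The sample row is the one from the question.

```python
import csv

# Sample row from the question; þ (U+00FE) is the field separator.
line = '1þJohn Doeþ"Senior Developerþ 4þ working on the PySpark project"þ Google'

# Quoting enabled (the csv module's default): separators that sit between
# the double quotes are kept inside a single field.
quoted = next(csv.reader([line], delimiter="þ", quotechar='"'))

# Quoting disabled: the double quote is an ordinary character, so the row
# is split at every þ. This is the stdlib counterpart of quote='' in Spark.
unquoted = next(csv.reader([line], delimiter="þ", quoting=csv.QUOTE_NONE))

print(len(quoted), quoted)
print(len(unquoted), unquoted)
```

With quoting enabled the row comes back as four fields, because role, exp and task_desc are merged into one. With quoting disabled it comes back as the six fields the header describes. In the PySpark call from the question, the corresponding change is to replace quote='"' with quote=''.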
