I want to read a JSON file in PySpark, but the JSON file is in this format (without comma and square brackets):
{"id": 1, "name": "jhon"}
{"id": 2, "name": "bryan"}
{"id": 3, "name": "jane"}
Is there an easy way to read this JSON in PySpark?
I have already tried this code:
df = spark.read.option("multiline", "true").json("data.json")
df.write.parquet("data.parquet")
But it doesn't work: in parquet file just the first line appears.
I just want to read this JSON file and save as parquet...
Only the first line appears because the multiline option is set to "true", which tells Spark to parse the whole file as a single JSON document. In your file, each line is a separate JSON object (the JSON Lines format), so set multiline to "false" (which is also the default) and it will work as expected:
df = spark.read.option("multiline", "false").json("data.json")
df.show()
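The distinction is easy to see even outside Spark: every line of your file is a complete JSON document on its own, while the file as a whole is not one valid JSON value. A minimal sketch with Python's standard json module (the sample string mirrors the file from the question):

```python
import json

# Sample content matching the file in the question:
# each line is a standalone JSON object (JSON Lines / NDJSON).
data = '''{"id": 1, "name": "jhon"}
{"id": 2, "name": "bryan"}
{"id": 3, "name": "jane"}'''

# Parse line by line -- the same model Spark uses with multiline=false.
records = [json.loads(line) for line in data.splitlines()]
print(records)  # three dicts, one per input line

# Parsing the whole string as one document (the model multiline=true
# expects) fails, because three top-level objects are not a single
# valid JSON value. Spark does not raise here; it simply stops after
# the first document, which is why only the first row appeared.
try:
    json.loads(data)
except json.JSONDecodeError as e:
    print("not a single JSON document:", e.msg)
```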
If your JSON file instead contained a JSON array, like
[
{"id": 1, "name": "jhon"},
{"id": 2, "name": "bryan"},
{"id": 3, "name": "jane"}
]
or
[
{
"id": 1,
"name": "jhon"
},
{
"id": 2,
"name": "bryan"
}
]
then setting the multiline parameter to "true" would be the correct choice.