I have two JSON files. One looks like this:
{
  "a": {
    "a1": "xxx"
  },
  "b": "xxx"
}
The other looks like this:
{
  "a": {
    "a1": "xxx",
    "a2": "xxx"
  },
  "b": "xxx"
}
I want to read these two JSON files into one DataFrame in Spark. I tried union and unionByName, but neither worked. How can I achieve this?
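Roughly, this is what I tried in the pyspark shell (the file names here are just placeholders):

df1 = spark.read.option("multiLine", True).json("file1.json")
df2 = spark.read.option("multiLine", True).json("file2.json")

df1.union(df2)        # didn't work
df1.unionByName(df2)  # didn't work either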
Spark can take care of merging the schemas itself if you read both files in a single call; reading them separately gives the a column two different struct types, which is why union and unionByName fail. See the following code:
>>> spark.read.option("multiLine", True).json("test-jsons/*").printSchema()
root
|-- a: struct (nullable = true)
| |-- a1: string (nullable = true)
| |-- a2: string (nullable = true)
|-- b: string (nullable = true)
>>> spark.read.option("multiLine", True).json("test-jsons/*").show()
+-----------+---+
| a| b|
+-----------+---+
| {xxx, xxx}|xxx|
|{xxx, NULL}|xxx|
+-----------+---+
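If you would rather list the files explicitly instead of using a glob, spark.read.json also accepts a list of paths and still infers one merged schema across all of them. A minimal sketch, assuming the two files are named file1.json and file2.json (hypothetical names):

>>> paths = ["test-jsons/file1.json", "test-jsons/file2.json"]  # assumed file names
>>> spark.read.option("multiLine", True).json(paths).printSchema()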