Tags: json, apache-spark, pyspark

How to read several JSON files with different column count into one Dataframe in Spark


I have two JSON files. One looks like this:

{
  "a":{
    "a1":"xxx"
  },
  "b":"xxx"
}

The other looks like this:

{
  "a":{
    "a1":"xxx",
    "a2":"xxx"
  },
  "b":"xxx"
}

I want to read both JSON files into a single DataFrame in Spark, even though the nested struct `a` has a different number of fields in each file. I tried `union` and `unionByName`, but they failed because the schemas don't match. How can I achieve this?


Solution

  • Spark can take care of merging the schemas for you: point a single read at both files, and the inferred schema will be the union of the fields, with missing fields filled in as null. See the following code:

    >>> spark.read.option("multiLine", True).json("test-jsons/*").printSchema()
    root
     |-- a: struct (nullable = true)
     |    |-- a1: string (nullable = true)
     |    |-- a2: string (nullable = true)
     |-- b: string (nullable = true)
    
    >>> spark.read.option("multiLine", True).json("test-jsons/*").show()
    +-----------+---+
    |          a|  b|
    +-----------+---+
    | {xxx, xxx}|xxx|
    |{xxx, NULL}|xxx|
    +-----------+---+