I have multiple json files. I have to parse it using apache spark. It has nested key init. I have to print all columns along with nested key.
The files has nested keys also. I want to get all column name along with nested column name. How i can get it.
i tried this :
String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json";
String[] jsonFiles = jsonFilePath.split(",");
Dataset<Row> people = sparkSession.read().json(jsonFiles);
json structure in file as :
{
"Name":"Vipin Suman",
"Email":"[email protected]",
"Designation":"Programmer",
"Age":22 ,
"location":
{
"City":"Ahmedabad",
"State":"Gujarat"
}
}
i am getting result as :
people.show(50, false);
Age | Designation | Email | Name | Location
------------------------------------------------------------
22 |Programmer |[email protected] | Vipin Suman|[Ahmedabad,Gujarat]
I want data like :
Age | Designation | Email | Name | City | State
------------------------------------------------------------
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad |Gujarat
or like :-
Age | Designation | Email | Name | Location
---------------------------------------------------------------
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad,Gujarat
if scema look like this
root
|-- Age: long (nullable = true)
|-- Company: struct (nullable = true)
| |-- Company Name: string (nullable = true)
| |-- Domain: string (nullable = true)
|-- Designation: string (nullable = true)
|-- Email: string (nullable = true)
|-- Name: string (nullable = true)
|-- Test: array (nullable = true)
| |-- element: string (containsNull = true)
|-- location: struct (nullable = true)
| |-- City: struct (nullable = true)
| | |-- City Name: string (nullable = true)
| | |-- Pin: long (nullable = true)
| |-- State: string (nullable = true)
and json structure
{
"Name":"Vipin Suman",
"Email":"[email protected]",
"Designation":"Trainee Programmer",
"Age":22 ,
"location":
{"City":
{
"Pin":324009,
"City Name":"Ahmedabad"
},
"State":"Gujarat"
},
"Company":
{
"Company Name":"Elegant",
"Domain":"Java"
},
"Test":["Test1","Test2"]
}
then how i can find nested key. and show the table in proper formet
To display data in above expected format you can use following code:
people.select("*", "location.*").drop("location").show
It will give following output:
+---+-----------+-----------------+----------+---------+-------+
|Age|Designation| Email| Name| City| State|
+---+-----------+-----------------+----------+---------+-------+
| 22| Programmer|[email protected]|VipinSuman|Ahmedabad|Gujarat|
+---+-----------+-----------------+----------+---------+-------+