Search code examples
javaapache-sparkapache-spark-sqlbusiness-intelligence

how to get nested column from json file using apache spark in java


I have multiple json files. I have to parse it using apache spark. It has nested key init. I have to print all columns along with nested key.

The files has nested keys also. I want to get all column name along with nested column name. How i can get it.

i tried this :

String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json";

String[] jsonFiles = jsonFilePath.split(",");

Dataset<Row> people = sparkSession.read().json(jsonFiles);

json structure in file as :

{ 
   "Name":"Vipin Suman",
   "Email":"[email protected]",
   "Designation":"Programmer",
   "Age":22 ,
   "location":
             {
             "City":"Ahmedabad",
             "State":"Gujarat"
             }
}

i am getting result as :

people.show(50, false);

Age | Designation | Email            | Name       | Location
------------------------------------------------------------
22  |Programmer   |[email protected] | Vipin Suman|[Ahmedabad,Gujarat]

I want data like :

Age | Designation | Email            | Name       | City      | State
------------------------------------------------------------
22  |Programmer   |[email protected] | Vipin Suman| Ahmedabad |Gujarat

or like :-

Age | Designation | Email            | Name       | Location
---------------------------------------------------------------
22  |Programmer   |[email protected] | Vipin Suman| Ahmedabad,Gujarat

if scema look like this

root
 |-- Age: long (nullable = true)
 |-- Company: struct (nullable = true)
 |    |-- Company Name: string (nullable = true)
 |    |-- Domain: string (nullable = true)
 |-- Designation: string (nullable = true)
 |-- Email: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Test: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- location: struct (nullable = true)
 |    |-- City: struct (nullable = true)
 |    |    |-- City Name: string (nullable = true)
 |    |    |-- Pin: long (nullable = true)
 |    |-- State: string (nullable = true)  

and json structure

{ 
  "Name":"Vipin Suman",
  "Email":"[email protected]",
 "Designation":"Trainee Programmer",
 "Age":22 ,
 "location":
    {"City":
           {
            "Pin":324009,
            "City Name":"Ahmedabad"
           },
    "State":"Gujarat"
   },
 "Company":
          {
           "Company Name":"Elegant",
           "Domain":"Java"
          }, 
 "Test":["Test1","Test2"]

}

then how i can find nested key. and show the table in proper formet


Solution

  • To display data in above expected format you can use following code:

    people.select("*", "location.*").drop("location").show
    

    It will give following output:

    +---+-----------+-----------------+----------+---------+-------+
    |Age|Designation|            Email|      Name|     City|  State|
    +---+-----------+-----------------+----------+---------+-------+
    | 22| Programmer|[email protected]|VipinSuman|Ahmedabad|Gujarat|
    +---+-----------+-----------------+----------+---------+-------+