Tags: json, scala, apache-spark, etl

Removing root element from json record in Spark


My input JSON data has a root element that wraps the data I actually want. I would like to remove it so that each record contains only the target data.

Input data:

{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}
{"rootElement": {"firstName": "Jane", "lastName": "Doe", "age": 33}}
{"rootElement": {"firstName": "Scott", "lastName": "Smith", "age": 22}}

Expected output:

{"firstName": "John", "lastName": "Doe", "age": 11}
{"firstName": "Jane", "lastName": "Doe", "age": 33}
{"firstName": "Scott", "lastName": "Smith", "age": 22}

I tried this so far:

sparkSession.read.json(inputFileLocation).toDF().map(func => func.getObject("rootElement"))

but it won't compile.


Solution

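  • sparkSession.read.json already returns a DataFrame, so there is no need to call toDF(). The map attempt also fails to compile because Row has no getObject method; selecting the nested fields directly is simpler.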

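    Try this:

    import sparkSession.implicits._  // brings the $"..." column syntax into scope
    val df = sparkSession.read.json("your path")
    df.select($"rootElement.*").write.json("your output path")  // "rootElement.*" expands the struct into top-level columns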
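For reference, here is a minimal self-contained sketch of the same approach, assuming Spark 2.2 or later (for spark.read.json on a Dataset[String]) running in local mode. The object UnwrapRootElement, the helper unwrapRoot and the case class Person are hypothetical names introduced for illustration. It feeds the three sample records in from memory instead of a file, flattens them, prints each row as JSON, and also shows a typed variant that is closer to the map idea in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object UnwrapRootElement {
  // Hypothetical case class for the inner record. JSON schema inference reads
  // whole numbers as LongType, so age is declared as Long.
  case class Person(firstName: String, lastName: String, age: Long)

  // Hypothetical helper: promotes every field of the struct column `root`
  // to a top-level column ("root.*" is expanded by Spark's analyzer).
  def unwrapRoot(df: DataFrame, root: String): DataFrame =
    df.select(col(s"$root.*"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unwrap-root-element")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for the input file: one JSON object per line.
    val lines = Seq(
      """{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}""",
      """{"rootElement": {"firstName": "Jane", "lastName": "Doe", "age": 33}}""",
      """{"rootElement": {"firstName": "Scott", "lastName": "Smith", "age": 22}}"""
    ).toDS()

    val flat = unwrapRoot(spark.read.json(lines), "rootElement")

    // One JSON string per row, i.e. what write.json would put on each line.
    flat.toJSON.collect().foreach(println)

    // Typed alternative to the map attempt in the question: columns are
    // matched to the case class fields by name.
    val people: Array[Person] = flat.as[Person].collect()
    people.foreach(p => println(s"${p.firstName} ${p.lastName} is ${p.age}"))

    spark.stop()
  }
}
```

Note that JSON schema inference usually sorts struct fields by name, so the flattened columns (and the keys in the printed JSON) may come out as age, firstName, lastName rather than in the input order. To work with files instead, replace the in-memory Dataset with spark.read.json(path) and call write.json(outputPath) on the result, which writes a directory of part files rather than a single file.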