Is there any way to parse a multi-line JSON file using a Dataset? Here is my sample code:
public static void main(String[] args) {
    // creating spark session
    SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic example")
            .config("spark.some.config.option", "some-value").getOrCreate();
    Dataset<Row> df = spark.read().json("D:/sparktestio/input.json");
    df.show();
}
It works perfectly if the JSON is on a single line, but I need it to work for multi-line JSON.
My JSON file:
{
    "name": "superman",
    "age": "unknown",
    "height": "6.2",
    "weight": "flexible"
}
Last time I checked Spark SQL docs, this stood out:
Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.
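I've been able to address this in the past by loading the JSON with the SparkContext wholeTextFiles method, which produces a PairRDD of (file path, file content) pairs, so each file arrives as a single string instead of being split by line.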
See the complete example in the "Spark SQL JSON Example Tutorial Part 2" section on this page: https://www.supergloo.com/fieldnotes/spark-sql-json-examples/
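For illustration, here is a minimal sketch of the string step that usually goes with the wholeTextFiles approach: once a whole file is in memory as one string, it can be collapsed onto a single line so that it fits the one-object-per-line format the JSON reader expects. The class name JsonLineCollapser and the method toSingleLine are hypothetical names made up for this sketch, not part of Spark. It assumes the input is valid JSON; valid JSON strings cannot contain raw line breaks, so every line break is insignificant whitespace and can be dropped safely.

```java
// Hypothetical helper (not part of Spark): collapses a pretty-printed
// JSON document onto one line. Assumes the input is valid JSON.
final class JsonLineCollapser {

    private JsonLineCollapser() {
        // utility class, no instances
    }

    // Removes every line break together with the whitespace around it.
    // In valid JSON a raw line break can only appear between tokens,
    // never inside a string value, so the content is not changed.
    static String toSingleLine(String multiLineJson) {
        return multiLineJson.replaceAll("\\s*[\\r\\n]+\\s*", "").trim();
    }
}
```

In the wholeTextFiles approach this would probably be applied to the values of the pair RDD (the file contents) before handing the resulting strings to spark.read().json(...). Also, if I remember correctly, Spark 2.2 and later let the reader handle this directly via spark.read().option("multiLine", true).json(path), in which case no preprocessing should be needed.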