I am working with Spark on Databricks. I have a mount point for my storage location pointing to my directory; let's call the directory "/mnt/abc1/abc2" - path. In this "abc2" directory, say I have 10 folders named "xyz1" .. "xyz10". All of these "xyz*" folders contain JSON files; let's call them "xyz1_1.json", and so on. I need to build a table such that I can load the JSON into a Spark table by referring to it as path + "abc2.xyz1.xyz1_1.json"
var path = "/mnt/abc1/"
var data = spark.read.json(path)
This works when the JSON files sit directly under the path, not inside folders within it. I want to figure out a way that can automatically detect the underlying folders and sub-folders containing the JSON files, and build the table on top of them.
With Spark 3+ you can set the option recursiveFileLookup
to true to search sub-directories:
var path = "/mnt/abc1/"
var data = spark.read.option("recursiveFileLookup","true").json(path)
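To make the behavior concrete: with recursiveFileLookup enabled, Spark's file discovery walks every level of the directory tree and picks up all matching files, instead of stopping at the top level. A minimal, hedged Python sketch of that discovery step (this is an illustration of the idea using pathlib, not Spark's actual implementation):

```python
from pathlib import Path


def recursive_json_lookup(root: str) -> list[str]:
    """Collect every *.json file under root, however deeply nested.

    Mimics what Spark's recursiveFileLookup option does during file
    discovery: a depth-unlimited walk that keeps only matching files.
    """
    return sorted(str(p) for p in Path(root).rglob("*.json"))
```

Two caveats worth knowing: enabling recursiveFileLookup disables partition inference, so folder names like "xyz1" will not become partition columns; if you need to know which folder a row came from, you can add a column with Spark's input_file_name() function. On Spark 2.x, where this option is unavailable, a glob path such as spark.read.json(path + "*/") can read one level of sub-folders at a time.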