I need to read other formats (JSON, binary, XML) and infer the schema dynamically within a transform in Code Repositories, using the Spark DataSource API.
Example:
val df = spark.read.json(<hadoop_path>)
For that, I need an accessor to the Foundry file system path, which looks something like:
foundry://...@url:port/datasets/ri.foundry.main.dataset.../views/ri.foundry.main.transaction.../startTransactionRid/ri.foundry.main.transaction...
This is possible with the PySpark API (Python):
filesystem = input_transform.filesystem()
hadoop_path = filesystem.hadoop_path
However, I didn't find a proper way to do this in Java/Scala.
A getter for the Hadoop path has recently been added to the Foundry Java API. Upgrade the Java transforms version (transformsJavaVersion >= 1.188.0) and you can get it:
val hadoopPath = myInput.asFiles().getFileSystem().hadoopPath()
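Putting the two pieces together, here is a minimal sketch of how the Hadoop path can feed Spark's schema inference inside a Scala transform. Note that `myInput` and `spark` are assumed to be provided by the Foundry transform context, and the XML line assumes the spark-xml datasource is available in your repository; only `hadoopPath()` itself is the API confirmed above.

```scala
// Sketch only: assumes a Foundry Java/Scala transform with input `myInput`
// (transformsJavaVersion >= 1.188.0) and an in-scope SparkSession `spark`.
val hadoopPath: String = myInput.asFiles().getFileSystem().hadoopPath()

// Spark infers the JSON schema dynamically from the raw files.
val jsonDf = spark.read.json(hadoopPath)

// XML would need the spark-xml datasource (com.databricks:spark-xml),
// if it is available in your environment:
// val xmlDf = spark.read.format("xml").option("rowTag", "record").load(hadoopPath)
```

The same pattern applies to any Spark datasource: obtain the Hadoop path once, then pass it to the appropriate `spark.read` method.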