Tags: apache-spark, palantir-foundry, foundry-code-repositories

How to get the Hadoop path with Java/Scala API in Code Repositories


I need to read other formats (JSON, binary, XML) and infer the schema dynamically within a Code Repositories transform, using the Spark DataSource API.

Example:

val df = spark.read.json(<hadoop_path>)

For that, I need an accessor to the Foundry file system path, which looks something like:

foundry://...@url:port/datasets/ri.foundry.main.dataset.../views/ri.foundry.main.transaction.../startTransactionRid/ri.foundry.main.transaction...

This is possible with the PySpark API (Python):

# Inside a Python transform: `input_transform` is the transform's input
filesystem = input_transform.filesystem()
hadoop_path = filesystem.hadoop_path

However, I did not find an equivalent for the Java/Scala API.


Solution

  • A getter for the Hadoop path was recently added to the Foundry Java API. After upgrading the Java transform version (transformsJavaVersion >= 1.188.0), you can get it:

    val hadoopPath = myInput.asFiles().getFileSystem().hadoopPath()
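Putting it together, here is a minimal Scala transform sketch. The annotation and type names (`@Compute`, `@Input`, `@Output`, `FoundryInput`, `FoundryOutput`) and the dataset paths are assumptions based on the Foundry Java transforms API and will depend on your repository setup; the `hadoopPath()` call is the one from the answer above. This is not standalone-runnable outside a Foundry code repository.

```scala
package myproject.datasets

// Assumption: package name of the Foundry Java transforms API; verify
// against your repository's generated transform examples.
import com.palantir.transforms.lang.java.api.{Compute, Input, Output, FoundryInput, FoundryOutput}
import org.apache.spark.sql.SparkSession

final class InferJsonSchema {

  @Compute
  def compute(
      spark: SparkSession,
      @Input("/path/to/raw_json_dataset") myInput: FoundryInput,   // hypothetical path
      @Output("/path/to/parsed_output") myOutput: FoundryOutput    // hypothetical path
  ): Unit = {
    // Resolve the foundry://... Hadoop path backing the input dataset
    // (requires transformsJavaVersion >= 1.188.0)
    val hadoopPath = myInput.asFiles().getFileSystem().hadoopPath()

    // Let Spark's JSON datasource read the raw files and infer the schema
    val df = spark.read.json(hadoopPath)

    // Write the parsed DataFrame back to the output dataset; the writer
    // call is an assumption and may differ in your API version
    myOutput.getDataFrameWriter(df).write()
  }
}
```

The same pattern should work for other Spark datasources (e.g. `spark.read.format("xml")` with the spark-xml package, or `spark.read.format("binaryFile")`), since they all accept a Hadoop path.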