Tags: apache-spark, apache-spark-mllib

How to pass additional dataframes to custom Spark MLlib Transformers


I am writing a custom Spark ML Transformer that needs to access an additional dataframe and join it with the main dataset. The path to the dataframe to be joined is available in my main class. How can I pass either the dataframe itself, or its path, to the custom transformer?


Solution

  • As suggested by @SomeshwarKale, pass the path as a parameter of the transformer and read the dataframe inside the transform method. The SparkSession needed to read it can be obtained from the dataset itself via dataset.sparkSession.