pyspark, azure-data-lake, azure-data-factory, egg

How to run a Python egg (present in Azure Databricks) from Azure Data Factory?


I created a small PySpark application, packaged it as an egg, and uploaded it to dbfs:/FileStore/jar/xyz.egg. In ADF I used the Jar activity, but I am not sure what to enter in the Main Class Name textbox.

My PyCharm project has three files. Two of them are utility files containing functions that I call, and the main file contains the following:

Main.py

from CommonUtils import appendZeros
from sampleProgram import writedf


def main():
    appendZeros('zzz')
    writedf()


if __name__ == "__main__":
    main()

So what should I specify in the 'Main class name' textbox?


Solution

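  • Note: Main Class Name is "The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library." The field therefore expects a JVM class packaged in a JAR. A Python egg contains no such class, so nothing in Main.py is a valid value for it.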

    The reference below includes a table describing the JSON properties used in the activity's JSON definition.

    Reference: "Transform data by running a Jar activity in Azure Databricks".

    Hope this helps.


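To make the note above concrete, here is a minimal sketch of the two activity definitions, written as Python dictionaries that serialize to ADF-style JSON. The Jar activity carries `mainClassName`. As far as I know, the Databricks Python activity instead takes a `pythonFile` and can list the egg under `libraries`. The activity names, the linked service name, the `com.example.Main` class, the JAR path, and the `run_main.py` driver script are all hypothetical placeholders, not values from the question; only the egg path comes from it. Treat this as an illustration to check against the official documentation, not a verified pipeline.

```python
import json

# Hypothetical linked service reference; replace with your own.
linked_service = {
    "referenceName": "AzureDatabricksLinkedService",  # assumed name
    "type": "LinkedServiceReference",
}

# Jar activity: mainClassName must be a JVM class inside a JAR library.
jar_activity = {
    "name": "RunJar",  # hypothetical activity name
    "type": "DatabricksSparkJar",
    "linkedServiceName": linked_service,
    "typeProperties": {
        "mainClassName": "com.example.Main",  # hypothetical JVM class
        "parameters": [],
        "libraries": [{"jar": "dbfs:/FileStore/jar/app.jar"}],  # hypothetical JAR
    },
}

# Python activity: no main class; a driver script is run and the egg
# is attached as a library so the script can import from it.
python_activity = {
    "name": "RunEgg",  # hypothetical activity name
    "type": "DatabricksSparkPython",
    "linkedServiceName": linked_service,
    "typeProperties": {
        "pythonFile": "dbfs:/FileStore/jar/run_main.py",  # hypothetical driver script
        "parameters": [],
        "libraries": [{"egg": "dbfs:/FileStore/jar/xyz.egg"}],  # path from the question
    },
}

print(json.dumps(python_activity, indent=2))
```

Under these assumptions, the driver script would be a small file on DBFS that imports `main` from the packaged module and calls it, since the `if __name__ == "__main__":` guard in Main.py does not fire when the module is imported from an egg.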