I tried using Spline to track lineage in Spark using both of the approaches specified here, but both failed with the same error:
ERROR QueryExecutionEventHandlerFactory: Spline Initialization Failed! Spark lineage tracking is disabled Spark Agent was not able to establish connection with spline gateway
CausedBy: java.net.connectException: Connection Refused
I am able to see the UI on ports 8080 and 9090, and ArangoDB is also up and running.
But no lineage is displayed.
I have tried pyspark as well as spark-shell but no luck. Any help is appreciated.
I was able to resolve the issue by manually creating the rest-server, ArangoDB, and web-client, and then providing the correct producer URL when launching spark-shell:
--conf "spark.spline.producer.url=http://localhost:8080/producer"
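For anyone hitting the same "Connection refused" error, it can help to confirm that something is actually listening at the producer URL before starting Spark. The snippet below is only a minimal sketch of such a check: the function name `producer_reachable` is made up for illustration, and it only tests that a TCP connection can be opened to the host and port in the URL. It does not verify that the Spline producer API itself is healthy.

```python
import socket
from urllib.parse import urlparse


def producer_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it checks reachability only,
    not whether the service behind the port is the Spline producer.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # URL taken from the --conf value above; adjust to your own setup.
    url = "http://localhost:8080/producer"
    state = "reachable" if producer_reachable(url) else "NOT reachable"
    print(url, "is", state)
```

If this reports the URL as not reachable, the Spark agent would presumably fail in the same way, so the host and port in `spark.spline.producer.url` are the first thing to check.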
Still, I was not getting any lineage in the web UI despite applying various actions and transformations.
Later I realized that lineage is generated only when the DataFrame is saved, so as soon as a write was triggered I was able to see the lineage graph.
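To illustrate the point about writes, here is a small PySpark sketch. It assumes you are inside a pyspark shell that was started with the Spline agent already configured (including the producer URL shown earlier), so that `spark` is the existing session. The function name, column names, and output path are made up for the example.

```python
def write_to_trigger_lineage(spark, output_path):
    """Build a small DataFrame and save it to output_path.

    Hypothetical helper for illustration. `spark` is an existing
    SparkSession, assumed to be running with the Spline agent enabled.
    """
    # Transformations only describe the computation; nothing is saved yet.
    df = spark.range(10).selectExpr("id", "id * 2 AS doubled")

    # The write is the step that, in my experience described above,
    # made the lineage graph appear in the Spline UI.
    df.write.mode("overwrite").parquet(output_path)
    return df


# Example call from the pyspark shell (the path is an arbitrary choice):
# write_to_trigger_lineage(spark, "/tmp/spline_demo_output")
```

Actions such as `show()` or `count()` on the same DataFrame did not produce a graph for me; only the save did.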