
Error enabling lineage in Spark using Spline?


I tried using Spline to track lineage in Spark using both of the ways specified here, but both failed with the same error:

ERROR QueryExecutionEventHandlerFactory: Spline Initialization Failed! Spark lineage tracking is disabled Spark Agent was not able to establish connection with spline gateway

CausedBy: java.net.connectException: Connection Refused

I can see the UI on ports 8080 and 9090, and ArangoDB is up and running.

But no lineage is displayed.

I have tried both pyspark and spark-shell, with no luck. Any help is appreciated.


Solution

  • I was able to resolve the issue by manually creating the REST server, ArangoDB, and web client, and then providing the correct producer URI when starting spark-shell:

    --conf "spark.spline.producer.url=http://localhost:8080/producer"
    

    Even then, no lineage appeared in the web UI, despite applying various actions and transformations.

    Later I realized that lineage is generated only when the DataFrame is saved, so as soon as a write was triggered I was able to see the lineage graph.
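
The steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not a definitive setup: it assumes the Spline agent jar is already on the Spark classpath, that the producer URL is the one shown above, and that the listener class name matches the agent version in use. The helper names (producer_reachable, spline_conf, main) and the output path are hypothetical and exist only for illustration.

```python
import socket
from urllib.parse import urlparse

# Assumption: the same producer URL as in the answer above.
PRODUCER_URL = "http://localhost:8080/producer"


def producer_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and DNS failures.
        return False


def spline_conf(producer_url):
    """Spark settings for the Spline agent.

    Assumption: the listener class name below matches the installed agent.
    """
    return {
        "spark.sql.queryExecutionListeners":
            "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener",
        "spark.spline.producer.url": producer_url,
    }


def main(output_path="/tmp/spline_demo_output"):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    if not producer_reachable(PRODUCER_URL):
        raise SystemExit(f"Nothing is listening at {PRODUCER_URL}")

    builder = SparkSession.builder.appName("spline-demo")
    for key, value in spline_conf(PRODUCER_URL).items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    df = spark.range(10).withColumnRenamed("id", "n")
    df.show()  # per the answer above, actions like this did not produce lineage
    df.write.mode("overwrite").parquet(output_path)  # the write is what did
    spark.stop()
```

Calling main() runs the whole flow. Running producer_reachable(PRODUCER_URL) on its own is a quick way to tell a wrong producer URL (the "Connection Refused" case) apart from a job that simply has not written anything yet.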