Search code examples
knox-gateway

Spark History UI is available but tracking UI errors for in-progress applications


I've inherited a cluster that uses knox and am trying to figure out why the Spark history server is available for completed Spark jobs but the Spark UI is not available for in-progress Spark applications.

In this yarn UI (which is exposed via Knox) there are 5 completed yarn applications and 1 in-progress yarn application. All are spark applications: yarn UI exposed via knox

In the Tracking UI columns the available links are:

The five links pertaining to the completed jobs all successfully bring up the Spark History server UI for those jobs. If I issue cat ${GATEWAY_HOME}/logs/gateway-audit.log I can see the following appear when I hit any of those five links:

20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|success|Response status: 302
20/01/27 15:50:55 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|success|Response status: 302

20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||access|uri|/gateway/my-cluster-name/sparkhistory/history/application_1580137635209_0001/1|unavailable|Request method: GET 20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|success|Response status: 30

and lots and lots of other log records for Spark History UI resources. All good. Notice the 302 record (redirect)

However, if I hit the link for the in-progress application I get sent to http://my-cluster-name-m:18080/history/application_1580137635209_0006/1 which is the cluster master node, and the following displayed: enter image description here

In the logs I see:

20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|success|Response status: 200
20/01/27 15:58:38 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|success|Response status: 200

Notice there are no 302 records there.

Edit: Since originally posting this I have noticed that if i click on the Tracking UI link immediately after the application starts then I am taken to the details of the yarn application:

yarn app details

A few seconds later clicking on the same link will take me to the error as shown above.

I'm a bit lost at this point. Can anyone help explain why I can't view the Spark UI for in-progress applications? Any pointers as to how I can diagnose would be welcomed.


Solution

  • OK, the answer is rather embarrassing. The cause was simply that the spark UI was not enabled. Changing setting spark.ui.enabled to true solved this particular problem.