I've inherited a cluster that uses Knox and am trying to figure out why the Spark History Server is available for completed Spark jobs but the Spark UI is not available for in-progress Spark applications.
In the YARN UI (which is exposed via Knox) there are 5 completed YARN applications and 1 in-progress YARN application. All are Spark applications.
In the Tracking UI columns the available links are:
The five links pertaining to the completed jobs all successfully bring up the Spark History Server UI for those jobs. If I run cat ${GATEWAY_HOME}/logs/gateway-audit.log, I can see the following records appear when I hit any of those five links:
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|success|Response status: 302
20/01/27 15:50:55 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|success|Response status: 302
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||access|uri|/gateway/my-cluster-name/sparkhistory/history/application_1580137635209_0001/1|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|success|Response status: 30
followed by many more log records for Spark History UI resources. All good. Notice the 302 (redirect) records.
However, if I hit the link for the in-progress application I get sent to http://my-cluster-name-m:18080/history/application_1580137635209_0006/1, which is on the cluster master node, and the following is displayed:
In the logs I see:
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|success|Response status: 200
20/01/27 15:58:38 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|success|Response status: 200
Notice there are no 302 records there.
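To compare the two cases without eyeballing the raw log, the response statuses per application can be pulled out with a short script. This is only a sketch: the field positions are inferred from the pipe-delimited records shown above rather than from Knox documentation, and the function names `parse_audit_line` and `response_statuses` are hypothetical.

```python
import re

# Matches the trailing message of records such as "Response status: 302".
STATUS_RE = re.compile(r"Response status: (\d+)")


def parse_audit_line(line):
    """Split one pipe-delimited audit record into the fields used here.

    Field positions are an assumption based on the records shown above.
    Returns None for lines that do not have that layout.
    """
    parts = line.rstrip("\n").split("|", 13)
    if len(parts) < 14:
        return None
    match = STATUS_RE.search(parts[13])
    return {
        "service": parts[5],   # e.g. YARNUI, SPARKHISTORYUI
        "action": parts[9],    # access or dispatch
        "uri": parts[11],
        "outcome": parts[12],  # unavailable or success
        "status": int(match.group(1)) if match else None,
    }


def response_statuses(lines, app_id):
    """Return (service, action, status) for each response record
    whose URI mentions the given YARN application id."""
    results = []
    for line in lines:
        record = parse_audit_line(line)
        if record and record["status"] is not None and app_id in record["uri"]:
            results.append((record["service"], record["action"], record["status"]))
    return results
```

Feeding it the lines of gateway-audit.log together with an id such as application_1580137635209_0006 lists the status codes returned for that application, which makes it easy to see whether a 302 redirect ever occurs.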
Edit: Since originally posting this I have noticed that if I click on the Tracking UI link immediately after the application starts, I am taken to the details of the YARN application:
A few seconds later, clicking on the same link takes me to the error shown above.
I'm a bit lost at this point. Can anyone help explain why I can't view the Spark UI for in-progress applications? Any pointers on how to diagnose this would be welcome.
OK, the answer is rather embarrassing. The cause was simply that the Spark UI was not enabled. Setting spark.ui.enabled to true solved this particular problem.
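For anyone wanting to check for the same mistake, here is a rough sketch that reads the text of a spark-defaults.conf file and reports the effective value of spark.ui.enabled. It assumes the setting lives in that file (it could equally be passed with --conf or set in application code), it handles only the "key value" and "key=value" line forms, and the function name `spark_ui_enabled` is hypothetical. Spark's documented default for this property is true, so an unset property counts as enabled.

```python
import re


def spark_ui_enabled(conf_text, default=True):
    """Return the effective spark.ui.enabled value from the text of a
    spark-defaults.conf style file (simplified parsing, see assumptions)."""
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Key and value are separated by whitespace and/or '='.
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        if len(parts) == 2 and parts[0] == "spark.ui.enabled":
            value = parts[1].strip().lower() == "true"  # last entry wins
    return value
```

Passing it the contents of the cluster's spark-defaults.conf returns False when the file explicitly disables the UI, which was the situation here.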