Tags: apache-spark, kubernetes, kubernetes-ingress, apache-zeppelin

Expose spark-ui with zeppelin on kubernetes


First of all, I'm pretty new to all of this (Kubernetes, Ingress, Spark/Zeppelin...), so my apologies if this is obvious. I tried searching here, the documentation, etc., but couldn't find anything.

I am trying to make the Spark interpreter UI accessible from my Zeppelin notebook running on Kubernetes. Following what I understood from http://zeppelin.apache.org/docs/0.9.0-preview1/quickstart/kubernetes.html, my Ingress YAML looks something like this:

Ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-zeppelin-server-http
spec:
  rules:
  - host: my-zeppelin.my-domain
    http:
      paths:
      - backend:
          serviceName: zeppelin-server
          servicePort: 8080
  - host: '*.my-zeppelin.my-domain'
    http:
      paths:
      - backend:
          serviceName: spark-guovyx
          servicePort: 4040
status:
  loadBalancer: {}

My issue is that, for the UI to show up, the serviceName (in this case spark-guovyx) has to match the interpreter pod's name. Since that name changes and there are several of them (I run one interpreter per user, and interpreters are frequently restarted), I obviously cannot set it manually. My initial thought was to use some kind of wildcard for serviceName, but it turns out Ingress/Kubernetes doesn't support that. Any ideas?

Thanks.


Solution

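You can create a new Service and leverage the interpreterSettingName label of the Spark master pod. When Zeppelin creates the Spark master pod, it adds this label with the value spark. Because a Service selects its pods by label rather than by name, it keeps pointing at the interpreter even after the pod is restarted under a new name. I am not sure how this behaves with more than one pod in a per-user, per-interpreter setting, since every pod carrying the label becomes an endpoint of the same Service. Below is the code for the Service; do let me know how it behaves in per-user, per-interpreter mode.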

    kind: Service
    apiVersion: v1
    metadata:
      # must be a lowercase DNS label (letters, digits and '-')
      name: spark-ui
    spec:
      ports:
        - name: spark-ui
          protocol: TCP
          port: 4040        # Service port referenced by the Ingress backend
          targetPort: 4040  # Spark UI port inside the interpreter pod
      selector:
        # select the interpreter pod by label, not by its changing name
        interpreterSettingName: spark
      clusterIP: None
      type: ClusterIP


Then you can define your Ingress as:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: ingress-zeppelin-server-http
    spec:
      rules:
      - host: my-zeppelin.my-domain
        http:
          paths:
          - backend:
              serviceName: zeppelin-server
              servicePort: 8080
      - host: '*.my-zeppelin.my-domain'
        http:
          paths:
          - backend:
              serviceName: spark-ui
              servicePort: 4040
    status:
      loadBalancer: {}
    

Also, do check out this repo: https://github.com/cuebook/cuelake. It is still at an early stage of development, but I would love to hear your feedback.
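The Ingress in this thread uses the extensions/v1beta1 API, which was removed in Kubernetes 1.22. On newer clusters, the same routing can be written against networking.k8s.io/v1, roughly as sketched below. This is an untested sketch that assumes the label-based Service is named spark-ui (Kubernetes rejects mixed-case names such as sparkUI) and sits in the same namespace as zeppelin-server. Depending on your controller, you may also need to set spec.ingressClassName. The status block is omitted because it is filled in by the cluster, not by you.

```yaml
# Sketch only: same routing as above on the networking.k8s.io/v1 API.
# Assumes a Service named "spark-ui" selecting interpreterSettingName: spark.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-zeppelin-server-http
spec:
  rules:
  - host: my-zeppelin.my-domain
    http:
      paths:
      - path: /
        pathType: Prefix          # pathType is mandatory in v1
        backend:
          service:
            name: zeppelin-server
            port:
              number: 8080
  - host: '*.my-zeppelin.my-domain'
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: spark-ui        # label-based Service, not the pod name
            port:
              number: 4040
```

The main differences from the v1beta1 form are that serviceName/servicePort become service.name/service.port.number and that each path needs an explicit pathType.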