
Jupyterhub on Kubernetes + Nginx - No spawning after login


Context: Using Terraform I created an EKS cluster on AWS. On that cluster I installed the Nginx Ingress controller using Helm 3. TLS is handled by Let's Encrypt via cert-manager. With that in place I can expose web applications using deployment, service, and ingress YAML files.
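
For reference, the ingress controller and cert-manager were installed with Helm 3 roughly along the lines below. The repository URLs, release names, and namespaces shown here are the charts' usual defaults rather than the exact commands used, so adjust as needed.

$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update

# NGINX ingress controller
$ helm install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace

# cert-manager (with CRDs) backing the Let's Encrypt cluster issuer
$ helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true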

Problem: Something that does not work for me is deploying JupyterHub successfully. Installation and exposure work fine, with JupyterHub served over plain TCP and cert-manager creating the certificates successfully. The problem starts after a user logs in to JupyterHub successfully: when JupyterHub is supposed to spawn a notebook, an "invalid or expired cookie token" error occurs instead.

Question: It is unclear to me why spawning fails and how this can be resolved. Does anyone have a suggestion that would help me understand the issue?

The jupyterhub_config.py is as follows:

c = get_config()
c.JupyterHub.authenticator_class = 'jupyterhub.auth.DummyAuthenticator'
c.Authenticator.allowed_users = {'dummy'}
c.Authenticator.admin_users = {'dummy'}
c.DummyAuthenticator.password = "fakenews"
c.JupyterHub.admin_access = True

The deployment.yaml is as follows:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    run: jupyterhub
  name: jupyterhub
  namespace: jhub
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      run: jupyterhub
  template:
    metadata:
      creationTimestamp: ~
      labels:
        run: jupyterhub
    spec:
      containers:
        - name: jupyterhub
          image: "jupyterhub/jupyterhub:latest"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000
              protocol: TCP
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /srv/jupyterhub/jupyterhub_config.py
              name: jupyterhub-config
              subPath: jupyterhub_config.py
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - name: jupyterhub-config
          configMap:
            name: jupyterhub-config

The ingress.yaml is as follows:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-resource
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  tls:
  - hosts:
    - hub.example.com
    secretName: hub-example-com-tls
  rules:
  - host: hub.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: jupyterhub
          servicePort: 8000

The commands used:

$ kubectl create configmap jupyterhub-config --from-file=./jupyterhub_config.py
$ kubectl create -f deployment.yaml
$ kubectl expose deployment jupyterhub
$ kubectl apply -f ingress.yaml
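
A few optional sanity checks, using the resource names from the files above (add -n jhub if your kubectl context is not already set to that namespace):

$ kubectl get pods -l run=jupyterhub
$ kubectl get svc jupyterhub
$ kubectl get ingress ingress-resource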

This results in a working, TLS-secured web service at https://hub.example.com. But after logging in, the JupyterHub container log reports an invalid or expired cookie token when it tries to spawn a Jupyter instance:

[I 2020-08-21 08:26:42.725 JupyterHub app:2307] Running JupyterHub version 1.2.0dev
[I 2020-08-21 08:26:42.726 JupyterHub app:2338] Using Authenticator: jupyterhub.auth.DummyAuthenticator-1.2.0dev
[I 2020-08-21 08:26:42.726 JupyterHub app:2338] Using Spawner: jupyterhub.spawner.LocalProcessSpawner-1.2.0dev
[I 2020-08-21 08:26:42.726 JupyterHub app:2338] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-1.2.0dev
[I 2020-08-21 08:26:42.735 JupyterHub app:1442] Writing cookie_secret to /srv/jupyterhub/jupyterhub_cookie_secret
[I 2020-08-21 08:26:42.752 alembic.runtime.migration migration:155] Context impl SQLiteImpl.
[I 2020-08-21 08:26:42.752 alembic.runtime.migration migration:162] Will assume non-transactional DDL.
[I 2020-08-21 08:26:42.758 alembic.runtime.migration migration:515] Running stamp_revision  -> 4dc2d5a8c53c
[I 2020-08-21 08:26:42.809 JupyterHub proxy:461] Generating new CONFIGPROXY_AUTH_TOKEN
[I 2020-08-21 08:26:42.850 JupyterHub app:2377] Initialized 0 spawners in 0.002 seconds
[W 2020-08-21 08:26:42.853 JupyterHub proxy:643] Running JupyterHub without SSL.  I hope there is SSL termination happening somewhere else...
[I 2020-08-21 08:26:42.853 JupyterHub proxy:646] Starting proxy @ http://:8000
08:26:43.359 [ConfigProxy] info: Proxying http://*:8000 to (no default)
08:26:43.362 [ConfigProxy] info: Proxy API at http://127.0.0.1:8001/api/routes
08:26:43.474 [ConfigProxy] info: 200 GET /api/routes 
[I 2020-08-21 08:26:43.475 JupyterHub app:2622] Hub API listening on http://127.0.0.1:8081/hub/
08:26:43.476 [ConfigProxy] info: 200 GET /api/routes 
[I 2020-08-21 08:26:43.476 JupyterHub proxy:320] Checking routes
[I 2020-08-21 08:26:43.476 JupyterHub proxy:400] Adding default route for Hub: / => http://127.0.0.1:8081
08:26:43.478 [ConfigProxy] info: Adding route / -> http://127.0.0.1:8081
08:26:43.478 [ConfigProxy] info: Route added / -> http://127.0.0.1:8081
08:26:43.478 [ConfigProxy] info: 201 POST /api/routes/ 
[I 2020-08-21 08:26:43.479 JupyterHub app:2697] JupyterHub is now running at http://:8000
[I 2020-08-21 08:26:56.023 JupyterHub log:181] 302 GET /hub/ -> /hub/login (@10.0.1.148) 1.16ms
[I 2020-08-21 08:27:01.409 JupyterHub base:742] User logged in: dummy
[I 2020-08-21 08:27:01.429 JupyterHub log:181] 302 POST /hub/login?next= -> /hub/spawn (dummy@10.0.1.148) 68.74ms
[I 2020-08-21 08:27:01.758 JupyterHub log:181] 200 GET /hub/login?next=%2Fhub%2Fspawn (@10.0.1.148) 219.05ms
08:31:43.482 [ConfigProxy] info: 200 GET /api/routes 
[I 2020-08-21 08:31:43.482 JupyterHub proxy:320] Checking routes
[I 2020-08-21 12:06:43.482 JupyterHub proxy:320] Checking routes
[I 2020-08-21 12:07:08.386 JupyterHub log:181] 200 GET /hub/login?next=%2Fhub%2Fspawn (@10.0.2.117) 1.85ms
[I 2020-08-21 12:07:13.216 JupyterHub base:742] User logged in: dummy
[I 2020-08-21 12:07:13.217 JupyterHub log:181] 302 POST /hub/login?next=%2Fhub%2Fspawn -> /hub/spawn (dummy@10.0.2.117) 5.40ms
[I 2020-08-21 12:07:13.309 JupyterHub log:181] 200 GET /hub/login?next=%2Fhub%2Fspawn (@10.0.2.117) 1.22ms
[I 2020-08-21 13:27:28.324 JupyterHub log:181] 302 GET / -> /hub/ (@10.0.2.117) 0.90ms 
[I 2020-08-21 13:27:28.410 JupyterHub log:181] 200 GET /hub/login (@10.0.2.117) 1.28ms 
[W 2020-08-21 13:27:34.613 JupyterHub base:392] Invalid or expired cookie token 
[I 2020-08-21 13:27:34.615 JupyterHub log:181] 302 GET /hub/spawn -> /hub/login?next=%2Fhub%2Fspawn (@10.0.2.117) 1.88ms

Solution

  • As the OP mentioned, scaling the deployment down to 1 replica solved the problem.

    I would like to clarify what seems to be the issue.

    JupyterHub is not scalable. It is a stateful application, and (as of now) it cannot be made to run in a highly available, multi-replica setup.

    The Kubernetes Service load-balances between the two pods/replicas, sending each request to either one of them.

    When you log in to one JupyterHub instance you receive a token. Each replica writes its own cookie secret and keeps its own local SQLite database, as the startup log above shows. So when the next request, carrying the token you just received, gets routed to the second JupyterHub instance, that instance has no idea what the token is, because it is not the one that generated it. The result is:

    invalid or expired cookie token
    

    This is exactly what you see: the second instance considers the token invalid.

    This is why scaling down to one replica solved the problem: all traffic is now sent to a single pod, as in the command sketch below.
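
    A minimal sketch of the fix, assuming the jhub namespace from the Deployment manifest (drop -n jhub if you are working in a different context):

    # Run a single replica so every request hits the same JupyterHub instance
    # (and therefore the same cookie secret and database).
    # Also set replicas: 1 in deployment.yaml so a re-apply does not scale it back up.
    kubectl -n jhub scale deployment jupyterhub --replicas=1

    # Optional: confirm the Service now has exactly one endpoint behind it
    kubectl -n jhub get endpoints jupyterhub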