Tags: amazon-web-services, kubernetes, jupyterhub, amazon-eks

JupyterHub notebook server returning 500 error, pod stuck in "terminating" state


I have an AWS EKS cluster (Kubernetes version 1.14) running a JupyterHub application.

One user's notebook server is returning a 500 error:

500 : Internal Server Error Redirect loop detected. Notebook has JupyterHub version unknown (likely < 0.8), but the hub expects 0.9.6. Try installing JupyterHub==0.9.6 in the user environment if you continue to have problems. You can try restarting your server from the homepage.

Only one user is experiencing this issue; the others are not. When I run "kubectl get pod", this user's pod shows as "Terminating" (it appears to be stuck in this state).
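
For reference, this is roughly how the stuck pod shows up with kubectl. The namespace and pod name below are assumptions; Zero to JupyterHub deployments typically run single-user pods named jupyter-<username> in a dedicated namespace such as "jhub":

    # List pods in the JupyterHub namespace (namespace name is an assumption)
    kubectl get pods -n jhub

    # Symptom described above: the affected user's pod is stuck in Terminating
    # NAME               READY   STATUS        RESTARTS   AGE
    # jupyter-someuser   0/1     Terminating   0          3d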


Solution

  • I was able to fix it, but I can't say this is the right approach. (I would have preferred to diagnose the root cause)

    1. First, I tried deleting the pod with kubectl delete pod <pod_name> -- it did not work (the commands for these steps are sketched after this list)
    2. Second, I tried force deleting the pod with kubectl delete pod <pod_name> --grace-period=0 --force -- the pod object disappeared, but it turns out a force delete only removes the API record; the pod's resources can be left orphaned on the cluster
    3. I checked node status with kubectl get node and noticed one node was stuck in a NotReady state. I recycled this node -- it still did not work; the user's notebook server was still stuck and returning the 500 error
    4. Finally, I simply deleted the user's notebook server from the JupyterHub admin page. This fixed it.
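
For completeness, here is roughly the command sequence behind steps 1-3, plus an API equivalent of step 4. Pod, node, namespace, and host names are placeholders, and the force delete only removes the pod record from the API server, so whatever is left on the node is not cleaned up by that command:

    # Step 1: normal delete (hung in this case)
    kubectl delete pod jupyter-someuser -n jhub

    # Step 2: force delete -- removes the pod object from the API server only;
    # the underlying container/resources may remain orphaned on the node
    kubectl delete pod jupyter-someuser -n jhub --grace-period=0 --force

    # Step 3: check node health; one node was stuck in NotReady
    kubectl get nodes
    # Drain it before recycling/terminating the instance
    # (--delete-local-data is the flag name on Kubernetes 1.14)
    kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets --delete-local-data

    # Step 4: the admin-page fix can also be done through the JupyterHub REST API,
    # assuming you have an admin API token
    curl -X DELETE -H "Authorization: token $ADMIN_TOKEN" \
      https://<hub-host>/hub/api/users/<username>/server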