Tags: node.js, websocket, authorization, kubernetes, kubectl

Authorize over WebSocket connection to remote kubernetes api-server


See the connected question - Kubernetes pod exec API exception: Response must not include 'Sec-WebSocket-Protocol' header if not present in request.

I have successfully made a WebSocket connection using the Pod exec API, but I am using kubectl proxy on localhost to handle authorization on behalf of the terminal client.

The next step is to authorize the request directly against the Kubernetes API server, so that there is no need to route the traffic via kubectl proxy. Here's a discussion in the Python community where they have been able to send an Authorization token to the api-server, but I haven't had any success with this in Node.js. I must admit that I am not familiar enough with Python to follow the discussion.

Can someone from the kubernetes team point me in the right direction?

Thanks


Solution

  • For future wanderers....

    Although the exec API supports the Authorization header, the browser WebSocket API does not let you set request headers. So the solution for us was to reverse-proxy the connection through our own server API.

    It went like this...

    client browser -wss-> GKE LB (SSL Termination) -ws-> site API (nodejs) -WSS & Authorization-> kube api-server exec API

    So to answer my own question: per my tests, Kubernetes on GKE accepts Authorization only in headers, so you need a reverse proxy if you want to connect to it from a browser. Per this code, some Kubernetes setups allow tokens in the query string, but I didn't have any success with that on GKE. If you are using a different cluster host, YMMV. I welcome comments from the Kubernetes team on my observations.
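To make the last hop of that chain concrete, here is a minimal sketch of how the Node.js side might assemble the upstream exec request. The function name `buildExecRequest`, the example host, and the token are hypothetical, and the `v4.channel.k8s.io` subprotocol is an assumption about what the cluster accepts rather than something confirmed above.

```javascript
// Sketch only: builds the URL, subprotocol, and headers for the upstream
// exec connection. buildExecRequest and its parameters are hypothetical names.
function buildExecRequest({ apiServer, namespace, pod, command, token }) {
  // Accept an https:// (or http://) base URL and switch it to wss:// (or ws://).
  const url = new URL(apiServer.replace(/^http/, 'ws'));
  url.pathname =
    `/api/v1/namespaces/${encodeURIComponent(namespace)}` +
    `/pods/${encodeURIComponent(pod)}/exec`;

  // Each argv element is sent as its own repeated `command` parameter.
  const query = new URLSearchParams();
  for (const arg of command) query.append('command', arg);
  for (const flag of ['stdin', 'stdout', 'stderr', 'tty']) query.append(flag, 'true');
  url.search = query.toString();

  return {
    url: url.toString(),
    // Assumed subprotocol; the api-server echoes back the one it selects.
    protocols: ['v4.channel.k8s.io'],
    // The header a browser WebSocket cannot set, so the server-side proxy adds it.
    options: { headers: { Authorization: `Bearer ${token}` } },
  };
}

const req = buildExecRequest({
  apiServer: 'https://10.0.0.1', // hypothetical api-server address
  namespace: 'default',
  pod: 'my-pod',
  command: ['/bin/sh'],
  token: 'REDACTED', // hypothetical service account token
});
console.log(req.url);
```

With a server-side client such as the `ws` package, which accepts a `headers` option, the result could be passed along as `new WebSocket(req.url, req.protocols, req.options)`. The proxy would then pipe frames between that socket and the browser's socket.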

    If you came here only for an authorization issue, you may stop reading further.

    There are still more challenges to overcome though, and there's good news and bad news... the good news first:

    GKE Loadbalancer automatically handles SSL termination even for WebSockets, so you can proxy to either WS or WSS without any issues.

    And then the bad news:

    GKE Loadbalancer forcibly terminates ALL connections within 30 seconds, even if they are in use! There are workarounds, but they either don't stay put, require you to deploy your own controller, or require you to use Ingress. For a terminal session, this means that Chrome will close the client with a 1006 code, even if a command is running at that time.

    For some WS scenarios, it may be acceptable to simply reconnect on a 1006 close, but for a terminal session, this is a deal-breaker as you cannot reconnect to the previous terminal instance and must begin with a new one.
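The distinction above can be sketched as a small close handler policy. The function name `onCloseAction`, the `kind` labels, and the returned action strings are all hypothetical; only the 1006 code comes from the discussion above.

```javascript
// Sketch only: decide what the client should do when its socket closes.
// `kind` is a hypothetical label the application assigns to each socket.
const ABNORMAL_CLOSURE = 1006; // reported when the connection drops without a close frame

function onCloseAction(closeCode, kind) {
  if (closeCode !== ABNORMAL_CLOSURE) return 'done';
  // A reconnect cannot reattach to the previous terminal instance,
  // so the best a terminal client can do is tell the user the session is gone.
  return kind === 'terminal' ? 'notify-session-lost' : 'reconnect';
}

console.log(onCloseAction(1006, 'terminal'));
console.log(onCloseAction(1006, 'feed'));
```

In a browser this might be wired up as `socket.addEventListener('close', (event) => onCloseAction(event.code, 'terminal'))`, with the caller acting on the returned string.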

    For now we have resorted to increasing the timeout of the GKE Loadbalancer. But eventually we are planning to deploy our own Loadbalancer which can handle this better. Ingress has some issues which we don't want to live with at the moment.