amazon-web-services, kubernetes, amazon-iam, amazon-elb, kubectl

NodePort services not working as expected on AWS, possibly due to IAM ELB permissions


Problem

I'm trying to create a service of type NodePort in my Kubernetes cluster, but it's not working as expected, and I suspect it has to do with the fact that I've disabled ELB permissions for the IAM roles used on my master node. I wouldn't think ELB permissions should matter for NodePort, but I'm seeing an error message that leads me to think they do. Am I doing something wrong? Is this a known issue others have seen before?

Attempt

I deployed a service of type NodePort to my cluster, expecting to be able to reach the service on any node's public IP at the assigned port, but I can't. The cluster has 1 master and 2 non-master nodes, and no process is bound to port 30095 (the assigned NodePort) except on the master node. SSH'ing onto the master and curling that port in a variety of ways does nothing (curl just hangs). Curling the endpoints associated with the service works fine. kubectl describe on the service suggests there was some error creating a load balancer, but I don't know why it would be trying to create one at all.
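
For reference, a manifest roughly like the following matches the service in question (reconstructed from the kubectl output below; a nodePort is auto-assigned when it is omitted):

    $ cat frontend-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend
      labels:
        app: guestbook
        tier: frontend
    spec:
      type: NodePort
      selector:
        app: guestbook
        tier: frontend
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80
    $ kubectl create -f frontend-service.yaml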

I'll reiterate that I specifically removed the ability to perform any ELB operations from the IAM role used by the master nodes. I don't want developers using my Kubernetes cluster to be able to spin up ELBs in my account, or, for that matter, to do anything else that creates AWS resources in my account.
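
For context, "disabled" here means the master role's policy grants no elasticloadbalancing actions at all; an explicit deny with the same effect would look roughly like this (the policy file name is illustrative):

    $ cat deny-elb-policy.json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Deny",
          "Action": "elasticloadbalancing:*",
          "Resource": "*"
        }
      ]
    }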

Actual Result

  • information about the service (commands run from a local workstation); note the CreatingLoadBalancerFailed error in the output of kubectl describe:

    $ kubectl get services frontend -oyaml
    apiVersion: v1
    kind: Service
    ---SNIP---
      ports:
      - nodePort: 30095
        port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: guestbook
        tier: frontend
      sessionAffinity: None
      type: NodePort
    status:
      loadBalancer: {}
    
    $ kubectl describe services frontend
    Name:             frontend
    Namespace:        default
    Labels:           app=guestbook
                      tier=frontend
    Selector:         app=guestbook,tier=frontend
    Type:             NodePort
    IP:               100.67.10.125
    Port:             <unset> 80/TCP
    NodePort:         <unset> 30095/TCP
    Endpoints:        100.96.1.2:80,100.96.2.2:80,100.96.2.4:80
    Session Affinity: None
    Events:
      FirstSeen  LastSeen  Count  From                   SubObjectPath  Type     Reason                      Message
      ---------  --------  -----  ----                   -------------  ----     ------                      -------
      1h         4m        15     {service-controller }                 Warning  CreatingLoadBalancerFailed  (events with common reason combined)
    
  • looking for processes bound to the port on a non-master node:

    $ netstat -tulpn | grep 30095
    # no output
    
  • looking for processes bound to the port on the master node (kube-proxy turns out to be listening only here; see the kube-proxy check after this list):

    $ netstat -tulpn | grep 30095
    tcp6       0      0 :::30095                :::*                    LISTEN      1540/kube-proxy
    
  • attempting to curl the service (just hangs):

    $ curl localhost:30095
    # just hangs
    ^C
    
    $ curl -g -6 http://[::1]:30095
    # just hangs
    ^C
    
    $ curl -vvvg -6 http://[::1]:30095
    * Rebuilt URL to: http://[::1]:30095/
    * Hostname was NOT found in DNS cache
    *   Trying ::1...
    * Connected to ::1 (::1) port 30095 (#0)
    > GET / HTTP/1.1
    > User-Agent: curl/7.38.0
    > Host: [::1]:30095
    > Accept: */*
    >
    # just hangs after that
    ^C
    
    $ curl 100.67.10.125:30095
    # just hangs
    ^C
    
  • curling an endpoint from the master node (works, so the pods themselves are running fine):

    $ curl 100.96.2.4
    <html ng-app="redis">
      <head>
    ---SNIP---
      </body>
    </html>
    

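Since kube-proxy is the process that opens NodePorts on every node, the netstat results above suggest checking whether kube-proxy is running at all on the non-master nodes. A quick check, assuming SSH access to a node (how kube-proxy is run depends on how the cluster was provisioned):

    # on a non-master node: no output means kube-proxy isn't running there,
    # which by itself would explain the unbound port
    $ ps aux | grep [k]ube-proxy

    # or, if kube-proxy runs as pods in kube-system on this cluster:
    $ kubectl get pods --namespace=kube-system -o wide | grep kube-proxy
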
Expected Result

Curling the external IP of any node on the service's assigned NodePort (30095) was expected to return the same result as curling the endpoints directly.
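
Concretely, something like this was expected to work from a local workstation, where 203.0.113.10 stands in for any node's public IP:

    $ curl 203.0.113.10:30095
    <html ng-app="redis">
      <head>
    ---SNIP---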

Additional details:

  • $ kubectl version

    Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+82450d0", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"not a git tree", BuildDate:"2016-12-14T04:09:31Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"clean", BuildDate:"2016-11-12T05:16:27Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
    
  • GitHub issue: https://github.com/kubernetes/kubernetes/issues/39214

  • Mailing list post: https://groups.google.com/forum/#!topic/kubernetes-dev/JNC_bk1L3iI

Solution

  • Kubernetes does this because it assumes that a NodePort service may previously have been a LoadBalancer service, in which case a cloud load balancer may be left over that needs cleaning up. A PR was opened that would have fixed this issue, but it was then closed. In the meantime, switching the master role's IAM policy to grant only elasticloadbalancing:DescribeLoadBalancers instead of elasticloadbalancing:* solved the issue: the rest of the cluster, including NodePort services, works fine, while developers are still prevented from creating ELBs.
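
    A sketch of that change as an inline role policy, with an illustrative role name (the key point is granting only the describe action):

        $ aws iam put-role-policy \
            --role-name masters.example-cluster \
            --policy-name allow-elb-describe \
            --policy-document '{
              "Version": "2012-10-17",
              "Statement": [{
                "Effect": "Allow",
                "Action": "elasticloadbalancing:DescribeLoadBalancers",
                "Resource": "*"
              }]
            }'

    With this in place, the service controller can verify that no ELB needs cleaning up, while still being unable to create or modify load balancers.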