kubernetes, google-cloud-composer, kubernetes-python-client

Why shouldn't you run Kubernetes pods for longer than an hour from Composer?


The Cloud Composer documentation explicitly states that:

Due to an issue with the Kubernetes Python client library, your Kubernetes pods should be designed to take no more than an hour to run.

However, it doesn't provide any more context than that, and I can't find a definitively relevant issue on the Kubernetes Python client project.

To test it, I ran a pod for two hours and saw no problems. What issue creates this restriction, and how does it manifest?


Solution

  • I'm not deeply familiar with either the Cloud Composer or Kubernetes Python client library ecosystems, but sorting the GitHub issue tracker by most comments shows this open item near the top of the list: https://github.com/kubernetes-client/python/issues/492

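    It sounds like a token expiration issue: the client authenticates once when it is initialized and never refreshes its credentials, so calls made more than about 60 minutes later fail. From the thread: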

    @yliaog this is an issue for us, as we are running kubernetes pods as batch processes and tracking the state of the pods with a static client. Once the client object is initialized, it does no refresh, and therefore any job that takes longer than 60 minutes will fail. Looking through python-base, it seems like we could make a wrapper class that generates a new client (or refreshes the config) every n minutes, or checks status prior to every call (as @mvle suggested). The best fix would be in swagger-codegen, but a temporary solution would probably be very useful for a lot of people.

    - @flylo, https://github.com/kubernetes-client/python/issues/492#issuecomment-376581140
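
    The wrapper idea from that comment (generate a new client every n minutes) could look roughly like the sketch below. This is only an illustration, not something taken from the library or from Composer: `RefreshingClient`, `factory`, `max_age_seconds`, and the 45-minute default are all hypothetical names and values chosen for the example. The factory is whatever builds your client; with the Kubernetes Python client it would presumably reload the config and return a fresh API object.

```python
import time
from typing import Any, Callable, Optional


class RefreshingClient:
    """Hypothetical helper: rebuilds a client once it is older than max_age_seconds.

    `factory` is any zero-argument callable that returns a freshly
    authenticated client. Nothing here is part of the Kubernetes library.
    """

    def __init__(
        self,
        factory: Callable[[], Any],
        max_age_seconds: float = 45 * 60,  # assumption: refresh well before the ~60 minute expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._max_age = max_age_seconds
        self._clock = clock
        self._client: Optional[Any] = None
        self._created_at = 0.0

    def get(self) -> Any:
        """Return the cached client, rebuilding it first if it is too old."""
        now = self._clock()
        if self._client is None or now - self._created_at >= self._max_age:
            self._client = self._factory()
            self._created_at = now
        return self._client
```

    With the Kubernetes Python client, the factory might be a function that calls `kubernetes.config.load_kube_config()` and returns `kubernetes.client.CoreV1Api()`, and the polling loop would call `get()` before each status check instead of holding on to one static client. Whether reloading the config is enough to obtain a new token in a given Composer environment is an assumption worth verifying.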