I've been combing through the documentation trying to determine whether a specific EKS architecture is possible for our use case, and have not found a clear answer on how to do it, or whether it can be done at all.
Scope
We have several small pods that run 24/7, monitoring for new task requests. When a new task is detected, these monitor pods spin up worker pods, which have much heavier CPU and memory requirements.
What we would like to accomplish on EKS is to have 2 (or more) instance types within a single Auto Scaling Group (ASG):
A small, cheap instance to run the 24/7 monitor pods
Large, expensive instances to run the worker tasks, which are then terminated
According to the documentation, having multiple instance types in an ASG is no problem, but it doesn't explain how to ensure that the small pods get scheduled onto the small instances and the large pods onto the large instances.
Testing done so far:
(Max = 3, Min = 1, Desired = 1)
We currently have the large instance type as priority 1, and the small one as priority 2. So at launch, one large instance gets started.
When I start my small pod with a nodeSelector set to a small instance type, it remains in the Pending state with the error event: 0/1 nodes are available: 1 node(s) didn't match node selector.
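For reference, the failing test looks roughly like this minimal sketch (it assumes the built-in node.kubernetes.io/instance-type label, which is beta.kubernetes.io/instance-type on older clusters, and a hypothetical t3.small size and busybox image):

apiVersion: v1
kind: Pod
metadata:
  name: monitor-test
spec:
  containers:
    - name: monitor
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
  # The only node in the cluster is the large instance, so this selector
  # cannot be satisfied and the pod stays Pending.
  nodeSelector:
    node.kubernetes.io/instance-type: t3.small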
So currently, my question is: how do I make the ASG start a node of a specific instance type, if one is not already running, based on a pod's requirement for that instance type? Any pointers to links, documentation, or suggestions for better approaches are appreciated, thank you!
This turned out to be a gap in my understanding of how the Cluster Autoscaler and ASGs work. Based on feedback from someone in a different forum, I learned that
A) the Cluster Autoscaler runs as a pod on the cluster itself (which is why out-of-the-box EKS does not support a minimum of 0 nodes; at least one node is required to run the kube-system pods, including the autoscaler),
and B) a single Cluster Autoscaler pod can scale multiple ASGs attached to the cluster. This allows us to separate our instances into ASGs by cost, and ensure that the expensive instances are only launched when the worker pods request them.
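For context, the Cluster Autoscaler on AWS can be pointed at several ASGs either with repeated --nodes=<min>:<max>:<asg-name> flags or with tag-based auto-discovery; below is a trimmed sketch of the relevant container args (the image tag and ASG names are placeholders, not our actual values):

# Excerpt from a cluster-autoscaler Deployment spec
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.3  # pick the tag matching your cluster version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=1:1:eks-small-asg   # 24/7 monitor nodes
      - --nodes=0:3:eks-large-asg   # burstable worker nodes, allowed to scale from 0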
Our solution so far is this:
Set up at least 2 ASGs:
Apply identifying labels to the nodes in each ASG. The EKS-recommended approach (especially if you want to use Spot instances) is to label by instance size (e.g. micro, large, 4xlarge). This lets you easily add instances with the same resource sizes to an existing ASG later for more reliability. Example:
Labels: asgsize=xlarge
Apply a nodeSelector in the pod YAML to match the desired node label:
spec:
  nodeSelector:
    asgsize: xlarge
Set the 24/7, small-instance ASG to min=1, desired=1, max=1 (at least; max can be bigger if that fits your needs)
Set the burstable, large-instance ASG to min=0, desired=0, max=(whatever is required for your environment); one possible declaration of both groups is sketched after this list
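For illustration, if you manage the node groups with eksctl, the labels and sizes above can be declared in a single config file; this is only a sketch, with the cluster name, region, and instance types assumed rather than taken from our actual setup:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # assumed cluster name
  region: us-east-1     # assumed region
nodeGroups:
  - name: small-24x7    # runs the always-on monitor pods
    instanceType: t3.small
    minSize: 1
    desiredCapacity: 1
    maxSize: 1
    labels:
      asgsize: small
  - name: large-burst   # scales from 0 for worker pods
    instanceType: m5.xlarge
    minSize: 0
    desiredCapacity: 0
    maxSize: 3
    labels:
      asgsize: xlarge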
When we implemented this approach, we were able to successfully have a small instance running 24/7, and have the larger instances burst up from 0 only when a pod selecting that label is created.
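As an example of such a pod, a worker spec along these lines is enough to trigger the scale-up (the image name and resource numbers below are made up for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: worker-task
spec:
  containers:
    - name: worker
      image: my-registry/worker:latest   # placeholder image
      resources:
        requests:
          cpu: "3"          # sized so it only fits on the xlarge nodes
          memory: 12Gi
  # Matches the label carried by nodes in the large-instance ASG
  nodeSelector:
    asgsize: xlarge
  restartPolicy: Never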
Disclaimer:
We also ran into this bug with our autoscaler, where the large ASG was not scaling up from 0 initially: https://github.com/kubernetes/autoscaler/issues/2418
The workaround described in that issue worked for us: we temporarily forced our large ASG to min=1, started a pod on that group, set min=0 again, and deleted the pod. The instance auto-scaled down and was terminated, and the next time we requested the pod, the ASG auto-scaled up from 0 correctly.
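For reference, the Cluster Autoscaler's AWS documentation also describes tagging an ASG with node-template keys so the autoscaler knows, before any node exists, which labels a node from that group would carry; a sketch of such tags for the asgsize label used above (cluster name and tag values are assumed), shown here as they would appear in an eksctl nodegroup's tags section:

# Tags on the large-instance ASG / nodegroup
tags:
  k8s.io/cluster-autoscaler/enabled: "true"
  k8s.io/cluster-autoscaler/my-cluster: "owned"               # assumed cluster name
  k8s.io/cluster-autoscaler/node-template/label/asgsize: "xlarge"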