Thanks in advance to all those who help.
Hello, I have a somewhat unique problem. It's rather lengthy to explain, but I think that if it is solved we can expand the use cases of Kubernetes. I think I know how to solve it, but I'm not sure whether Kubernetes StatefulSets support the solution. Let me describe the domain of the problem, the problem itself, and then some of my sample solutions, and maybe someone can help fill the gaps.
The Domain Space:
Obviously, looking at the available Kubernetes tools/objects, a stateful-set with headless-service is the ideal way of approaching this. It supports unique pods, which are assigned unique IPs, and supports persistent volumes. It also supports dynamically provisioning persistent-volumes through
The Problem:
As mentioned in the domain, accounts can be active in any order, but stateful-set pods are ordinal, meaning pod_1 has to be active for pod_2 to be active for pod_3 to be active, etc. We can't have pod_1 active and pod_3 active while pod_2 is inactive. This means if I enable Account_A, then Account_C, a pod named pod_1 will be created, and then a pod named pod_2 will be created.
Now you might say that this isn't a problem: we just keep a map from each account to its pod number, for example Account_A -> pod_1 and Account_C -> pod_2.
Why is this a problem? Because when a volumeClaimTemplate is specified in the stateful-set, each persistent-volume-claim is created with the pod's name as part of its identifier, which means that only a pod with the same name can access the same data. The data (volumes) is bound to a pod's name rather than to the account. This creates a disconnect between accounts and their persistent volumes. Any pod named pod_2 will always have the same data that pod_2 has always had, regardless of which account was "mapped" to pod_2.
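To make the naming rule concrete, here is a minimal sketch of how claim names are derived. A claim created from a volumeClaimTemplate is named after the template, the StatefulSet, and the pod's ordinal. The names "data" and "accounts" are hypothetical, and note that real ordinals start at 0 by default rather than at 1 as in my pod_1/pod_2 shorthand.

```go
package main

import "fmt"

// pvcName follows the convention used for claims created from a
// volumeClaimTemplate: <template name>-<StatefulSet name>-<ordinal>.
// The account never appears in the name.
func pvcName(template, statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", template, statefulSet, ordinal)
}

func main() {
	// "data" and "accounts" are hypothetical names used for illustration.
	for ordinal := 0; ordinal < 3; ordinal++ {
		fmt.Println(pvcName("data", "accounts", ordinal))
	}
}
```

Since the only variable part of the name is the ordinal, whichever account ends up in a given slot inherits that slot's claim.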
Let me further illustrate this with an example:
1. Account_A=disabled, Account_B=disabled, Account_C=disabled (Start state, all accs disabled)
2. Account_A=enabled, Account_B=enabled, Account_C=enabled (All accounts are enabled)
   pod_1 is created (with volume_1) and mapped to Account_A
   pod_2 is created (with volume_2) and mapped to Account_B
   pod_3 is created (with volume_3) and mapped to Account_C
3. Account_A=disabled, Account_B=disabled, Account_C=disabled (All accounts are disabled)
   pod_1 is deleted, volume_1 persists
   pod_2 is deleted, volume_2 persists
   pod_3 is deleted, volume_3 persists
4. Account_A=enabled, Account_B=disabled, Account_C=enabled (re-enable A and C but leave B disabled)
   pod_1 is created (with volume_1) and mapped to Account_A (THIS IS FINE)
   pod_2 is created (with **volume_2**) and mapped to Account_C (THIS IS **NOT** FINE)
Can you see the issue? Account_C is now using the data-store that should belong to Account_B (volume_2 was created and used by Account_B, not Account_C), because volumes/claims are matched to pods by name, and pods have to be ordinal, i.e. pod_1 then pod_2.
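The example above can be reproduced with a small simulation. This is only a sketch of the bookkeeping, not of Kubernetes itself: assignOrdinals is a hypothetical helper that packs enabled accounts into consecutive slots, and volume_N is assumed to follow pod_N.

```go
package main

import "fmt"

// assignOrdinals packs the enabled accounts into consecutive pod slots
// (1, 2, 3, ...) in list order, the way the account-to-pod map would.
func assignOrdinals(accounts []string, enabled map[string]bool) map[string]int {
	slots := make(map[string]int)
	next := 1
	for _, acct := range accounts {
		if enabled[acct] {
			slots[acct] = next
			next++
		}
	}
	return slots
}

func main() {
	accounts := []string{"Account_A", "Account_B", "Account_C"}

	// Step 2: all accounts enabled. Record which account first used each volume.
	first := assignOrdinals(accounts, map[string]bool{
		"Account_A": true, "Account_B": true, "Account_C": true,
	})
	owner := make(map[int]string) // volume number -> account that first wrote to it
	for acct, slot := range first {
		owner[slot] = acct
	}

	// Step 4: only A and C are re-enabled; volumes still follow the pod number.
	second := assignOrdinals(accounts, map[string]bool{
		"Account_A": true, "Account_C": true,
	})
	for _, acct := range accounts {
		slot, ok := second[acct]
		if !ok {
			continue // account is disabled, no pod
		}
		fmt.Printf("%s -> pod_%d (volume_%d, first used by %s)\n",
			acct, slot, slot, owner[slot])
	}
}
```

The second line of output shows Account_C landing on pod_2 and therefore on the volume that Account_B first used.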
Potential Solutions
1. Be able to support custom non-ordinal names for pods in a stateful-set. (Simplest and most effective)
This solves everything, and keeps the benefits and tools of statefulsets. I can name my pods what I want when they are launched, so that when an account is enabled I just launch a pod with that account's name, and the volume that is created is mapped to any pod with that same name. I've looked and can't seem to find a way to do this.
(p.s.) I know that stateful-sets are supposed to be ordinal for ordering guarantees, but you can turn this off with "podManagementPolicy: Parallel"
2. Some way to do this with labels and selectors instead?
I'm rather new to Kubernetes, and I still don't fully understand all the moving parts. Maybe there's some way to use labels in my volumeClaimTemplate, to have volume claims attach to volumes with a certain label, i.e. Account_C mapped to pod_2 can request volume_3 because volume_3 has a label with: account=Account_C. I'm currently looking into this. If it helps, my persistent volumes are provisioned dynamically using this tool: https://github.com/kubernetes-incubator/external-storage/tree/master/nfs-client Maybe I can somehow modify it so that it adds certain labels to the persistent-volumes it creates.
3. Ditch statefulsets and deployments and just add pods manually to the cluster
This is not a great solution since, according to the docs, pods shouldn't really exist without a statefulset or deployment as a parent, and it also removes all the built-in functionality of persistent-volumes, dynamic volume provisioning, etc. For me the dealbreaker is not having volumeClaimTemplates, which create or bind to an existing volumeClaim when deployed. If I could recreate this somehow, this solution would work.
4. Create custom Kubernetes object to do this for me
This is not ideal, since it would be a lot of work and I wouldn't even know where to begin. I would also be recreating the exact same thing as a stateful-set, except without the ordinal mapping. I would have to figure out how to write operators and replicasets, etc. Seems like overkill for a rather simple problem.
I will update with anything else I find or think of. Thanks to all who help.
It seems to me that you're convinced that StatefulSets are a step in the right direction, but that's not entirely true.
StatefulSets have ordinality due to two reasons:
PersistentVolumeClaims
In your case, neither seems to be true. You just need stable storage per account. While you think that #4 from your potential solutions is the least ideal, it is the most "Kubernetes native" way to do it.
You need to write a component that manages a StatefulSet or even a Deployment per account. I say deployment because you don't need stable network identifiers for each pod. A ClusterIP service per account will be adequate for communication.
In the Kubernetes world, these components are called controllers (without custom objects) and operators (with custom objects/manages applications).
You can start by looking into operator-sdk and controller-runtime. Operator SDK aggregates commonly used functionalities on top of controller-runtime as a framework. It also makes developers' lives easier by incorporating kubebuilder, which is used to generate CRD and K8S API code for custom objects. All you need to define is structs for your custom object and a controller.
Take a look at Operator SDK, you'll find that creating and managing custom objects is not that hard.
This is how I imagine the flow of your operator, from what I understood of your write-up.
- An Account object maps to one account. Each object has unique metadata that maps it to its account. It should also have an active: boolean in its spec.
- The controller watches Account objects.
- Whenever you need to create a new account, use the Kubernetes APIs to create a new Account object (this will trigger an Add event in the controller), and then your controller should
  - create a PersistentVolumeClaim for the account
  - create a Deployment with the volume from the created PVC specified in the Pod template
- Set the active field in your custom object to false to deactivate the account (a Modify event in the controller), and then your controller should
  - delete the account's Deployment, leaving the PersistentVolumeClaim in place
- Set the active field to true to reactivate the account (a Modify event again).
- Delete the Account object to clean up the underlying resources.
While all of this might not make perfect sense right away, I would still suggest that you go through operator-sdk's docs and examples. IMO, that would be a leap in the right direction.
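The flow can be sketched as a tiny in-memory simulation. This is not controller-runtime code: Cluster, Reconcile, and Delete are hypothetical stand-ins for the API server and the controller's event handlers, and the "data-" claim prefix is an assumption. It only illustrates that the claim is keyed by account name, so it survives deactivation and is never handed to another account.

```go
package main

import "fmt"

// Account mirrors the custom object: a name plus an active flag.
type Account struct {
	Name   string
	Active bool
}

// Cluster is an in-memory stand-in for the objects the controller manages.
type Cluster struct {
	PVCs        map[string]bool   // claim name -> exists
	Deployments map[string]string // account name -> claim it mounts
}

func NewCluster() *Cluster {
	return &Cluster{PVCs: map[string]bool{}, Deployments: map[string]string{}}
}

// Reconcile handles Add and Modify events: the claim is created once and
// kept, while the Deployment exists only while the account is active.
func (c *Cluster) Reconcile(a Account) {
	pvc := "data-" + a.Name // hypothetical naming scheme, keyed by account
	c.PVCs[pvc] = true
	if a.Active {
		c.Deployments[a.Name] = pvc
	} else {
		delete(c.Deployments, a.Name)
	}
}

// Delete handles removal of the Account object and cleans up everything.
func (c *Cluster) Delete(name string) {
	delete(c.Deployments, name)
	delete(c.PVCs, "data-"+name)
}

func main() {
	c := NewCluster()
	for _, n := range []string{"account-a", "account-b", "account-c"} {
		c.Reconcile(Account{Name: n, Active: true})
	}
	// Deactivate B only; C keeps its own claim regardless.
	c.Reconcile(Account{Name: "account-b", Active: false})

	_, running := c.Deployments["account-b"]
	fmt.Println("account-b running:", running)
	fmt.Println("account-b claim kept:", c.PVCs["data-account-b"])
	fmt.Println("account-c mounts:", c.Deployments["account-c"])
}
```

In a real operator the same decisions would live in the controller's reconcile function, with owner references on the created objects typically handling the cleanup step.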
Cheers!