kubernetes kubernetes-operator operator-sdk

What is the use for CRD status?

I'm currently writing a kubernetes operator in go using the operator-sdk. This operator creates two StatefulSet and two Service, with some business logic around it.

I'm wondering what CRD status is about ? In my reconcile method I use the default client (i.e. r.List(ctx, &setList, opts...)) to fetch data from the cluster, shall I instead store data in the status to use it later ? If so how reliable this status is ? I mean is it persisted ? If the control plane dies is it still available ? What about disaster recovery, what if the persisted data disappear ? Doesn't that case invalidate the use of the CRD status ?

Solution

The status subresource of a CRD can be considered to have the same objective of non-custom resources. While the spec defines the desired state of that particular resource, basically I declare what I want, the status instead explains the current situation of the resource I declared on the cluster and should help in understanding what is different between the desired state and the actual state.

Like a StatefulSet spec could say I want 3 replicas and its status say that right now only 1 of those replicas is ready and the next one is still starting, a custom resource status may tell me what is the current situation of whatever I declared in the specs.

For example, using the Rook Operator, I could declare I want a CephCluster made in a certain way. Since a CephCluster is a pretty complex thing (made of several StatefulSets, Daemons and more), the status of the custom resource definition will tell me the current situation of the whole ceph cluster, if it's health is ok or if something requires my attention and so on.

From my understandings of the Kubernetes API, you shouldn't rely on status subresource to decide what your operator should do regarding a CRD as it is way better and less prone to errors to always check the current situation of the cluster (at operator start or when a resource is defined, updated or deleted)

Last, let me quote from Kubernetes API conventions as it exaplins the convention pretty well ( https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status )

By convention, the Kubernetes API makes a distinction between the specification of the desired state of an object (a nested object field called "spec") and the status of the object at the current time (a nested object field called "status").

The specification is a complete description of the desired state, including configuration settings provided by the user, default values expanded by the system, and properties initialized or otherwise changed after creation by other ecosystem components (e.g., schedulers, auto-scalers), and is persisted in stable storage with the API object. If the specification is deleted, the object will be purged from the system.

The status summarizes the current state of the object in the system, and is usually persisted with the object by automated processes but may be generated on the fly. At some cost and perhaps some temporary degradation in behavior, the status could be reconstructed by observation if it were lost.

When a new version of an object is POSTed or PUT, the "spec" is updated and available immediately. Over time the system will work to bring the "status" into line with the "spec". The system will drive toward the most recent "spec" regardless of previous versions of that stanza. In other words, if a value is changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system is not required to 'touch base' at 5 before changing the "status" to 3. In other words, the system's behavior is level-based rather than edge-based. This enables robust behavior in the presence of missed intermediate state changes.