Tags: kubernetes, garbage-collection, kubernetes-namespace

pod getting terminated because of ownerReferences pointing to resource in different namespace in kubernetes


Starting with Kubernetes 1.20 there has been a change regarding ownerReferences and how K8s performs GC. If a resource in namespace x spins up a pod/job in namespace y, and the child's ownerReferences point to the parent resource in x, K8s treats the owner as non-existent and terminates the child pod/job.

Reference:

  • Resolves non-deterministic behavior of the garbage collection controller when ownerReferences with incorrect data are encountered. Events with a reason of OwnerRefInvalidNamespace are recorded when namespace mismatches between child and owner objects are detected. The kubectl-check-ownerreferences tool can be run prior to upgrading to locate existing objects with invalid ownerReferences.
    • A namespaced object with an ownerReference referencing a uid of a namespaced kind which does not exist in the same namespace is now consistently treated as though that owner does not exist, and the child object is deleted.
    • A cluster-scoped object with an ownerReference referencing a uid of a namespaced kind is now consistently treated as though that owner is not resolvable, and the child object is ignored by the garbage collector. (#92743, @liggitt) [SIG API Machinery, Apps and Testing]
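For illustration, a child manifest that triggers this behavior might look like the following sketch (all names and the uid are placeholders). Since an ownerReference has no namespace field, the garbage collector resolves the owner in the child's own namespace, finds nothing there, and deletes the child:

```yaml
# Hypothetical child pod in namespace "team-b" whose ownerReference points at
# a Job that actually lives in namespace "team-a". The GC looks the Job up in
# "team-b" (ownerReferences cannot name a namespace), finds no such object,
# and deletes the pod.
apiVersion: v1
kind: Pod
metadata:
  name: child-pod
  namespace: team-b
  ownerReferences:
    - apiVersion: batch/v1
      kind: Job
      name: parent-job          # exists only in team-a, not in team-b
      uid: 00000000-0000-0000-0000-000000000000   # placeholder uid
spec:
  containers:
    - name: main
      image: busybox
```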

If we remove the ownerReferences, the resource won't be garbage collected. Is there a way to deal with this situation, i.e. how to make ownerReferences work across namespaces, or to let the job/pod clean itself up once completed? Thanks.


Solution

  • As per Fix GC uid races and handling of conflicting ownerReferences #92743

    namespaces are intended to be independent of each other, so cross-namespace references have not been permitted in things like ownerReferences, secret/configmap volume references, etc.

    additionally, granting permissions to namespace a is not generally intended to provide visibility or ability to interact with objects from namespace b (or cause system controllers to interact with objects from namespace b).

    and Update GC cross-namespace note #25091

    Cross-namespace owner references are disallowed by design.

    So, using ownerReferences for garbage collection across namespaces is not possible by design.


    However, you can emulate multi-namespace GC using labels. You just need to set those labels whenever an object creates a sub-object.

    Alternatively, you can delete a namespace to garbage-collect all objects in that namespace, but that is probably a suboptimal solution.
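The creation-time labeling could be sketched as a manifest like the one below. Note that the `owner` and `owner-ns` label names are purely our own convention for this example, not anything Kubernetes-defined:

```yaml
# Hypothetical Job created in namespace "team-b" by a controller in "team-a".
# The labels record the logical owner so a cleanup process can later find all
# children with: kubectl get jobs,pods -A -l owner=parent-app
apiVersion: batch/v1
kind: Job
metadata:
  name: child-job
  namespace: team-b
  labels:
    owner: parent-app      # our convention: name of the owning object
    owner-ns: team-a       # our convention: namespace of the owning object
spec:
  template:
    metadata:
      labels:
        owner: parent-app  # propagated to pods so they are selectable too
        owner-ns: team-a
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox
          command: ["true"]
```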


    EDIT

    $ kubectl label pods owner=my -l region=europe
    $ kubectl label pods owner=my -l region=pacific
    
    $ kubectl label svc owner=my -l svc=europe
    $ kubectl label svc owner=my -l svc=pacific
    
    $ kubectl label pod kube-proxy-2wpz2 owner=my -n kube-system 
    $ kubectl label pod kube-proxy-cpqxt  owner=my -n kube-system 
    
    $ kubectl get pods,svc -l owner=my --show-labels --all-namespaces
    
    NAMESPACE     NAME                   READY   STATUS        RESTARTS   AGE    LABELS
    default       pod/aloha-pod          1/1     Running       0          54d    app=aloha,owner=my,region=pacific
    default       pod/ciao-pod           1/1     Running       0          54d    app=ciao,owner=my,region=europe
    default       pod/hello-pod          1/1     Terminating   0          54d    app=hello,owner=my,region=europe
    default       pod/ohayo-pod          1/1     Running       0          54d    app=ohayo,owner=my,region=pacific
    kube-system   pod/kube-proxy-2wpz2   1/1     Running       2          299d   controller-revision-hash=5cf956ffcf,k8s-app=kube-proxy,owner=my,pod-template-generation=1
    kube-system   pod/kube-proxy-cpqxt   1/1     Running       3          299d   controller-revision-hash=5cf956ffcf,k8s-app=kube-proxy,owner=my,pod-template-generation=1
    
    NAMESPACE   NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE   LABELS
    default     service/europe    ClusterIP   10.109.5.102    <none>        80/TCP    54d   owner=my,svc=europe
    default     service/pacific   ClusterIP   10.99.255.196   <none>        80/TCP    54d   owner=my,svc=pacific
    
    $ kubectl delete pod,svc -l owner=my --dry-run=client --all-namespaces
    
    pod "aloha-pod" deleted (dry run)
    pod "ciao-pod" deleted (dry run)
    pod "hello-pod" deleted (dry run)
    pod "ohayo-pod" deleted (dry run)
    pod "kube-proxy-2wpz2" deleted (dry run)
    pod "kube-proxy-cpqxt" deleted (dry run)
    service "europe" deleted (dry run)
    service "pacific" deleted (dry run)
    

    Alternatively, a bash script could delete all objects whose controller object no longer exists, based on those labels. It could also run inside the cluster with a properly configured service account.
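The filtering step of such a script could be sketched as below. Everything here is our own convention, not a Kubernetes feature: in a real script, the list of existing owners would come from something like `kubectl get jobs -n team-a -o name`, and the child table from `kubectl get pods -A -l owner` with custom columns; the printed orphans would then be fed to `kubectl delete`.

```shell
# Given a newline-separated list of owners that still exist, read a
# "namespace name owner" table of labeled children on stdin and print
# the children whose recorded owner is gone.
orphans() {
  local existing="$1"
  while read -r ns name owner; do
    # keep only children whose owner no longer appears in the list
    if ! grep -qx -- "$owner" <<<"$existing"; then
      echo "$ns $name"
    fi
  done
}

# Example: pod-1's owner job-a still exists, pod-2's owner job-b is gone.
printf 'default pod-1 job-a\ndefault pod-2 job-b\n' | orphans 'job-a'
# prints: default pod-2
```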


    There is no straightforward, built-in option to achieve what you want. You should keep owner-referenced objects in the same namespace.