Tags: kubernetes, cluster-computing, cockroachdb, cilium

Multi cluster CockroachDB with Cilium Cluster Mesh


I am trying to set up a multi-cluster CockroachDB spanning 3 k8s clusters connected with Cilium Cluster Mesh. The idea of a multi-cluster CockroachDB is described on cockroachlabs.com - 1, 2. However, the article calls for changing the CoreDNS ConfigMap, which feels suboptimal compared to using Cilium global services.

Therefore the question arises: how can a multi-cluster CockroachDB be enabled in a Cilium Cluster Mesh environment, using Cilium global services instead of hacking the CoreDNS ConfigMap?

When CockroachDB is installed via Helm, it deploys a StatefulSet with a carefully crafted --join parameter, containing the FQDNs of the CockroachDB pods that are to join the cluster.
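For illustration, the rendered start command inside the StatefulSet then looks roughly like the following sketch (the real command and its remaining flags are templated by the chart and depend on its version):

    # sketch only - the actual command is generated by the Helm chart
    exec /cockroach/cockroach start \
      --join=dbs-cockroachdb-0.dbs-cockroachdb.dbs:26257,dbs-cockroachdb-1.dbs-cockroachdb.dbs:26257,dbs-cockroachdb-2.dbs-cockroachdb.dbs:26257
      # ...other flags omitted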

The pod FQDNs come from service.discovery, which is created with clusterIP: None and

(...) only exists to create DNS entries for each pod in the StatefulSet such that they can resolve each other's IP addresses.

The discovery service automatically registers DNS entries for all pods within the StatefulSet, so that they can be easily referenced
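For reference, the discovery Service generated by the chart is roughly the following headless Service (names taken from this deployment; details vary with the chart version), selecting all pods of the StatefulSet:

    # sketch of the chart-generated discovery Service
    apiVersion: v1
    kind: Service
    metadata:
      name: dbs-cockroachdb            # matches the middle label of the pod FQDNs
      namespace: dbs
    spec:
      clusterIP: None                  # headless: per-pod DNS records, no virtual IP
      publishNotReadyAddresses: true
      selector:
        app.kubernetes.io/component: cockroachdb
        app.kubernetes.io/instance: dbs
        app.kubernetes.io/name: cockroachdb
      ports:
      - name: grpc
        port: 26257
      - name: http
        port: 8080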

Can a similar discovery service, or some alternative, be created for a StatefulSet running on a remote cluster, so that with Cluster Mesh enabled, pods J, K, L in cluster B could be reached from pods X, Y, Z in cluster A by their FQDNs?
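Locally this already works, because the headless Service gives every pod a stable per-pod DNS record. A hypothetical lookup from a pod in cluster A (address made up):

    nslookup dbs-cockroachdb-1.dbs-cockroachdb.dbs.svc.cluster.local
    # Name:    dbs-cockroachdb-1.dbs-cockroachdb.dbs.svc.cluster.local
    # Address: 10.0.1.23   (example output)

The missing piece is an equivalent record for the pods living in cluster B.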

As suggested in create-service-per-pod-in-statefulset, one could create services like

{{- range $i, $_ := until 3 -}}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  name: dbs-cockroachdb-remote-{{ $i }}
  namespace: dbs
spec:
  ports:
  - name: grpc
    port: 26257
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
    statefulset.kubernetes.io/pod-name: dbs-cockroachdb-{{ $i }}  # pod names carry the release prefix (dbs-)
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
---
kind: Service
apiVersion: v1
metadata:
  name: dbs-cockroachdb-public-remote-{{ $i }}
  namespace: dbs
  labels:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
  annotations:
    io.cilium/global-service: 'true'
    service.cilium.io/affinity: "remote"
spec:
  ports:
  - name: grpc
    port: 26257
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/component: cockroachdb
    app.kubernetes.io/instance: dbs
    app.kubernetes.io/name: cockroachdb
{{- end -}}

The intent is that these resemble the original service.discovery and service.public.
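One way to check whether Cilium actually treats these as global services is to inspect the rendered annotations and the agent's service table (a sketch; on newer Cilium versions the in-agent CLI is cilium-dbg rather than cilium):

    # confirm the annotations made it onto the Service
    kubectl -n dbs get svc dbs-cockroachdb-public-remote-0 -o jsonpath='{.metadata.annotations}'
    # inspect the Cilium agent's view of services
    kubectl -n kube-system exec ds/cilium -- cilium service list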

However, despite the presence of the Cilium annotations

io.cilium/global-service: 'true'
service.cilium.io/affinity: "remote"

the services appear to be bound to the local k8s cluster, resulting in a CockroachDB cluster of 3 nodes instead of 6 (3 in cluster A + 3 in cluster B).

(Screenshots: Hubble showing no cross-cluster connections; the CockroachDB dashboard.)
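The absence of cross-cluster flows can also be checked from the CLI, assuming Hubble is enabled, e.g.:

    hubble observe --namespace dbs --port 26257 --follow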

It does not matter which service (dbs-cockroachdb-public-remote-X or dbs-cockroachdb-remote-X) I use in my --join override:

    join:
      - dbs-cockroachdb-0.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-1.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-2.dbs-cockroachdb.dbs:26257
      - dbs-cockroachdb-public-remote-0.dbs:26257
      - dbs-cockroachdb-public-remote-1.dbs:26257
      - dbs-cockroachdb-public-remote-2.dbs:26257

The result is the same, 3 nodes instead of 6.
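For what it's worth, the node count can be confirmed directly from one of the pods (a sketch assuming an insecure deployment and the default container layout):

    kubectl -n dbs exec dbs-cockroachdb-0 -- /cockroach/cockroach node status --insecure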

Any ideas?


Solution

  • Apparently, due to 7070, patching the CoreDNS ConfigMap is the most reasonable thing we can do. The comments on that issue mention an article that provides additional context.

    My twist on this story is that I updated the ConfigMap with a kubernetes plugin config:

    apiVersion: v1
    data:
      Corefile: |-
        saturn.local {
          log
          errors
          kubernetes saturn.local {
            endpoint https://[ENDPOINT]
            kubeconfig [PATH_TO_KUBECONFIG]
          }
        }
        rhea.local {
          ...
    

    This lets me resolve names from the other clusters as well. In my setup, each cluster has its own domain.local. PATH_TO_KUBECONFIG is a plain kubeconfig file. A generic Secret has to be created in the kube-system namespace, and the Secret volume has to be mounted in the coredns Deployment.
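    A sketch of that wiring, with names of my own choosing (the Secret name and mount path are not prescribed anywhere):

      kubectl -n kube-system create secret generic coredns-remote-kubeconfig \
        --from-file=kubeconfig=./remote-cluster.kubeconfig

    and, in the coredns Deployment (under spec.template.spec), something like:

      # fragment: only the added fields are shown
      volumes:
      - name: remote-kubeconfig
        secret:
          secretName: coredns-remote-kubeconfig
      containers:
      - name: coredns
        volumeMounts:
        - name: remote-kubeconfig
          mountPath: /etc/coredns/remote
          readOnly: true

    With this layout, [PATH_TO_KUBECONFIG] in the Corefile would be /etc/coredns/remote/kubeconfig.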

    It works.