Tags: kubernetes, openshift-origin, openshift-enterprise

Single Kubernetes/OpenShift cluster/instance across datacenters?


With the understanding that Ubernetes is designed to solve this problem fully, is it currently possible (not necessarily recommended) to span a single K8s/OpenShift cluster across multiple internal corporate datacenters?

Additionally, assume that latency between data centers is relatively low and that infrastructure across the corporate data centers is relatively consistent.

Example: Given 3 corporate DCs, deploy 1..* masters at each datacenter (as a single cluster) and have 1..* nodes at each DC, with pods/RCs/services/... being spun up across all 3 DCs.

Has anyone implemented something like this as a stopgap solution before Ubernetes drops, and if so, how has it worked and what considerations should be taken into account when running like this?


Solution

  • is it currently possible (not necessarily recommended) to span a single K8s/OpenShift cluster across multiple internal corporate datacenters?

    Yes, it is currently possible. Nodes are given the address of an apiserver and client credentials, and then register themselves with the cluster. Nodes don't know (or care) whether the apiserver is local or remote, and the apiserver allows any node to register as long as it presents valid credentials, regardless of where the node sits on the network.
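
    For illustration, here is a minimal sketch of joining a node that lives in a different datacenter than the apiserver. The hostname, file paths, and credential files are hypothetical, and exact kubelet flags vary by version:

        # Kubeconfig giving the kubelet the apiserver address and client credentials;
        # the apiserver happens to be in another datacenter, but the kubelet doesn't care.
        cat <<EOF > /var/lib/kubelet/kubeconfig
        apiVersion: v1
        kind: Config
        clusters:
        - name: corp-cluster
          cluster:
            server: https://apiserver.dc1.corp.example.com:6443
            certificate-authority: /var/lib/kubelet/ca.crt
        users:
        - name: kubelet
          user:
            client-certificate: /var/lib/kubelet/kubelet.crt
            client-key: /var/lib/kubelet/kubelet.key
        contexts:
        - name: default
          context:
            cluster: corp-cluster
            user: kubelet
        current-context: default
        EOF

        # Start the kubelet; with valid credentials it registers itself with the
        # remote apiserver exactly as it would with a local one.
        kubelet --kubeconfig=/var/lib/kubelet/kubeconfig --register-node=true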

    Additionally, assume that latency between data centers is relatively low and that infrastructure across the corporate data centers is relatively consistent.

    This is important, as many of the settings in Kubernetes assume (either implicitly or explicitly) a high-bandwidth, low-latency network between the apiserver and nodes.
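
    As a rough sketch of the kinds of settings involved: the node-heartbeat and eviction timers default to values (10s status updates, 40s grace period, 5m eviction timeout) that assume a fast link between kubelet and apiserver. The relaxed values below are hypothetical examples for a higher-latency link, not recommendations:

        # Kubelet: how often each node posts its status to the apiserver (default 10s).
        kubelet --node-status-update-frequency=20s

        # Controller manager: how long a node may go without reporting before it is
        # marked NotReady (default 40s), and how long before its pods are evicted (default 5m).
        kube-controller-manager --node-monitor-grace-period=2m --pod-eviction-timeout=10m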

    Example: Given 3 corporate DCs, deploy 1..* masters at each datacenter (as a single cluster) and have 1..* nodes at each DC, with pods/RCs/services/... being spun up across all 3 DCs.

    The downside of this approach is that if you have one global cluster, you have one global point of failure. Even if you have replicated, HA master components, data corruption can still take your entire cluster offline. And a bad config propagated to all pods in a replication controller can take your entire service offline. A bad node image push can take all of your nodes offline. And so on. This is one of the reasons we encourage folks to use a cluster per failure domain rather than a single global cluster.
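
    If you accept those risks and run the 3-DC example above as a single cluster anyway, one mitigation (a sketch; node names and label values are hypothetical) is to label each node with its datacenter using the standard zone label, so the scheduler's selector-spreading logic tries to distribute a replication controller's pods across the DCs:

        # Label nodes with the DC they live in; the scheduler treats the zone label
        # as a failure domain and spreads pods from the same RC/service across zones.
        kubectl label node node-a1 failure-domain.beta.kubernetes.io/zone=dc1
        kubectl label node node-b1 failure-domain.beta.kubernetes.io/zone=dc2
        kubectl label node node-c1 failure-domain.beta.kubernetes.io/zone=dc3

        # Scale an RC and check where its pods landed.
        kubectl scale rc my-service --replicas=6
        kubectl get pods -o wide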