Highly Available HashiCorp Vault Cluster Installation on VMware


I want to set up a highly available HashiCorp Vault cluster on our servers hosted on VMware.

When I followed HashiCorp's documentation, I found that the most convenient and simplest option is to use the Raft storage backend for an HA Vault cluster: https://developer.hashicorp.com/vault/tutorials/raft/raft-storage

When I tried to follow this document, I realized that the tutorial actually runs all 4 Vault nodes on a single server. However, what I need is an HA cluster spread across different servers. Going through the documents left me thoroughly confused and unsure: is it possible to set up a multi-server HA cluster with OSS Vault, or does it require an Enterprise license?

I really need a very simple HA cluster. I will use it to keep Kubernetes secrets in a production environment. I am open to all your suggestions and information.


Solution

  • Yes, what you want to do is supported. Keep in mind that with the open source version, only the active (leader) node processes requests; standby nodes forward any request they receive to the active node. OSS Vault also limits your auto-unseal options.

    This is a complete project, but here are the basic steps to get it running. It can scale to any number of nodes and it also works in containers.

    For every node you have:

    1. Configure Vault to run as a service on your virtual machine.
    2. Make sure each node can reach its peers on their cluster_addr and that they can reach the load-balancer.
    3. All nodes should be configured the same, with Raft storage and the same seal configuration (a minimal config sketch follows this list).
    4. Configure your load balancer to poll sys/health so that it always points to the leader node.
    5. Set VAULT_ADDR to point to the local node, on each node. Having VAULT_ADDR=http://localhost:8200 in /etc/environment is one way to do that, ymmv.
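
    As a sketch of steps 1-5, a per-node configuration could look like the following. The node names, paths, and addresses are placeholders, and TLS is disabled only to keep the example short; terminate TLS properly in production.

    ```hcl
    # /etc/vault.d/vault.hcl -- identical on every node except node_id and the *_addr values
    ui           = true
    api_addr     = "http://vault-node-1.example.internal:8200"   # this node's client-facing address
    cluster_addr = "http://vault-node-1.example.internal:8201"   # this node's address for peer traffic

    storage "raft" {
      path    = "/opt/vault/data"
      node_id = "vault-node-1"
    }

    listener "tcp" {
      address     = "0.0.0.0:8200"
      tls_disable = true   # example only
    }
    ```

    The official HashiCorp packages ship a systemd unit, so steps 1 and 5 boil down to something like:

    ```sh
    echo 'VAULT_ADDR=http://localhost:8200' | sudo tee -a /etc/environment
    sudo systemctl enable --now vault
    ```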

    Ideally you would auto-unseal, but chances are you can't: the open source version has no HSM support, and on VMware you likely have no cloud KMS at hand either.

    If you can auto-unseal, set the leader_api_addr parameter of the retry_join block in your storage stanza to the address of your load balancer.
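
    For example, assuming the load balancer answers at vault.example.internal (a placeholder), the retry_join block sits inside the same storage stanza on every node:

    ```hcl
    storage "raft" {
      path    = "/opt/vault/data"
      node_id = "vault-node-1"

      retry_join {
        # Point every node at the load balancer; it always resolves to the current leader.
        leader_api_addr = "http://vault.example.internal:8200"
      }
    }
    ```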

    At this point, your load balancer might be a little confused, trying to find a leader that does not exist. Since none of your nodes is initialized yet, it won't find one. Don't worry.
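
    You can see exactly what the load balancer sees by polling the health endpoint yourself; the status codes below are Vault's defaults for sys/health:

    ```sh
    # 200 = initialized, unsealed, active (leader)  -> where the LB sends traffic
    # 429 = unsealed standby
    # 501 = not initialized                         -> what every node returns right now
    # 503 = sealed
    curl -s -o /dev/null -w '%{http_code}\n' http://vault-node-1.example.internal:8200/v1/sys/health
    ```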

    Initialize storage

    Pick any node and initialize it using the command line. This step is important for security, but outside the scope of this (already quite broad) question.
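
    For reference, initialization is a single command run against that node; the number of key shares and, above all, how you store the resulting unseal keys and root token are exactly the security decisions left out of scope here:

    ```sh
    # VAULT_ADDR already points at the local node.
    vault operator init -key-shares=5 -key-threshold=3
    ```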

    Then, depending on your auto-unseal capabilities:

    Let the other nodes auto-join

    After that initial node is ready, your load balancer will finally find a leader and send traffic to it. That means the remaining uninitialized nodes will be able to auto-join: they automatically request a challenge from the leader node to prove they share the same unseal mechanism, and solving the challenge starts replication. There is nothing to do but watch with vault operator raft list-peers.
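
    With the placeholder node names from above, the output looks roughly like this (one row per node, with its Raft state and voter flag):

    ```sh
    vault operator raft list-peers
    # Node           Address                               State     Voter
    # vault-node-1   vault-node-1.example.internal:8201    leader    true
    # vault-node-2   vault-node-2.example.internal:8201    follower  true
    # vault-node-3   vault-node-3.example.internal:8201    follower  true
    ```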

    Manual join and unseal

    If you can't auto-join, then you shouldn't have the retry_join stanza in your configuration. You must join the remaining nodes manually using vault operator raft join. Make sure you send the request to the local node, not the leader; the leader address goes on the command line, which is why setting VAULT_ADDR to the local node matters here. You could run into issues with certificate names, so you could also hardcode 127.0.0.1 as the IP for your load balancer's hostname in /etc/hosts. Another option is two entries in your load balancer configuration: one round-robin (relying on Vault's internal request forwarding) and one that always points at the leader. Use the round-robin address for day-to-day use and the "always-the-leader" address for Raft join tasks. Remember that a manually joined node still has to be unsealed with the unseal keys before it starts replicating.
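
    A sketch of the manual path, run on each remaining node (the load-balancer address is the same placeholder as above):

    ```sh
    # The join request goes to the local node; the leader's API address is the argument.
    export VAULT_ADDR=http://localhost:8200
    vault operator raft join http://vault.example.internal:8200

    # Without auto-unseal, the freshly joined node must still be unsealed manually
    # before it starts replicating; repeat until the key-share threshold is met.
    vault operator unseal
    ```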