HashiCorp Certified: Vault Operations Professional 2022

Build Fault Tolerant Vault Environments

Configure a Highly Available Vault Cluster

In this guide, you’ll learn how to deploy a production-ready, highly available (HA) Vault cluster that offers redundancy, fault tolerance, seamless scalability, and fully replicated storage.

A single-node Vault server cannot tolerate failures or scale out. If the node goes offline, Vault becomes inaccessible, which is acceptable only for development or testing.

Warning

A single-node Vault instance has zero redundancy. Do not use this configuration in production.

Figure: a single-node Vault server with its configuration file and storage backend, lacking redundancy, scalability, and failure tolerance. Not a recommended architecture.

Key Characteristics of an Ideal Vault Cluster

A robust, production-grade Vault cluster should:

  • Replicate all data across every node
  • Survive one or more node failures without downtime
  • Scale horizontally as application demands grow
  • Maintain a fully replicated, peer-to-peer architecture

Figure: ideal cluster characteristics — redundancy, failure tolerance, scalability, and a fully replicated architecture.

Note

Vault Enterprise supports two storage backends: Integrated Storage (Raft) and Consul. For production, HashiCorp strongly recommends Integrated Storage (Raft). The Vault Operations Professional exam covers only Integrated Storage.

Integrated Storage Overview

Integrated Storage (Raft) leverages the Raft consensus protocol to replicate data across Vault nodes. Every node retains a full copy of the data. The leader handles writes and streams updates to follower nodes. If the leader goes down, followers elect a new leader almost immediately.
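You can observe this leader/follower split from any running node. A minimal sketch, assuming a reachable cluster member at a hypothetical address:

```shell
# Point the CLI at any cluster member (hypothetical hostname).
export VAULT_ADDR="https://node-a.hcvop.com:8200"

# Report this node's HA role and the current leader's address.
vault status

# Relevant fields in the output:
#   HA Enabled            true
#   HA Mode               active      (the Raft leader; followers show "standby")
#   Active Node Address   <leader's api_addr>
```

Running the same command against a follower shows `HA Mode: standby` along with the leader it forwards writes to.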

Figure: a multi-node cluster using Integrated Storage, with data replicated among nodes A, B, and C and Node B acting as the Raft leader.

Configuring Integrated Storage

Add a storage "raft" block to your Vault HCL configuration (config.hcl), then define your listener and seal settings:

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "node-a.hcvop.com"

  # Auto-join using AWS tags
  retry_join {
    auto_join = "provider=aws region=us-east-1 tag_key=vault tag_value=east-1"
  }
}

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = 0
}

seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "12345678-abcd-1234-abcd-123456789101"
}

api_addr     = "https://vault.hcvop.com:8200"
cluster_addr = "https://node-a.hcvop.com:8201"
cluster_name = "vault-prod-us-east-1"
ui           = true
log_level    = "INFO"

Key settings:

  • path
    Local filesystem path where Raft data is stored. Keep it consistent across nodes to simplify operations.
  • node_id
    Unique identifier for each Vault instance.
  • retry_join
    Automates cluster formation. You can use cloud provider tags or explicit host addresses.
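With the configuration in place, start the server against it and confirm the node is reachable. A sketch, assuming the config file path and hostname shown earlier:

```shell
# Start Vault with the HCL configuration (in production, run this
# under a process supervisor such as systemd rather than directly).
vault server -config=config.hcl &

# Point the CLI at this node (hypothetical address) and check its state.
export VAULT_ADDR="https://node-a.hcvop.com:8200"
vault status
```

On a brand-new node, `vault status` will report `Initialized: false` and `Sealed: true` until you initialize and unseal the cluster.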

Explicit Retry Join Hosts

If you prefer manually listing each peer:

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "node-a.hcvop.com"

  retry_join {
    leader_api_addr = "https://node-b.hcvop.com:8200"
  }
  retry_join {
    leader_api_addr = "https://node-c.hcvop.com:8200"
  }
  retry_join {
    leader_api_addr = "https://node-d.hcvop.com:8200"
  }
  retry_join {
    leader_api_addr = "https://node-e.hcvop.com:8200"
  }
}

CLI Management for Integrated Storage

Figure: managing Integrated Storage via the CLI with vault operator raft subcommands (list-peers, join, remove-peer, snapshot).

Use vault operator raft subcommands to manage your cluster:

Command                                 Description
-------                                 -----------
vault operator raft list-peers          List current cluster members and their roles
vault operator raft join <leader>       Add a new follower to the Raft cluster
vault operator raft remove-peer <id>    Remove a node gracefully from the cluster
vault operator raft snapshot            Create or restore a cluster snapshot
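The snapshot subcommand deserves a closer look, since it is your primary backup and restore tool. A sketch, run against the active node with a sufficiently privileged token (the backup paths are assumptions):

```shell
# Save a point-in-time snapshot of the entire Raft cluster state.
vault operator raft snapshot save /opt/vault/backups/vault.snap

# Restore a previously saved snapshot, replacing the cluster's
# current data with the snapshot's contents.
vault operator raft snapshot restore /opt/vault/backups/vault.snap
```

Because every node holds a full copy of the data, a single snapshot captures the whole cluster.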

Manual Cluster Workflow

  1. Initialize and unseal the leader (node A):
    vault operator init
    vault operator unseal
    
  2. On each follower (nodes B, C, …), join the cluster:
    vault operator raft join https://node-a.hcvop.com:8200
    
  3. If manual unseal is in use:
    vault operator unseal
    

    Note

    With auto-unseal (e.g., AWS KMS), Vault nodes unseal themselves after joining.

  4. Repeat steps 2–3 for each node until all peers have joined.
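Steps 2–3 above can be scripted on each follower. A sketch with manual (Shamir) unseal, assuming the hypothetical leader address used throughout this guide:

```shell
# Run on each follower after node A is initialized and unsealed.
LEADER="https://node-a.hcvop.com:8200"

# Join this node to the existing Raft cluster.
vault operator raft join "$LEADER"

# With manual unseal, supply key shares until the threshold is met;
# with auto-unseal (e.g., AWS KMS) this step is unnecessary.
vault operator unseal   # repeat once per required key share
```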

Viewing and Removing Peers

Log in and inspect your Raft peers:

vault login <root_token>
vault operator raft list-peers
Node      Address              State       Voter
----      -------              -----       -----
node-a    10.0.101.22:8201     leader      true
node-b    10.0.101.23:8201     follower    true
node-c    10.0.101.24:8201     follower    true
node-d    10.0.101.25:8201     follower    true
node-e    10.0.101.26:8201     follower    true

To remove a peer:

vault operator raft remove-peer node-e
# Peer removed successfully!

After removal, re-run vault operator raft list-peers to confirm.

A hands-on lab will walk you through both manual and automated cluster formation.
