CPA ensures that as your cluster grows or shrinks, your auxiliary services maintain the right capacity—no more manual replica tuning.

How CPA Works
CPA continuously watches the size of your cluster—either the number of schedulable worker nodes or the total CPU cores—and computes desired replicas for each target deployment using a simple scaling formula.
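
As a rough illustration of the "cluster size" input described above, the sketch below reduces a list of nodes to a node count and a core count. The `Node` type, its fields, and the choice to count cores only on schedulable nodes are assumptions made for this example, not CPA's actual data model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Node:
    """Hypothetical, simplified view of a worker node."""
    name: str
    cpu_cores: int
    schedulable: bool = True


def cluster_size(nodes: Iterable[Node]) -> Tuple[int, int]:
    """Return (schedulable node count, total CPU cores on those nodes)."""
    usable = [n for n in nodes if n.schedulable]
    return len(usable), sum(n.cpu_cores for n in usable)
```

Either value of the returned pair can then serve as the input to the scaling modes described next.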


Scaling Modes
CPA supports two scaling modes—ladder and linear—each suited to different workload patterns:
- Ladder Mode: Uses discrete thresholds for stepwise scaling—great for predictable, SLA-driven workloads.
- Linear Mode: Applies a continuous ratio between replicas and cluster size—ideal for elastic, usage-based scaling.

Ladder Mode Configuration
Define one or more thresholds to map CPU cores or nodes to replica counts:
- 1–63 cores → 1 replica
- 64–511 cores → 3 replicas
- ≥512 cores → 5 replicas
- 1 node → 1 replica
- ≥2 nodes → 2 replicas
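
The thresholds above can be read as a step function. The sketch below is a minimal illustration, not CPA's implementation: the function names are hypothetical, and it assumes that when both a core ladder and a node ladder are configured, the larger result is used, and that a size below the first threshold yields zero replicas.

```python
from bisect import bisect_right
from typing import List, Tuple

# (lower bound, replicas) pairs taken from the example thresholds above,
# sorted by lower bound.
CORES_TO_REPLICAS: List[Tuple[int, int]] = [(1, 1), (64, 3), (512, 5)]
NODES_TO_REPLICAS: List[Tuple[int, int]] = [(1, 1), (2, 2)]


def ladder_lookup(size: int, steps: List[Tuple[int, int]]) -> int:
    """Return the replica count of the highest step whose lower bound <= size."""
    bounds = [lower for lower, _ in steps]
    idx = bisect_right(bounds, size)
    if idx == 0:
        # Assumption: below the first threshold, no replicas are requested.
        return 0
    return steps[idx - 1][1]


def ladder_replicas(cores: int, nodes: int) -> int:
    """Combine both ladders; assumption: the larger result wins."""
    return max(
        ladder_lookup(cores, CORES_TO_REPLICAS),
        ladder_lookup(nodes, NODES_TO_REPLICAS),
    )
```

For example, under these assumptions a 4-node cluster with 8 cores resolves to 2 replicas, because the node ladder asks for more than the core ladder.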

Linear Mode Configuration
Linear scaling defines an exact ratio of cluster resources to replicas, along with bounds:

| Field | Description |
|---|---|
| coresPerReplica | CPU cores required per replica |
| nodesPerReplica | Worker nodes required per replica |
| min | Minimum number of replicas |
| max | Maximum number of replicas |
| preventSinglePointFailure | Ensures at least two replicas when enabled |
| includeUnschedulableNodes | Include cordoned or tainted nodes in the resource calculation |
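
One way these fields might combine into a replica count is sketched below. The formula (round each ratio up, take the larger value, then clamp to max and min), the point at which preventSinglePointFailure is applied, the default values, and all Python names are assumptions for illustration rather than CPA's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearParams:
    """Hypothetical container mirroring the fields in the table above."""
    cores_per_replica: float = 0      # 0 means "not configured" in this sketch
    nodes_per_replica: float = 0      # 0 means "not configured" in this sketch
    min_replicas: int = 1
    max_replicas: int = 100
    prevent_single_point_failure: bool = False


def linear_replicas(cores: int, nodes: int, p: LinearParams) -> int:
    candidates = []
    if p.cores_per_replica > 0:
        candidates.append(math.ceil(cores / p.cores_per_replica))
    if p.nodes_per_replica > 0:
        candidates.append(math.ceil(nodes / p.nodes_per_replica))

    # Assumed step 1: the ratio that demands more replicas wins.
    replicas = max(candidates, default=0)
    # Assumed step 2: cap at the configured maximum.
    replicas = min(replicas, p.max_replicas)
    # Assumed step 3: raise to the configured minimum.
    replicas = max(replicas, p.min_replicas)

    # Assumption: the two-replica floor is applied last, when enabled.
    if p.prevent_single_point_failure:
        replicas = max(replicas, 2)
    return replicas
```

With 100 cores, 10 nodes, coresPerReplica of 16, nodesPerReplica of 4, and max of 5, this sketch computes 7 from cores and 3 from nodes, then caps the result at 5.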

Enabling includeUnschedulableNodes makes CPA count cordoned or tainted nodes as part of the cluster size, which can raise the computed replica count.

Replica Calculation
CPA computes replicas in three steps:

Summary
- Ladder Mode: Stepwise thresholds for controlled scaling.
- Linear Mode: Continuous scaling with direct resource-to-replica ratios.
- CPA integrates with the Kubernetes API to automatically adjust replicas of auxiliary services based on cluster size.