AWS OpenSearch: Multi-AZ with Standby - Practical Trade-offs & Design Guidance
Introduction
Multi-AZ with standby in OpenSearch improves resilience to an Availability Zone failure, but it adds configuration constraints, capacity overhead, and cost in real-world deployments.
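Strict Configuration Requirements
- Requires 3 Availability Zones
- Requires at least 2 replicas per index (one primary plus two replica copies of every shard)
- Shard copies must be distributed carefully across the AZs
- Keeping three copies of the data leads to higher storage and cost overhead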
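The requirements above can be checked before a domain is created. The following is a minimal sketch, not an AWS API: `DomainPlan`, `check_standby_plan`, and `total_storage_gib` are hypothetical names introduced for illustration, and the rule that the data node count should be a multiple of 3 is an assumption added here so that shards can spread evenly across the three AZs.

```python
from dataclasses import dataclass


@dataclass
class DomainPlan:
    """Hypothetical planning record; this is not an AWS API type."""
    availability_zones: int
    replicas_per_index: int
    data_nodes: int
    primary_storage_gib: float


def check_standby_plan(plan: DomainPlan) -> list[str]:
    """Return a list of problems; an empty list means the plan passes these checks."""
    problems = []
    if plan.availability_zones != 3:
        problems.append("Multi-AZ with standby requires 3 Availability Zones")
    if plan.replicas_per_index < 2:
        problems.append("each index needs at least 2 replicas")
    # Assumption: a node count divisible by 3 keeps the AZs evenly sized.
    if plan.data_nodes % 3 != 0:
        problems.append("data node count should be a multiple of 3")
    return problems


def total_storage_gib(plan: DomainPlan) -> float:
    """Storage for the primary copy plus every replica copy."""
    return plan.primary_storage_gib * (1 + plan.replicas_per_index)


plan = DomainPlan(availability_zones=3, replicas_per_index=2,
                  data_nodes=6, primary_storage_gib=500.0)
print(check_standby_plan(plan))
print(total_storage_gib(plan))
```

With 2 replicas, 500 GiB of primary data occupies 1500 GiB across the cluster, which is where the storage overhead comes from.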
Capacity Planning Challenges
- Standby is not idle
- Requires over-provisioning
- Failover load must be handled by active nodes
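The over-provisioning can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: two of the three AZs serve traffic while the third acts as standby, load is measured in an arbitrary unit such as requests per second, and the function names and the 25% headroom are illustrative choices, not AWS guidance.

```python
import math


def nodes_per_az(peak_load: float, load_per_node: float,
                 serving_azs: int = 2, headroom: float = 0.25) -> int:
    """Nodes needed in each AZ so that the serving AZs alone carry peak load plus headroom."""
    if serving_azs < 1 or load_per_node <= 0:
        raise ValueError("serving_azs must be >= 1 and load_per_node must be > 0")
    serving_nodes = math.ceil(peak_load * (1 + headroom) / load_per_node)
    return math.ceil(serving_nodes / serving_azs)


def total_nodes(per_az: int, total_azs: int = 3) -> int:
    """Every AZ, including the standby, is sized the same."""
    return per_az * total_azs


# Assumed workload: 9000 requests/s at peak, 1000 requests/s per node.
with_standby = total_nodes(nodes_per_az(9000, 1000, serving_azs=2))
all_active = total_nodes(nodes_per_az(9000, 1000, serving_azs=3))
print(with_standby, all_active)
```

In this example the standby layout needs 18 nodes against 12 when all three AZs serve traffic, so the cluster is about 1.5 times larger for the same peak load.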
Shard Distribution Complexity
- Poor distribution leads to hotspots
- Impacts performance and stability during failover
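A distribution check can be sketched as follows. The input is a hand-built list of (shard, AZ) pairs, one per shard copy; in practice it would come from the cluster's shard listing, which is not shown here. The function names and AZ labels are hypothetical.

```python
from collections import Counter, defaultdict


def find_skewed_shards(placements: list[tuple[str, str]],
                       azs: list[str]) -> dict[str, list[str]]:
    """Map each shard to the AZs that hold no copy of it; well-spread shards are omitted."""
    per_shard: dict[str, Counter] = defaultdict(Counter)
    for shard_id, az in placements:
        per_shard[shard_id][az] += 1
    skewed = {}
    for shard_id, counts in per_shard.items():
        missing = [az for az in azs if counts[az] == 0]
        if missing:
            skewed[shard_id] = missing
    return skewed


def az_totals(placements: list[tuple[str, str]]) -> Counter:
    """Total shard copies per AZ; a large gap between AZs suggests a hotspot."""
    return Counter(az for _, az in placements)


placements = [
    ("logs-0", "az-a"), ("logs-0", "az-b"), ("logs-0", "az-c"),
    ("logs-1", "az-a"), ("logs-1", "az-a"), ("logs-1", "az-b"),
]
print(find_skewed_shards(placements, ["az-a", "az-b", "az-c"]))
print(az_totals(placements))
```

Here `logs-1` has no copy in `az-c` and two copies in `az-a`, so losing `az-a` would leave a single copy of that shard.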
Failover Behavior
- Node promotion delays
- Temporary performance degradation
- Shard rebalancing overhead
Cost vs Value
- Higher infrastructure cost
- Increased storage and cross-AZ data transfer
- The extra cost must be justified by the business impact of downtime
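One way to frame the justification is a break-even comparison between the added yearly cost and the downtime cost it is expected to avoid. This is a simplified sketch: the function name and every figure are made-up inputs for illustration, and a real assessment would also weigh SLA penalties and reputational impact.

```python
def standby_is_justified(extra_monthly_cost: float,
                         downtime_cost_per_hour: float,
                         outage_hours_avoided_per_year: float) -> bool:
    """True when the avoided downtime cost exceeds the added yearly infrastructure cost."""
    annual_extra_cost = extra_monthly_cost * 12
    annual_avoided_cost = downtime_cost_per_hour * outage_hours_avoided_per_year
    return annual_avoided_cost > annual_extra_cost


# Revenue-facing search: one avoided outage hour outweighs a year of extra cost.
print(standby_is_justified(2000, 50000, 1.0))
# Internal tool: the avoided downtime is worth far less than the extra cost.
print(standby_is_justified(2000, 500, 4.0))
```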
When to Use Standby
- Mission-critical applications
- Strict SLAs
- Revenue-impacting downtime
When Not to Use Standby
- Internal tools
- Analytics workloads
- Non-critical environments
Final Thought
Aim for right-sized resilience, not maximum complexity.




