Cisco Catalyst Center High Availability Guide, Release 2.3.7.x

A comprehensive guide for configuring and administering High Availability (HA) for Cisco Catalyst Center 2.3.7.x, including cluster deployment, node maintenance, and failure recovery procedures.

Quick answers from the manual

Quick answer

Catalyst Center HA requires a three-node cluster configuration to provide software and hardware redundancy. It supports database and security replication across nodes. p. 2

Key actions

Activate High Availability p. 4

First start

Configure nodes p. 4

Problems and fixes

Node failure

If the failure persists for more than 5 minutes, check the GUI for status messages. If a hardware failure is confirmed, contact Cisco TAC. p. 11, 12

Maintenance and reset

Shut down all nodes p. 5

Technical specifications

Parameter	Value	Meaning	Pages
Cluster size	3 nodes	Required for quorum	p. 2

Where to find it in the PDF

High Availability Requirements p. 1
Cluster Administration p. 5, 6
Failure Scenarios p. 10, 11, 12

Quick Guide from the Manual

This guide outlines the requirements and procedures for implementing High Availability (HA) in a Cisco Catalyst Center environment. HA is designed to reduce downtime and ensure network resilience through a three-node cluster configuration.

Requirement: A three-node cluster is mandatory for quorum.
Network: All nodes must reside in the same network and site with a round-trip time (RTT) of 10ms or less.
Maintenance: Catalyst Center enters maintenance mode during upgrades and HA activation; plan accordingly.
Hardware: Appliances must have the same number of cores and run the same software version.
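
The requirements above can be turned into a simple pre-deployment check. The following is a minimal sketch, not a Cisco tool: the `Node` fields, the `check_cluster_plan` function, and the way RTT measurements are passed in are all hypothetical and chosen only for illustration. The node count and the 10 ms RTT limit come from the list above.

```python
from dataclasses import dataclass

MAX_RTT_MS = 10.0    # RTT limit stated in the requirements
REQUIRED_NODES = 3   # cluster size stated in the requirements


@dataclass(frozen=True)
class Node:
    # Hypothetical inventory record, not a Catalyst Center API object.
    name: str
    cores: int
    version: str
    site: str


def check_cluster_plan(nodes, rtt_ms):
    """Return a list of problems; an empty list means the plan passes.

    rtt_ms maps a (name_a, name_b) pair to a measured RTT in milliseconds.
    """
    problems = []
    if len(nodes) != REQUIRED_NODES:
        problems.append(f"expected {REQUIRED_NODES} nodes, got {len(nodes)}")
    if len({n.cores for n in nodes}) > 1:
        problems.append("nodes do not all have the same number of cores")
    if len({n.version for n in nodes}) > 1:
        problems.append("nodes do not all run the same software version")
    if len({n.site for n in nodes}) > 1:
        problems.append("nodes are not all at the same site")
    for (a, b), rtt in sorted(rtt_ms.items()):
        if rtt > MAX_RTT_MS:
            problems.append(f"RTT {a}<->{b} is {rtt} ms, above {MAX_RTT_MS} ms")
    return problems
```

An empty result only means that the listed requirements hold for the values supplied; it does not replace the checks performed by the appliance itself.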

High Availability Overview

Catalyst Center's HA framework provides both software and hardware redundancy. It supports database and security replication (including X.509 certificates) across nodes. The system is designed to handle single-node failures automatically.

Deployment Recommendations

When deploying an HA-enabled cluster, follow these best practices:

Subnets: Use the default link-local subnets (169.x.x.x), or ensure that any custom subnets conform to the private address space defined in RFC 1918 and RFC 6598.
Interfaces: Keep cluster and enterprise traffic separate by using dedicated Cluster and Enterprise interfaces.
Network Links: Do not span a LAN across slow links, as this increases susceptibility to network failures.
Timing: Enable HA during off-hours, as the system will be unavailable while redistributing services.
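
For the subnet recommendation, a candidate custom subnet can be checked against the RFC 1918 and RFC 6598 ranges with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the function name `is_allowed_custom_subnet` is hypothetical, and the check covers only the address ranges defined by the two RFCs, not any additional sizing rules the appliance may enforce.

```python
import ipaddress

# RFC 1918 private ranges plus the RFC 6598 shared address space.
ALLOWED_CUSTOM_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),
]


def is_allowed_custom_subnet(cidr):
    """Return True if the IPv4 subnet lies entirely inside an allowed range."""
    # strict=True rejects input with host bits set, such as "10.0.0.1/8".
    subnet = ipaddress.ip_network(cidr, strict=True)
    if subnet.version != 4:
        return False
    return any(subnet.subnet_of(allowed) for allowed in ALLOWED_CUSTOM_RANGES)
```

For example, `is_allowed_custom_subnet("10.60.0.0/21")` is true, while a public range such as `8.8.8.0/24` is rejected.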

Cluster Administration

Administrative tasks include shutting down, rebooting, and updating nodes. Important: You cannot simultaneously reboot or shut down two nodes in a three-node cluster, as this breaks the quorum requirement.

Shutting down all nodes: Run sudo shutdown -h now on all nodes simultaneously.
Shutting down one node: Use maglev node drain followed by sudo shutdown -h now.
Rebooting: Use sudo shutdown -r now.
RMA: Follow specific drain and removal procedures before replacing a failed node.
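
The quorum rule behind these restrictions can be expressed as simple arithmetic. The sketch below assumes a majority quorum (more than half of the nodes), which matches the guide's statement that at least two of three nodes must stay operational; the function names are hypothetical and are not part of the maglev CLI.

```python
CLUSTER_SIZE = 3  # the only cluster size the guide describes for HA


def quorum_size(cluster_size=CLUSTER_SIZE):
    """Smallest majority of the cluster: 2 for a three-node cluster."""
    return cluster_size // 2 + 1


def can_take_offline(nodes_up, nodes_to_stop, cluster_size=CLUSTER_SIZE):
    """Return True if stopping nodes_to_stop nodes still leaves a quorum."""
    if not 0 <= nodes_to_stop <= nodes_up <= cluster_size:
        raise ValueError("node counts are inconsistent")
    return nodes_up - nodes_to_stop >= quorum_size(cluster_size)
```

With all three nodes up, `can_take_offline(3, 1)` is true and `can_take_offline(3, 2)` is false. A deliberate shutdown of the entire cluster is a separate procedure and is not modeled here.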

High Availability Failure Scenarios

Catalyst Center detects failures within 5 minutes. If a failure persists longer, user intervention may be required.

Node failure (< 5 mins): The system typically recovers automatically.
Node failure (> 5 mins): Services are migrated to other nodes; the GUI remains usable on remaining nodes.
Two nodes fail: The cluster breaks, and the GUI becomes inaccessible. Contact Cisco TAC.
Hardware failure: Replace the failed component (fan, power supply, disk drive) or the node itself.
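
The node-failure scenarios above can be summarized as a small decision function. This is an illustrative sketch only: the `Outcome` names and `expected_outcome` function are hypothetical, and treating a failure of exactly 5 minutes as "persisting" is an assumption, since the guide only distinguishes less than and more than 5 minutes.

```python
from enum import Enum

FAILURE_THRESHOLD_MIN = 5  # threshold stated in the guide


class Outcome(Enum):
    HEALTHY = "no node failure"
    AUTO_RECOVERY = "system typically recovers automatically"
    SERVICES_MIGRATED = "services migrate; GUI usable on remaining nodes"
    CLUSTER_BROKEN = "cluster broken, GUI inaccessible; contact Cisco TAC"


def expected_outcome(failed_nodes, minutes_down):
    """Map a failure in a three-node cluster to the behavior the guide describes."""
    if not 0 <= failed_nodes <= 3 or minutes_down < 0:
        raise ValueError("invalid failure description")
    if failed_nodes == 0:
        return Outcome.HEALTHY
    if failed_nodes >= 2:
        return Outcome.CLUSTER_BROKEN
    if minutes_down < FAILURE_THRESHOLD_MIN:
        return Outcome.AUTO_RECOVERY
    # Assumption: exactly 5 minutes is treated like a longer outage.
    return Outcome.SERVICES_MIGRATED
```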

Explanation of Pending State During a Failover

Pods in a 'Pending' state behave according to their type:

StatefulSet: Node-bound, because it uses a local persistent volume (LPV).
DaemonSet: Strictly node-bound.
Stateless/deployment: Can move across nodes based on cluster state, though some have node anti-affinity rules.
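
The distinction between these pod types can be sketched as a lookup that answers whether a Pending pod could be placed on a different node. The names below (`Workload`, `can_move_to_other_node`, `anti_affinity_blocks`) are hypothetical and simplify real Kubernetes scheduling considerably; the sketch only encodes the three rules listed above.

```python
from enum import Enum


class Workload(Enum):
    STATEFUL_SET = "StatefulSet"
    DAEMON_SET = "DaemonSet"
    DEPLOYMENT = "Stateless/deployment"


def can_move_to_other_node(kind, anti_affinity_blocks=False):
    """Return True if a Pending pod of this kind may start on another node.

    anti_affinity_blocks is True when a node anti-affinity rule excludes
    every remaining node (a simplification of real scheduler behavior).
    """
    if kind in (Workload.STATEFUL_SET, Workload.DAEMON_SET):
        # Bound to the failed node, so the pod waits for that node to return.
        return False
    return not anti_affinity_blocks
```

Under this simplified model, node-bound pods stay Pending until their node is back, while a stateless pod stays Pending only if an anti-affinity rule rules out the surviving nodes.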

Manufacturer information

Cisco Systems, Inc.

Brand profile

Manufacturer website: https://www.cisco.com
Support: https://www.cisco.com/c/en/us/support.html
Manufacturer manuals: https://www.cisco.com/c/en/us/support/docs

Practical help

Common problems

Cluster quorum lost

Ensure at least two nodes are operational. Do not shut down two nodes simultaneously.

Node failure

If the failure lasts longer than 5 minutes, check the GUI for status messages. If hardware failure is confirmed, contact Cisco TAC.

Maintenance mode

Catalyst Center is unavailable during upgrades and HA activation. Schedule these operations during off-hours.

Before use

Ensure three appliances with the same core count are available.
Verify all nodes are on the same network and at the same site.
Check that the round-trip time (RTT) is 10ms or less.
Ensure secondary appliances are running the same software version as the primary.
Prepare for maintenance mode downtime during HA activation.

Specs in practice

3-node cluster: The required configuration for quorum and HA operations.

Model compatibility

Does not support mixed clusters with third-generation appliances in version 2.3.7.5.
Does not support clusters with more than three nodes.
Does not support distribution of nodes across multiple networks or sites.

Manual page author

David Miller

Documentation analyst

Organizes user manual content into clear summaries, with attention to model details, product context, and everyday usability.