Did you ever wonder just what the Tungsten Manager is thinking when it does an automatic failover or a manual switch in a cluster?
What factors are taken into account by the Manager when it picks a replica to fail over to?
This blog post will detail the steps the Manager takes to perform a switch or failover.
We will cover both the process and some possible reasons why that process might not complete, along with best practices and ways to monitor the cluster for each situation.
Roles for Nodes and Clusters
When we say “role” in the context of a cluster datasource, we are talking about the view of a database node from the Manager’s perspective.
These roles apply to the node datasource at the local (physical) cluster level, and to the composite datasource at the composite cluster level.
Possible roles are:
- Primary: a database node which is writable, or a composite cluster which is active (contains a writable primary).
- Relay: a read-only database node which pulls data from a remote cluster and shares it with downstream replicas in the same cluster.
- Replica: a read-only database node which pulls data from a local-cluster primary node (or from a local-cluster relay node in passive composite clusters), or a composite cluster which is passive (contains a relay but NO writable primary).
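To make the role definitions concrete, here is a minimal Python sketch that models them as plain data. The Role enum, the helper names and the string labels are hypothetical illustrations, not part of any Tungsten API. The Manager tracks roles internally.

```python
from enum import Enum
from typing import Iterable, Optional


class Role(Enum):
    # Hypothetical model of the node roles described above.
    PRIMARY = "primary"
    RELAY = "relay"
    REPLICA = "replica"


def is_writable(role: Role) -> bool:
    # Only a Primary accepts writes; Relay and Replica nodes are read-only.
    return role is Role.PRIMARY


def upstream_of(role: Role, cluster_is_passive: bool) -> Optional[str]:
    """Where a node in this role pulls its data from (None for the Primary)."""
    if role is Role.PRIMARY:
        return None
    if role is Role.RELAY:
        return "remote cluster"
    # A Replica follows the local Primary, or the local Relay in a passive cluster.
    return "local relay" if cluster_is_passive else "local primary"


def composite_state(node_roles: Iterable[Role]) -> str:
    """Classify a member cluster of a composite service by its node roles."""
    roles = set(node_roles)
    if Role.PRIMARY in roles:
        return "active"   # contains a writable primary
    if Role.RELAY in roles:
        return "passive"  # contains a relay but no writable primary
    return "unknown"
```

For example, a cluster whose nodes are a Relay and two Replicas classifies as passive, and its Replicas report the local relay as their upstream.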
Moving the Primary Role to Another Node or Cluster
One of the great powers of the Tungsten Cluster is that the roles for both cluster nodes and composite cluster datasources can be moved to another node or cluster, either at will via the cctrl> switch command, or by an automatic failover invoked by the Tungsten Manager layer.
Please note that while failovers are normally automatic and triggered by the Tungsten Manager, a failover can also be invoked manually via the cctrl> failover command if ever needed.
Switch Versus Failover
There are key differences between the manual switch and automatic failover operations:
- Switch attempts to perform the operation as gracefully as possible, so there will be a delay as all of the steps are followed to ensure zero data loss.
- When the switch sub-command is invoked within cctrl, the Manager will cleanly close connections and ensure replication is caught up before moving the Primary role to another node.
- Switch recovers the original Primary to be a Replica.
- See https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch-manual.html.
- Failover is immediate, and could possibly result in data loss, even though we do everything we can to get all events moved to the new Primary.
- Failover leaves the original primary in a SHUNNED state.
- Connections are closed immediately.
- Use the cctrl> recover command to make the failed Primary into a Replica once it is healthy.
- See https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch-automatic.html
For even more details, please visit: https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch.html.
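The differences between the two operations can be summarized as a small lookup table. This is only a hypothetical Python sketch for quick reference: the Outcome fields and their labels are illustrative names taken from the list above, not Manager output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    zero_data_loss: bool    # switch waits for replication to catch up
    connections: str        # how client connections are handled
    old_primary_state: str  # where the original Primary ends up
    follow_up: str          # what the operator still has to do


OUTCOMES = {
    "switch": Outcome(
        zero_data_loss=True,
        connections="closed cleanly",
        old_primary_state="Replica",
        follow_up="none",
    ),
    "failover": Outcome(
        zero_data_loss=False,  # best effort only: events may be lost
        connections="closed immediately",
        old_primary_state="SHUNNED",
        follow_up="cctrl> recover once the old Primary is healthy",
    ),
}


def describe(operation: str) -> str:
    """One-line summary of what a switch or failover leaves behind."""
    o = OUTCOMES[operation]
    return (f"{operation}: connections {o.connections}, "
            f"old Primary becomes {o.old_primary_state}, "
            f"zero data loss guaranteed: {o.zero_data_loss}")
```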
Which Target Node to Use?
Picking a target replica node from a pool of candidate database replicas involves several checks and decisions.
For switch commands for both physical and composite services, the user has the ability to pass in the name of the physical or composite replica that is to be the target of the switch.
If no target is passed in, or if the operation is an automatic failover, then the Manager has logic to identify the 'most up-to-date' replica, which then becomes the target of the switch or failover.
Here are the steps used to pick a new primary database node from the available replicas, in order:
- Skip any replica that is either not online or that is NOT a standby replica.
- Skip any replica that has its status set to ARCHIVE.
- Skip any replica that does not have an online manager.
- Skip any replica that does not have a replicator in either online or synchronizing state.
- Now we have a target datasource prospect...
- By comparing the last applied sequence number of the current target datasource prospect to any other previously seen prospect, we should eventually end up with a replica that has the highest applied sequence number. We also save the prospect that has the highest stored sequence number.
- If there is a tie between prospects for the highest applied or stored sequence number, we compare the datasource precedence. If the precedences differ, we choose the datasource with the lowest precedence number, i.e., a precedence of 1 is higher than a precedence of 2. If the precedence is also tied, we keep the Replica chosen previously and discard the Replica currently being evaluated.
- After we have evaluated all of the Replicas, we will either have a single winner, or we will have one Replica with the highest applied sequence number and another Replica with the highest stored sequence number, i.e., the one that received the most THL records from the primary prior to the switch operation. In this case, which is particularly important during a failover, we choose the Replica that has stored the most THL records.
- At this point, the chosen target replica is returned to the switch or failover command so that the operation can proceed.
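The selection loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Manager's actual code: the Replica fields, the simplified replicator-state labels and the function names are hypothetical, and the standby-role check from the first rule is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Replica:
    name: str
    online: bool            # datasource is online
    archive: bool           # datasource status is ARCHIVE
    manager_online: bool    # node has an online Manager
    replicator_state: str   # simplified label: "ONLINE", "SYNCHRONIZING", ...
    applied_seqno: int      # last applied THL sequence number
    stored_seqno: int       # highest THL sequence number stored locally
    precedence: int         # lower number means higher precedence


def is_prospect(r: Replica) -> bool:
    """Apply the exclusion rules (standby check not modeled here)."""
    return (r.online
            and not r.archive
            and r.manager_online
            and r.replicator_state in ("ONLINE", "SYNCHRONIZING"))


def prefer(current: Replica, best: Optional[Replica],
           key: Callable[[Replica], int]) -> Replica:
    """Keep whichever replica wins on key, breaking ties by precedence."""
    if best is None or key(current) > key(best):
        return current
    if key(current) == key(best) and current.precedence < best.precedence:
        return current
    # Lower key, or equal key with equal or worse precedence: keep earlier choice.
    return best


def choose_target(replicas: List[Replica]) -> Optional[Replica]:
    best_applied: Optional[Replica] = None
    best_stored: Optional[Replica] = None
    for r in replicas:
        if not is_prospect(r):
            continue
        best_applied = prefer(r, best_applied, lambda x: x.applied_seqno)
        best_stored = prefer(r, best_stored, lambda x: x.stored_seqno)
    if best_applied is None:
        return None  # no viable replica: no switch or failover
    if best_stored is not best_applied:
        # Split result: favor the replica that has stored the most THL.
        return best_stored
    return best_applied
```

Tracking the best applied and best stored prospects separately in one pass mirrors the text: the two only matter when they disagree, and in that case the replica holding the most THL wins.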
After looping over all available Replicas, check the selected target Replica’s applied latency to see if it is higher than the configured threshold. If the appliedLatency is too far behind, that Replica is not used. The tpm option --property=policy.slave.promotion.latency.threshold=900 controls the check, with 900 seconds as the default value.
If no viable Replica is found (or if there is no available Replica to begin with), there will be no switch or failover at this point.
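The final latency gate reduces to a single comparison. A minimal Python sketch, assuming the applied latency of the selected replica is already known in seconds; vet_target is a hypothetical name, and the constant simply mirrors the documented default of policy.slave.promotion.latency.threshold.

```python
from typing import Optional

# Default of the tpm property policy.slave.promotion.latency.threshold (seconds).
DEFAULT_PROMOTION_LATENCY_THRESHOLD = 900.0


def vet_target(target: Optional[str], applied_latency: float,
               threshold: float = DEFAULT_PROMOTION_LATENCY_THRESHOLD) -> Optional[str]:
    """Return the target if it may be promoted, else None (no switch/failover)."""
    if target is None:
        return None  # no available replica to begin with
    if applied_latency > threshold:
        return None  # too far behind: do not promote this replica
    return target
```

With the default threshold, a replica lagging 4.511 seconds passes, while one lagging 65107.901 seconds (like db8 in the latency example later in this post) is rejected and no switch or failover takes place.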
For more details on automatic failover versus manual switch, please visit: https://docs.continuent.com/tungsten-clustering-7.0/manager-failover-internals-manual-switch-versus-automatic-failover.html.
Switch and Failover Steps for Local Clusters
In the upcoming Part 2 of this post, we will examine the steps needed to do a local failover.
For more details on switch and failover steps for local clusters, please visit:
Switch and Failover Steps for Composite Services
In the upcoming Part 3 of this post, we will examine the steps needed to do a composite site-level failover.
For more details on switch and failover steps for composite services, please visit:
Best Practices for Proper Cluster Failovers
What are the best practices for ensuring the cluster always behaves as expected? Are there any reasons for a cluster NOT to fail over? If so, what are they?
Here are three common reasons that a cluster might not fail over properly:
- Policy Not Automatic
- BEST PRACTICE: Ensure the cluster policy is automatic unless you specifically need it to be otherwise.
- SOLUTION: Use the check_tungsten_policy command to verify the policy status.
- Complete Network Partition
- If the nodes are unable to communicate cluster-wide, then all nodes will go into a FailSafe-Shun mode to protect the data from a split-brain situation.
- BEST PRACTICE: Ensure that all nodes are able to see each other via the required network ports.
- SOLUTION: Verify that all required ports are open between all nodes, local and remote.
- SOLUTION: Use the check_tungsten_online command to check the DataSource State on each node.
- No Available Replica
- See “Which Target Node to Use?” above for the replica exclusion rules.
- BEST PRACTICE: Ensure there is at least one ONLINE node that is not in STANDBY or ARCHIVE mode.
- SOLUTION: Use the check_tungsten_online command to check the DataSource State on each node.
- BEST PRACTICE: Ensure that the Manager is running on all nodes.
- SOLUTION: Use the check_tungsten_services command to verify that the Tungsten processes are running on each node.
- BEST PRACTICE: Ensure all Replicators are either ONLINE or GOING ONLINE:SYNCHRONIZING.
- SOLUTION: Use the check_tungsten_online command to verify that the Replicator (and Manager) is ONLINE on each node.
- BEST PRACTICE: Ensure the replication applied latency is under the threshold, default 900 seconds.
- SOLUTION: Use the check_tungsten_latency command to check the latency on each node.
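The three failure modes can be combined into one triage function. This Python sketch is hypothetical: the Node fields and labels are illustrative, and in practice each value would come from the check_tungsten_* commands or cctrl rather than from a data structure like this.

```python
from dataclasses import dataclass
from typing import List

LATENCY_THRESHOLD = 900.0  # default promotion latency threshold, seconds
GOOD_REPLICATOR_STATES = ("ONLINE", "GOING ONLINE:SYNCHRONIZING")


@dataclass
class Node:
    name: str
    is_primary: bool
    datasource_state: str   # e.g. "ONLINE", "SHUNNED", "OFFLINE"
    mode: str               # hypothetical label: "NORMAL", "STANDBY" or "ARCHIVE"
    manager_running: bool
    replicator_state: str
    applied_latency: float  # seconds


def usable_replica(n: Node) -> bool:
    """True if this node satisfies every 'No Available Replica' best practice."""
    return (not n.is_primary
            and n.datasource_state == "ONLINE"
            and n.mode not in ("STANDBY", "ARCHIVE")
            and n.manager_running
            and n.replicator_state in GOOD_REPLICATOR_STATES
            and n.applied_latency <= LATENCY_THRESHOLD)


def failover_blockers(policy: str, all_nodes_reachable: bool,
                      nodes: List[Node]) -> List[str]:
    """List which of the three common reasons would stop a failover."""
    blockers = []
    if policy.upper() != "AUTOMATIC":
        blockers.append(f"Policy is {policy}: run check_tungsten_policy")
    if not all_nodes_reachable:
        blockers.append("Network partition (FailSafe-Shun): check ports, "
                        "run check_tungsten_online")
    if not any(usable_replica(n) for n in nodes):
        blockers.append("No available replica: run check_tungsten_online, "
                        "check_tungsten_services and check_tungsten_latency")
    return blockers
```

An empty list means none of the three conditions applies; each entry names the check to run next.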
Command-Line Monitoring Tools
Below are examples of all the health-check tools listed above:
shell> check_tungsten_services -c -r
CRITICAL: Connector, Manager, Replicator are not running
shell> startall
Starting Replicator normally
Starting Tungsten Replicator Service...
Waiting for Tungsten Replicator Service.......
running: PID:14628
Starting Tungsten Manager Service...
Waiting for Tungsten Manager Service..........
running: PID:15143
Starting Tungsten Connector Service...
Waiting for Tungsten Connector Service.......
running: PID:15513
shell> check_tungsten_services -c -r
OK: All services (Connector, Manager, Replicator) are running

shell> check_tungsten_policy
CRITICAL: Manager is not running
shell> manager start
shell> check_tungsten_policy
CRITICAL: Policy is MAINTENANCE
shell> cctrl
cctrl> set policy automatic
cctrl> exit
shell> check_tungsten_policy
OK: Policy is AUTOMATIC

shell> check_tungsten_latency -w 100 -c 200
CRITICAL: Manager is not running
shell> manager start
shell> check_tungsten_latency -w 100 -c 200
CRITICAL: db8=65107.901s, db9 is missing latency information
shell> cctrl
cctrl> cluster heartbeat
cctrl> exit
shell> check_tungsten_latency -w 100 -c 200
WARNING: db9 is missing latency information
shell> cctrl
cctrl> set policy automatic
cctrl> exit
shell> check_tungsten_latency -w 100 -c 200
OK: All replicas are running normally (max_latency=4.511)

shell> check_tungsten_online
CRITICAL: Manager is not running
shell> manager start
shell> check_tungsten_online
CRITICAL: Replicator is not running
shell> replicator start
shell> check_tungsten_online
CRITICAL: db9 REPLICATION SERVICE north is not ONLINE
shell> trepctl online
shell> check_tungsten_online
OK: All services on db9 are online
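The four checks can be wrapped in a small driver that runs them in sequence and reports the worst result. This is a minimal Python sketch, assuming the check_tungsten_* scripts are on the PATH of the user running it and that, as in the transcripts above, the first line of their output starts with OK, WARNING or CRITICAL. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional, Tuple

# Commands and flags exactly as used in the transcripts above.
CHECKS: List[List[str]] = [
    ["check_tungsten_services", "-c", "-r"],
    ["check_tungsten_policy"],
    ["check_tungsten_latency", "-w", "100", "-c", "200"],
    ["check_tungsten_online"],
]

# Ordering used to pick the worst result; UNKNOWN is treated as the worst.
ORDER = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def parse_status(output: str) -> Tuple[str, str]:
    """Split 'STATUS: message' from the first line of a check's output."""
    lines = output.strip().splitlines()
    if not lines:
        return "UNKNOWN", ""
    status, sep, message = lines[0].partition(":")
    status = status.strip()
    if not sep or status not in ORDER:
        return "UNKNOWN", lines[0]
    return status, message.strip()


def run_checks(checks: Optional[List[List[str]]] = None) -> Dict[str, Tuple[str, str]]:
    """Run each check command and parse its status line."""
    results: Dict[str, Tuple[str, str]] = {}
    for cmd in (checks if checks is not None else CHECKS):
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            results[cmd[0]] = parse_status(proc.stdout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Script missing or hung: report it instead of crashing.
            results[cmd[0]] = ("UNKNOWN", str(exc))
    return results


def worst(results: Dict[str, Tuple[str, str]]) -> str:
    """Most severe status across all checks (OK if there were none)."""
    return max((s for s, _ in results.values()), key=ORDER.index, default="OK")
```

Calling worst(run_checks()) on each node from cron or a monitoring agent, and alerting whenever it is not OK, surfaces a stopped Manager, a non-automatic policy, excessive latency or an offline replicator before a failover is actually needed.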
This blog post discussed the steps the Manager takes to perform a switch or failover, the best practices for ensuring proper failover, and possible solutions for monitoring the cluster health to ensure proper operation.