
Tungsten Cluster Policies: Be Automatic

Introduction

In this short blog post, we examine the impact that setting the cluster policy to Maintenance has on the cluster behavior, and specifically the impact on the Tungsten Connector.

Policy: Explained

The cluster policy dictates how the cluster reacts when certain conditions are met.

The Tungsten Manager constantly executes a set of health checks and reacts to the results of those checks based on rules.

The policy controls whether those rules fire.

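| Ruleset          | Automatic | Manual | Maintenance |
|------------------|-----------|--------|-------------|
| Monitoring       | Yes       | Yes    | Yes         |
| Fault Detection  | Yes       | Yes    | No          |
| Failure Fencing  | Yes       | Yes    | No          |
| Failure Recovery | Yes       | No     | No          |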
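As a rough illustration, the table above can be written down as a small lookup. This is only a sketch for reasoning about the policies; the names `POLICY_RULES` and `rule_fires` are invented for this example and are not part of Tungsten.

```python
# Rulesets acted on under each policy, transcribed from the table above.
# All names here are illustrative and do not come from Tungsten itself.
POLICY_RULES = {
    "automatic": {"monitoring", "fault_detection", "failure_fencing", "failure_recovery"},
    "manual": {"monitoring", "fault_detection", "failure_fencing"},
    "maintenance": {"monitoring"},
}


def rule_fires(policy, ruleset):
    """Return True if the given ruleset is acted on under the given policy."""
    return ruleset in POLICY_RULES[policy.lower()]


print(rule_fires("automatic", "failure_recovery"))   # True
print(rule_fires("manual", "failure_recovery"))      # False
print(rule_fires("maintenance", "fault_detection"))  # False
```

Reading the lookup row by row makes the difference plain: Manual loses only recovery, while Maintenance keeps monitoring and nothing else.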

Policy: Automatic

In Automatic policy mode, or “automatic mode” for short, any rules that are triggered are acted upon.

When the cluster is in Automatic mode, failures of either the Primary or any Secondary are handled without human intervention.

When the Primary fails, that node is automatically shunned and the Primary role is switched to the most up-to-date Replica within the dataservice. That Replica becomes the new Primary, and any remaining Replicas are pointed to the newly promoted Primary.
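The idea of promoting the most up-to-date Replica can be sketched as follows. This is a simplified model and not the Manager's actual selection logic: the `Replica` class and the `applied_seqno` field are assumptions made for illustration, standing in for whatever measure of freshness the cluster really uses.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    online: bool
    applied_seqno: int  # hypothetical measure of how current the replica is


def pick_new_primary(replicas):
    """Return the online replica with the highest applied_seqno, or None."""
    candidates = [r for r in replicas if r.online]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.applied_seqno)


replicas = [
    Replica("db2", online=True, applied_seqno=1040),
    Replica("db3", online=True, applied_seqno=1042),
    Replica("db4", online=False, applied_seqno=1050),  # offline, so not eligible
]
print(pick_new_primary(replicas).name)  # db3
```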

When a Secondary fails, that node is temporarily removed from the dataservice, with application connections redirected to the other nodes in the dataservice. When the failed Secondary datasource becomes available, that node is automatically recovered for use by the dataservice.

One key behavior to note: a Connector pauses all incoming requests if connectivity is lost with the Manager selected to provide status updates. In this situation, the Connector broadcasts for a new Manager to connect to for status and allows sessions to resume once the new connection is established. As a result, in Automatic mode a Connector that cannot reach a Manager for status will appear to calling clients to hang.

The Manager on the cluster node marked as the Coordinator handles all automatic operations, when needed. If the current Coordinator node is lost, a new one is elected by the remaining Managers.

Policy: Maintenance

Maintenance mode should be used when administration or maintenance is required on the entire cluster, or when you want automatic failover and recovery to be disabled.

In Maintenance policy mode, or “maintenance mode” for short, any rules that are triggered are ignored.

When the cluster is in Maintenance mode, failures of either the Primary or any Secondary are ignored and require human intervention.

To perform maintenance on just one node, the best practice is to remain in Automatic mode and simply SHUN that specific node. This procedure keeps automatic failover and recovery active for the rest of the cluster, which keeps the overall risk lower.

To perform maintenance on the current Primary, the best practice is to remain in Automatic mode and execute a manual `switch` command inside the `cctrl` CLI tool. This moves the Primary role to another node, so the old Primary, which is a Secondary after the switch, can then be shunned.
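The two single-node procedures above can be summarized as an ordered list of `cctrl` commands. The helper below is a hypothetical sketch that only builds the command strings; it does not talk to a cluster, the hostname `db1` is a placeholder, and the exact command syntax should be checked against the documentation for your Tungsten version.

```python
def maintenance_commands(node, is_primary):
    """Return the cctrl commands, in order, for single-node maintenance
    while the cluster stays in Automatic mode (illustrative helper only)."""
    commands = []
    if is_primary:
        # Move the Primary role to another node first.
        commands.append("switch")
    # After a switch the old Primary is a Secondary, so it can be shunned.
    commands.append(f"datasource {node} shun")
    return commands


print(maintenance_commands("db1", is_primary=True))
# ['switch', 'datasource db1 shun']
print(maintenance_commands("db2", is_primary=False))
# ['datasource db2 shun']
```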

One key behavior difference of note is that when a cluster is in Maintenance mode, a Connector will NOT pause all incoming requests if connectivity is lost with the selected Manager. This allows for safe Manager restarts and updates/upgrades without the calling application/clients being impacted.
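The contrast between the two modes can be captured in a few lines. This is a toy model of the behavior described in this post, not Connector source code; the function name is made up, and Manual mode is deliberately rejected because the post does not describe the Connector's behavior under that policy.

```python
def connector_pauses_requests(policy, manager_reachable):
    """Model whether a Connector pauses incoming requests, per this post."""
    policy = policy.lower()
    if policy not in ("automatic", "maintenance"):
        # Behavior under other policies is not described in this post.
        raise ValueError(f"behavior not described for policy: {policy}")
    if manager_reachable:
        return False
    # Manager lost: pause in Automatic mode, keep serving in Maintenance mode.
    return policy == "automatic"


print(connector_pauses_requests("automatic", manager_reachable=False))    # True
print(connector_pauses_requests("maintenance", manager_reachable=False))  # False
```

This is why a Manager restart is safe for clients in Maintenance mode but shows up as a hang in Automatic mode.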

All operations performed when the cluster is in Maintenance mode are signalled to all Connectors regardless of the policy.

Policy: Manual

In Manual mode, the cluster identifies and isolates datasources when they fail, but automatic failover (for Primary datasources) and recovery actions are disabled.

Wrap-Up

In this short blog post, we explored the impact that setting the cluster policy to Maintenance has on the cluster behavior, and specifically the impact on the Tungsten Connector.

If you are new to Tungsten, please feel free to learn more about Tungsten Clustering or reach out!

Online Documentation:

https://docs.continuent.com/tungsten-clustering-6.1/operations-policymodes.html

About the Author

Eric M. Stone
COO

Eric is a veteran of fast-paced, large-scale enterprise environments with 35 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to worldwide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of disparate customers, from Fortune 500 financial institutions to SMBs.
