In this short blog post, we examine the impact that setting the cluster policy to Maintenance has on the cluster behavior, and specifically the impact on the Tungsten Connector.
The cluster policy dictates how the cluster reacts when certain conditions are met.
The Tungsten Manager constantly executes a set of health checks and reacts to the results of those checks according to a set of rules.
The policy controls whether those rules are fired.
In Automatic policy mode, or “automatic mode” for short, any rules that are triggered are acted upon.
When the cluster is in Automatic mode, failures of either the Primary or any Secondary are handled without human intervention.
In the case of a Primary failure, that node is automatically shunned and the Primary role is switched to the most up-to-date Replica within the dataservice. That Replica becomes the new Primary, and any remaining Replicas are pointed to the newly promoted Primary.
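The failover sequence described above can be sketched as a small Python model. This is only an illustration of the ordering (shun, promote, repoint) and not the Manager's actual implementation; the `Node` class, the `applied_seqno` field used to decide which Replica is most up to date, and the host names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    role: str                      # "primary" or "replica"
    applied_seqno: int             # hypothetical measure of how up to date the node is
    state: str = "online"          # "online" or "shunned"
    source: Optional[str] = None   # name of the node this Replica reads from


def fail_over(nodes: List[Node], failed_primary: Node) -> Node:
    """Shun the failed Primary, promote the most up-to-date Replica, repoint the rest."""
    failed_primary.state = "shunned"
    candidates = [n for n in nodes if n.role == "replica" and n.state == "online"]
    if not candidates:
        raise RuntimeError("no online Replica available for promotion")
    # Pick the Replica that has applied the most changes.
    new_primary = max(candidates, key=lambda n: n.applied_seqno)
    new_primary.role = "primary"
    new_primary.source = None
    # Point every remaining Replica at the newly promoted Primary.
    for node in candidates:
        if node is not new_primary:
            node.source = new_primary.name
    return new_primary


nodes = [
    Node("db1", "primary", 100),
    Node("db2", "replica", 98, source="db1"),
    Node("db3", "replica", 100, source="db1"),
]
promoted = fail_over(nodes, nodes[0])
print(promoted.name, nodes[1].source)
```

In this toy dataservice, `db3` has applied more changes than `db2`, so it is promoted and `db2` is repointed to it.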
When a Secondary fails, that node is temporarily removed from the dataservice, with application connections redirected to the other nodes in the dataservice. When the failed Secondary datasource becomes available, that node is automatically recovered for use by the dataservice.
One key behavior of note is that a Connector will pause all incoming requests if it loses connectivity with the Manager selected to provide status updates. In this situation, the Connector will broadcast for a new Manager to connect to for status and will allow sessions to resume once the new connection is established. As a result, in Automatic mode, a Connector that cannot reach a Manager for status will appear to hang to calling clients.
The Manager on the cluster node marked as the Coordinator handles all automatic operations, when needed. If the current Coordinator node is lost, a new one is elected by the remaining Managers.
Maintenance mode should be used when administration or maintenance is required on the entire cluster, or when you want automatic failover and recovery to be disabled.
In Maintenance policy mode, or “maintenance mode” for short, any rules that are triggered are ignored.
When the cluster is in Maintenance mode, failures of either the Primary or any Secondary are not acted upon by the cluster, and resolving them requires human intervention.
To perform maintenance on just one node, the best practice is to remain in Automatic mode and simply SHUN that specific node. This procedure keeps automatic failover and recovery active for the rest of the cluster, which keeps the overall risk lower.
To perform maintenance on the current Primary, the best practice is to remain in Automatic mode and execute a manual `switch` command inside the `cctrl` CLI tool. This moves the Primary role to another node, so the old Primary can then be shunned because it is a Secondary after the switch.
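The two single-node procedures above can be summarized as an ordered list of `cctrl` commands. The Python helper below is a hypothetical convenience that only builds the command strings; it does not run anything. The `switch`, `datasource <host> shun`, and `datasource <host> recover` forms reflect our understanding of the `cctrl` syntax, the use of `recover` to bring the node back is an assumption about your procedure, and the host names are placeholders, so check the commands against the documentation for your version before use.

```python
from typing import List, Tuple


def maintenance_commands(node: str, is_primary: bool) -> Tuple[List[str], List[str]]:
    """Return the cctrl commands to run before and after maintaining one node.

    The cluster stays in Automatic mode throughout.
    """
    before = []
    if is_primary:
        # Move the Primary role away first, so the node becomes a Secondary.
        before.append("switch")
    # Isolate the node so the cluster stops using it.
    before.append(f"datasource {node} shun")
    # Bring the node back into the dataservice once the work is done.
    after = [f"datasource {node} recover"]
    return before, after


before, after = maintenance_commands("db1", is_primary=True)
for command in before:
    print("cctrl>", command)
print("... perform maintenance on db1 ...")
for command in after:
    print("cctrl>", command)
```

For a node that is already a Secondary, passing `is_primary=False` omits the `switch` step.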
One key behavior difference of note is that when a cluster is in Maintenance mode, a Connector will NOT pause all incoming requests if connectivity is lost with the selected Manager. This allows for safe Manager restarts and updates/upgrades without the calling application/clients being impacted.
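The difference in Connector behavior between the two modes can be expressed as a simple decision function. This is a minimal sketch of the behavior described in this post, not the Connector's real logic; the `Policy` enum and `connector_accepts_requests` function are names invented for the example, and Manual mode is left out because its Connector behavior is not covered here.

```python
from enum import Enum


class Policy(Enum):
    AUTOMATIC = "automatic"
    MAINTENANCE = "maintenance"


def connector_accepts_requests(policy: Policy, manager_reachable: bool) -> bool:
    """Return True if the Connector keeps serving incoming requests."""
    if manager_reachable:
        # With a Manager providing status, requests flow in either mode.
        return True
    # Without a Manager, only Maintenance mode keeps requests flowing;
    # in Automatic mode the Connector pauses until it finds a new Manager.
    return policy is Policy.MAINTENANCE


for policy in Policy:
    print(policy.value, connector_accepts_requests(policy, manager_reachable=False))
```

This is why a Manager restart is safe for clients in Maintenance mode but looks like a hang in Automatic mode.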
Any operations performed while the cluster is in Maintenance mode are still signalled to all Connectors.
In Manual mode, the cluster identifies and isolates datasources when they fail, but automatic failover (for Primary datasources) and recovery actions are disabled.
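To compare the three policies side by side, the sketch below models how the policy gates the Manager's reaction to a failed datasource. It is a simplified reading of the descriptions in this post; the `Policy` enum, the `manager_actions` function, and the action labels are hypothetical names chosen for illustration rather than anything exposed by Tungsten itself.

```python
from enum import Enum
from typing import List


class Policy(Enum):
    AUTOMATIC = "automatic"
    MANUAL = "manual"
    MAINTENANCE = "maintenance"


def manager_actions(policy: Policy, failed_role: str) -> List[str]:
    """Return the actions taken when a datasource with the given role fails.

    failed_role is "primary" or "secondary".
    """
    if failed_role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {failed_role}")
    if policy is Policy.MAINTENANCE:
        # Triggered rules are ignored; a human must intervene.
        return []
    # Automatic and Manual both identify and isolate the failed datasource.
    actions = ["isolate"]
    if policy is Policy.AUTOMATIC:
        # Only Automatic mode goes on to fail over or recover.
        if failed_role == "primary":
            actions.append("failover")
        else:
            actions.append("recover_when_available")
    return actions


for policy in Policy:
    print(policy.value, manager_actions(policy, "primary"))
```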
In this short blog post, we explored the impact that setting the cluster policy to Maintenance has on the cluster behavior, and specifically the impact on the Tungsten Connector.