Zero Downtime: Upgrading MySQL Server in a Tungsten Cluster

Introduction

Tungsten Clustering allows for many types of maintenance to happen with no downtime at all.

This blog post will explore how to upgrade the actual MySQL Server on all cluster nodes with zero downtime.

Zero-Downtime Upgrade Steps

The zero-downtime upgrade proceeds one node at a time, Replicas first:

  • Pick a Replica node, shun it, and take all replication on that node offline
  • Upgrade the MySQL Server on that node
  • Bring replication back online and verify its health
  • Welcome the Replica node back into the cluster
  • Repeat for each remaining Replica node
  • Switch the Primary role to one of the upgraded Replicas
  • Upgrade the old Primary (now a Replica) using the same steps
  • Done!

NOTE
There is always a brief hang when a switch is performed and the Connectors re-route the client traffic to a different node. Additionally, a reconnect will be required when running the Connectors in Bridge mode.
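
The step order above can be sketched as a small dry-run script. It only prints the commands in the order you would run them and executes nothing. The hostnames (host1 as the Primary, host2 and host3 as Replicas) are placeholders, and the MySQL upgrade step itself is left as a comment because it depends on your operating system and packaging. Treat this as an illustration and follow the documented procedure for a real upgrade.

```shell
#!/bin/sh
# Dry-run sketch: prints the per-node commands, does not execute them.
# Hostnames are placeholders chosen for this example.

plan_replica_upgrade() {
    node="$1"
    echo "cctrl> datasource $node shun"
    echo "$node shell> trepctl offline"
    echo "$node shell> # stop MySQL, upgrade the MySQL Server packages, start MySQL"
    echo "$node shell> trepctl online"
    echo "$node shell> trepctl status"
    echo "cctrl> datasource $node welcome"
}

plan_cluster_upgrade() {
    primary="$1"
    shift
    # Upgrade every Replica first.
    for replica in "$@"; do
        plan_replica_upgrade "$replica"
    done
    # Move the Primary role to the first upgraded Replica, then
    # upgrade the old Primary with the same per-node steps.
    echo "cctrl> switch to $1"
    plan_replica_upgrade "$primary"
}

plan_cluster_upgrade host1 host2 host3
```

Printing the plan first makes it easy to review the order before touching a live cluster: every Replica is upgraded and welcomed back before the switch, so the Primary role only ever moves to a node that is already running the new MySQL version.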

The complete procedure is detailed here:

https://docs.continuent.com/tungsten-clustering-7.0/operations-maintenance-dataservice.html

Zero-Downtime Best Practices

The best practice is to leave the cluster in Automatic mode, because shunning a node takes care of preventing both failover to the node and Connector reads from it.

Using Maintenance mode degrades the service level, because in Maintenance mode, there is no high availability (HA/automatic failover) if the MySQL Server dies.

In Automatic mode, there is HA, because the Manager still provides quorum even though the node is shunned.

Take the Replicator offline gracefully using `trepctl offline` before taking down the database.

Bring the Replicator online and verify status using `trepctl online` and `trepctl services` or `trepctl status`.
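
If you script the verification step, one option is to read the `state` field from the `trepctl status` output. The helper below is a minimal sketch: `replicator_state` is a name made up for this example, and the sample text is an abbreviated illustration, not real output from a cluster.

```shell
#!/bin/sh
# Reads `trepctl status` output on stdin and prints the value of the
# "state" field. Fields are split on ":", so a compound state such as
# OFFLINE:NORMAL is reported by its first part only (OFFLINE).
replicator_state() {
    awk -F':' '$1 ~ /^state[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Illustrative, abbreviated sample; real output has many more fields.
sample='appliedLatency         : 0.52
state                  : ONLINE
'
printf '%s' "$sample" | replicator_state

# Against a live node (not run here):
#   [ "$(trepctl status | replicator_state)" = "ONLINE" ] || echo "replicator is not ONLINE" >&2
```

A check like this can gate the next step, so the node is welcomed back into the cluster only once the Replicator reports ONLINE.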

IMPORTANT
The above procedure is NOT the correct way to upgrade the Tungsten software!

Bonus: Other Ways To Influence Cluster Behavior

Other possible ways to influence cluster behavior are the per-node Standby and Archive modes.

Standby mode allows failover to the specific node, but prevents Connector reads from it.

https://docs.continuent.com/tungsten-clustering-7.0/operations-status-changingstates.html#operations-status-changingstates-standby

Archive mode prevents failover to the node, but allows Connector reads from it.

https://docs.continuent.com/tungsten-clustering-7.0/operations-status-changingstates.html#operations-status-changingstates-archive
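
For quick reference, the effect of each per-node state described in this post can be written as a small lookup. This is only a summary of the text above. The function name `node_state_effects` and the `failover=`/`reads=` labels are invented for this example and are not part of any Tungsten tool.

```shell
#!/bin/sh
# Prints whether failover to the node and Connector reads from the node
# are allowed for a given per-node state, as summarized in this post.
node_state_effects() {
    case "$1" in
        shunned) echo "failover=no reads=no" ;;
        standby) echo "failover=yes reads=no" ;;
        archive) echo "failover=no reads=yes" ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
}

for s in shunned standby archive; do
    printf '%-8s %s\n' "$s" "$(node_state_effects "$s")"
done
```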

Mark a Datasource as Standby

To configure a datasource as a standby:

shell> cctrl
[LOGICAL:EXPERT] /alpha > datasource host3 standby
WARNING: This is an expert-level command:
Incorrect use may cause data corruption
or make the cluster unavailable.
Do you want to continue? (y/n)> y
DataSource 'db7-demo.continuent.com@north' is now OFFLINE
Datasource 'db7-demo.continuent.com' now has role 'standby'

To clear the standby state:

shell> cctrl
[LOGICAL:EXPERT] /alpha > datasource host3 clear standby
Datasource 'host3' now has role 'slave'

Mark a Datasource as Archive

To mark a datasource as an archive:

shell> cctrl
[LOGICAL:EXPERT] /alpha > datasource host3 set archive
Datasource 'db7-demo.continuent.com' is now an ARCHIVE slave

To remove the archive role:

shell> cctrl
[LOGICAL:EXPERT] /alpha > datasource host3 clear archive
Datasource 'db7-demo.continuent.com' is no longer an ARCHIVE slave

Wrap-Up

In this post we summarized the zero-downtime upgrade procedure for MySQL Server in a Tungsten Cluster. We also covered best practices for upgrading, along with some bonus information about the per-node Archive and Standby modes.

For more information, please see our online documentation:

Smooth sailing!

About the Author

Eric M. Stone
COO and VP of Product Management

Eric is a veteran of fast-paced, large-scale enterprise environments with 35 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to world-wide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of disparate customers, from Fortune 500 financial institutions to SMBs.
