Active/Active vs. Active/Passive MySQL Database Geo-Clusters

“When you’re millions of miles from home, it’s hard to install an operating system update - but not impossible. From the dawn of the Space Age through the present, NASA has relied on redundant hardware to make up for physical defects, wear and tear, sudden failures, or even the effects of cosmic rays.” - Glenn Fleishman, Author and Technology Journalist

While my team is yet to support an application that goes beyond the stratosphere, we have experience in other unique WAN environments such as for cruise liners and for the military; in other words, the rocketship-design approach to risk resonates. Continuent has been developing and providing software and support for business-critical and mission-critical MySQL and MariaDB for a long time (since 2004). Our goal is to remove the worry and manual overhead typically involved in ensuring MySQL availability, and our focus is geo-distributed or global MySQL databases - specifically, designing fully-managed clusters of clusters.

A Redundancy of Redundancies

For multi-site, multi-region, or geo-distributed MySQL/MariaDB clusters, organizations with business-critical MySQL databases like Saas and Fintech typically require what’s called a "composite" cluster, or a "cluster of clusters."

The managed composite “cluster of clusters” may span over:

  • geographic locations
  • cloud providers (multi-cloud)
  • on-premises hardware with cloud-based instances (hybrid-cloud)
  • along with any other combination you can think of ;-}

In a composite cluster, two or more managed clusters work together, specifically through three unique, synergistic processes:

  • Replication pipelines via Tungsten Replicator both within and between clusters are fully managed.
  • Cluster orchestration via Tungsten Managers is both locally and globally aware.
  • Application-Database connections are routed optimally via the intelligent Tungsten Proxies, which are also multi-site aware.

Active/Passive

You may choose the Active/Passive design, as shown in the graphic below, where all writes are directed to a single Primary.

In a simple Active/Passive clustering topology, the secondary cluster, also known as a “Disaster Recovery cluster” or “DR cluster” shown on the right has a “Relay” which may be promoted as the new Primary with a single command (or click with Tungsten Dashboard, the freely included GUI).

Let me repeat - it takes a single command to activate the DR site. Gracefully, without disruption to the service or lost connections, writes get rerouted from the Primary node to the Relay node in the Secondary cluster.

Some people ask why we do not allow automatic switches over to a different site or region. A human must trigger the switch by design - it is deliberate to prevent flapping. If this switch feature were automatic, any network blip might trigger a switch, and issues that require human attention to be resolved could result in repetitive switching back and forth between sites. So, while a site switch could easily be automated, by design it is not - and for good reason.

Active/Active

Heads of technology, infrastructure, site reliability and database operations at mid-to-large SaaS, Telco and Fintech organizations sometimes ask us for Active/Active multi-site clustering for their MySQL databases.

Active/Active means you may have two or more database nodes simultaneously serving writes. As shown in the graphic below, there are two or more Primaries serving writes.

This is great for applications with a high write-to-read ratio and applications serving a geographically-distributed user base. It brings Primary database servers physically closer to the end users and improves application responsiveness.

Naturally, the risk with Active/Active is the potential for data conflicts, especially in a high transaction environment. If, for example, the same row of data is updated at the same time in two sites - how does the database system decide in which order to commit the writes?

Active/Active versus Active/Passive

There are pros and cons to each multi-site or multi-region design - it depends on many factors, and our highly experienced team will discuss, help design and test yours. We actually have three ways to go:

  1. Composite Active/Passive (CAP) with one active site and one or more passive sites. The active site contains the lone Primary node accepting writes, and all other sites have a special Relay node, which does not accept client writes. If the primary site fails, the `failover`command must be manually issued to promote another site to be a primary.
  2. Composite Active/Active (CAA) with multiple active sites — if an active site fails, all traffic to that site is automatically forwarded to a different, available active site. This provides faster failover, but it is subject to potential conflicts due to having multiple Primaries.
  3. Composite Active/Active (CAA) configured to route all traffic to one active site — if the configured active site fails, all traffic is automatically routed to a different, available active site. This is also a faster failover, and is handled by the connectivity layer level.

If, as in the third option, your Active/Active Cluster behaves in an Active/Passive manner by only writing in one place at a time, you may automatically reroute traffic to the other site with no risk of conflict. This results in less administration for the cluster layer.

Whatever design you choose, all Continuent solutions are fully-integrated, fully-tested for mission-critical and business-critical use cases, and our team backs it up with 24/7 enterprise support. So to conclude this blog, let’s close with Glenn Fleishman: “NASA’s obsessive focus on software testing to find and remove bugs, plus a strategy to allow software to recover in the worst of circumstances...” is key to keeping a service going. It’s a lot of work - and not something most organizations running business-critical MySQL have the experience and expertise let alone time and money, to do any other way.

For more information, feel free to visit the resources below or contact us.

Additional Resources:

https://www.continuent.com/products/tungsten-clustering/subscriptions

https://www.continuent.com/resources/blog/dig-and-de-mystify-tungsten-cluster-topologies

https://www.continuent.com/resources/blog/mysql-terminology-changes-continuent-primary-and-replica-source-and-target

https://www.continuent.com/products/tungsten-clustering/topologies

https://www.continuent.com/products/tungsten-clustering

About the Author

Sara Captain
Customer Success Manager

Sara has worn various hats at Continuent since 2014. Listening to Continuent customers over the years, Sara fell in love with the Continuent Tungsten suite of products. She started learning Linux and MySQL administration with the support of Continuent's amazing team, so she can help with keeping Customers happy. Prior to Continuent she worked in consulting with a focus on leveraging data.

Add new comment