This is the second article in this series, where we examine specific topics about how customers have benefited from moving from Galera Cluster to Tungsten Cluster. As we saw in the previous article, Tungsten Cluster uses asynchronous replication, which can provide more robust performance than the synchronous replication used by Galera Cluster. Now, let’s see how Tungsten Cluster leverages asynchronous replication to deploy MySQL clusters at geo-scale.
Suppose you require Disaster Recovery (DR) for your MySQL database infrastructure. If you’re using Galera, you may be tempted to do the following:
The application writes to a database in the West datacenter, with replication to (i) a local database and (ii) a remote one in the East datacenter. There are several issues with this topology, including intermittent loss of cluster quorum, but let’s focus on what happens during database writes with synchronous replication: Each time the application issues a commit to the database, it waits for the database to complete the commit.
For the database to complete the commit, the transaction must be replicated and acknowledged by other members in the cluster (feel free to reference the previous article to refresh your memory).
Acknowledgment from the local database could be relatively quick, assuming the LAN is robust. However, acknowledgment from the remote database will be slow; it could take a second or more. Every time the application writes, it must wait for a round-trip acknowledgment from that remote database, which makes the application so slow that it becomes unusable. There is no mitigation for this lag; it is a function of synchronous replication over long distances.
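To make the cost concrete, here is a minimal back-of-the-envelope model in Python. It is a sketch, not a description of Galera's actual replication protocol: it simply assumes that a synchronous commit waits for the slowest acknowledgment. The function names and the timing figures (a 2 ms local commit, a 1 ms LAN round trip, an 80 ms WAN round trip) are illustrative assumptions, not measurements.

```python
# Simplified model: a synchronous commit cannot return until the slowest
# replica has acknowledged it; an asynchronous commit returns after the
# local commit only. All numbers below are illustrative assumptions.

def sync_commit_ms(local_commit_ms, replica_rtts_ms):
    """Commit latency when every replica must acknowledge first."""
    return local_commit_ms + max(replica_rtts_ms, default=0.0)

def async_commit_ms(local_commit_ms):
    """Commit latency when replication happens in the background."""
    return local_commit_ms

def max_serial_commits_per_sec(commit_ms):
    """Upper bound on back-to-back commits from a single connection."""
    return 1000.0 / commit_ms

LOCAL_COMMIT_MS = 2.0   # assumed local commit cost
LAN_RTT_MS = 1.0        # assumed round trip to the local replica
WAN_RTT_MS = 80.0       # assumed round trip between West and East

sync_ms = sync_commit_ms(LOCAL_COMMIT_MS, [LAN_RTT_MS, WAN_RTT_MS])
async_ms = async_commit_ms(LOCAL_COMMIT_MS)

print(f"sync:  {sync_ms:.1f} ms per commit, "
      f"{max_serial_commits_per_sec(sync_ms):.1f} commits/s per connection")
print(f"async: {async_ms:.1f} ms per commit, "
      f"{max_serial_commits_per_sec(async_ms):.1f} commits/s per connection")
```

Under these assumed numbers the remote acknowledgment dominates: the fast LAN replica makes no difference, because the commit is only as fast as the farthest member.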
If you search for a solution to this issue, you’ll find many people asking this question without any acceptable resolution. Eventually, you would arrive at the following architecture:
This architecture gives us high availability in both sites and local cluster quorum (but not quorum for Galera, which requires an additional component). However, replication still needs to occur between sites. With synchronous replication, we have the same issue described above: applications become painfully slow and eventually break as requests back up in the connection pool. Adding a third site will degrade performance even further.
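The connection pool backlog follows from simple queueing arithmetic. The sketch below applies Little's law (average busy connections ≈ request rate × time each request holds a connection). The request rate, the two latencies, and the pool size are assumed values chosen only for illustration.

```python
import math

def connections_needed(requests_per_sec, commit_latency_ms):
    """Little's law: average busy connections = arrival rate * hold time."""
    return requests_per_sec * commit_latency_ms / 1000.0

POOL_SIZE = 50            # assumed connection pool limit
REQUESTS_PER_SEC = 1000   # assumed write rate

# Assumed latencies: a local commit versus a commit that waits on a WAN ack.
for label, latency_ms in [("local commit", 2.0), ("commit + WAN ack", 82.0)]:
    busy = connections_needed(REQUESTS_PER_SEC, latency_ms)
    status = "fits" if busy <= POOL_SIZE else "pool exhausted, requests queue"
    print(f"{label}: ~{math.ceil(busy)} busy connections ({status})")
```

With these assumptions, the same write rate that occupies a couple of connections locally needs more connections than the pool holds once each commit waits on the remote site, so requests start to queue.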
Geo-Scale MySQL Done Right
In theory, you could deploy the above architecture with Galera and use native MySQL asynchronous replication for the WAN. At this point you would have two loosely coupled Galera clusters. You would have the following components to manage:
- the Galera clusters;
- the asynchronous replication stream between the clusters;
- a site failover/failback mechanism that you would need to write and integrate;
- a database proxy to route queries.
This is a complex DIY solution! You can get an idea of the complexity with this short video:
Fortunately, there is an easier way to realize the dream of MySQL Cluster at Geo-Scale. Tungsten Cluster contains all of the components needed to deploy Geo-Scale MySQL, along with both CLI and GUI management tools. Because replication is asynchronous, application commits are no longer coupled to replication. Applications simply commit to the local database while replication happens in the background. This allows MySQL clusters to be deployed across multiple sites as Active/Passive clusters (great for Prod/DR), Active/Active clusters (all sites are active, which is best for distributed applications), or a combination of the two.