Recently a Continuent Customer Asked
“What is your approach for scaling up MySQL clusters for a higher number of queries per second and larger data sets?”
Scaling, like availability, is a multifaceted question.
In this blog post we will detail and discuss the best practices for scaling Tungsten Clusters.
Scaling Read Queries
Continuent scales clusters to handle more queries, both read and write, a number of ways.
For Read Query scaling, we can add nodes or entire clusters (i.e. a reporting cluster in a Active/Passive topology) and use the Tungsten Connector (aka Tungsten Proxy) to provide the load balancing (routing of reads to the replicas).
The Tungsten Connector intelligent proxy can automatically split queries, directing writes to the primary, and reads to any of the replicas, selecting the replica with the most up-to-date data by default. The reads can be distributed and load balanced across replicas using any of the the load balancing strategies built into the Connector. For applications that are Read/Write aware, the application can direct reads to a replica via a read-only port, hostname/ip address or SQL comment.
For more information, please visit the online documentation for Connector Routing Methods - https://docs.continuent.com/tungsten-clustering-6.1/connector-routing-types.html.
Scaling Write Queries
For scaling of the Writes, we can use Parallel Apply on the replicas, and reduce load on the primary by shifting reads to the replicas, adding more replicas as needed.
We also tend to recommend that the underlying disk is provisioned, has a very fast response time, and that we tune the filesystems appropriately to reduce I/O bottlenecks.
We also recommend tuning the application along with the DB indexes and queries. These efforts often prove to have a high value in squeezing more performance out of a system.
MySQL provides the “explain” facility to analyze queries and show where the query is spending most of its time. By using “explain,” you can identify which queries would benefit from additional indices, and also see which indices are not being used. Unused indices will incur a write penalty, so removing them will improve write performance and scalability.
Failing that, we scale the instance vertically, making it more powerful, and add provisioned iops disk for more i/o bandwidth.
To scale data sets, we use a “Pod” architecture when feasible to divide logically. For example, one of the world’s busiest MySQL-based marketing analytics solutions (multiple multi-terabyte databases with billions of new daily transactions) relies heavily on the pod architecture and parallel replication capabilities.
Using a Pod architecture also allows rollout of application changes to a subset of users, which allows for the collection of data points to gauge the impact of the changes, without affecting all customers.
In this blog post we described the best practices for scaling Tungsten Clusters.
As you can see, scaling is highly application and architecture-dependent.