Continuent Blog: Make It Faster: Improving MySQL Write Performance for Tungsten Cluster Replicas

Blog

Overview

The Skinny

In this blog post we explore various options for performance tuning MySQL server for better Replicator applier performance on the replica nodes.

A Tungsten Cluster relies upon the Tungsten Replicator to move events from the primary node to the replicas. Once the event has been transferred to the replica as THL on disk, the applier on the replica will then attempt to write it to the database. Tungsten Replicator can only apply events as fast as MySQL allows. If the MySQL server is somehow slow or blocking, then the Replicator will be as well.

A properly-tuned database server in addition to infrastructure and SysAdmin best practices will go quite a long way towards high-performance apply operations on replica nodes.

The Question

Recently, a customer asked us:

During one of our load tests, we had a peak of 60k writes/min, averaging around 40k w/m. During this period, we saw an applied latency of 100-160 seconds. Is it possible to improve the replication latency on the replica nodes?

The Summary

Where To Look and What For?

When performance tuning, we normally examine the configuration and logs for the following layers:

Tungsten Replicator
MySQL server
Storage
Network/Firewall
Hardware & OS

The Rules

What are Performance Tuning Best Practices?

Before we dive into the various layers, let's review the performance tuning basics that I have learned over the years:

Performance tuning is a slow, iterative process.
Change one thing at a time and test carefully.
Document your changes and the associated test results.
Go for the low-hanging fruit early on. Sometimes the smallest changes may have the largest results, i.e. adding a proper index on a table.
As soon as you remove one bottleneck, the next one will appear. Iterate!
Divide and Conquer - pick a spot and ensure everything is working properly on both sides. This approach often helps to isolate issues.
Start at the top and work your way down the layers. Test, test, test!
Start at the bottom and work your way up the layers. Test, test, test!
When in doubt, ask, then document.
Read the documentation. Then read it again.
Ensure consistency amongst all nodes at every layer.
Most of all, do not assume.

Tungsten Replicator

The Short Version

We always start off by checking the replicator applier stages, of which there are three.

On a replica node, if the appliedLatency of the remote-to-thl stage is low, but the appliedLatency of both the thl-to-q and q-to-dbms stages are high, then the issue is almost always with the MySQL server somehow.

For example, on a healthy cluster:

On a Primary Node:

shell> trepctl status -name tasks | egrep 'applied|stage'

appliedLastEventId	: mysql-bin.000046:0000000065129638;-1
appliedLastSeqno  	: 2656221
appliedLatency    	: 0.279
stage             	: binlog-to-q
appliedLastEventId	: mysql-bin.000046:0000000065129638;-1
appliedLastSeqno  	: 2656221
appliedLatency    	: 0.279
stage             	: q-to-thl

On a Replica Node:

shell> trepctl status -name tasks | egrep 'applied|stage'

 appliedLastEventId	: mysql-bin.000046:0000000065077978;-1
appliedLastSeqno  	: 2656191
appliedLatency    	: 0.345
stage             	: remote-to-thl
appliedLastEventId	: mysql-bin.000046:0000000065077978;-1
appliedLastSeqno  	: 2656191
appliedLatency    	: 0.371
stage             	: thl-to-q
appliedLastEventId	: mysql-bin.000046:0000000065077978;-1
appliedLastSeqno  	: 2656191
appliedLatency    	: 0.374
stage             	: q-to-dbms

For more information about stages, please visit the following blog post: Mastering Tungsten Replicator Series: Understanding Pipelines and Stages.

Quick Tech "Tool Tip"

Here is a handy command to describe the summary progress of each stage on that node, refreshing every two seconds:

trepctl perf -r 2

MySQL Server

What a Lovely Tune

Each one of these suggestions can bear performance fruit. As always, your mileage may vary!

For this specific customer, we identified four (4) "hot" tables in the slow query logs, which turned out to have 2 million or more rows each. This asks the database to perform some i/o-intensive operations. Once the issue was clearly localized to the four tables, the scope narrowed somewhat, implying an indexing, query or disk i/o-based bottleneck.

Slow Query Log To The Rescue
The first thing to do would be to enable the slow query logs just to validate that there is nothing blocking faster queries behind it. Since the applier is single-threaded, just one repeated slow query can create a huge bottleneck.
To control the slow query log, use the below queries:
- Set global system variable slow_query_log to 0 to disable the log or to 1 to enable it:
  - mysql> SELECT @@global.slow_query_log;
  - mysql> SET GLOBAL slow_query_log = 0;
  - mysql> SET GLOBAL slow_query_log = 1;
- Set global system variable slow_query_log_file to specify the name of the log file:
  - mysql> SELECT @@global.slow_query_log_file;
```
mysql> SELECT @@global.slow_query_log_file;
                        +---------------------------------------+
                        | @@global.slow_query_log_file          |
                        +---------------------------------------+
                        | /volumes/data/mysql/db1-demo-slow.log |
                        +---------------------------------------+
                        1 row in set (0.00 sec)
                        
```
  - mysql> SET GLOBAL slow_query_log_file = "/volumes/data/mysql/db1-slow-query.log";
Next, SHOW FULL PROCESSLIST on the replicas during the latency would be helpful to identify what may be locking or blocking.
Use the MySQL EXPLAIN command to better understand why queries are slow.
Does every table have a Primary Key?
What type of replication is enabled in MySQL - STATEMENT, MIXED or ROW? For active/passive clusters, we recommend MIXED, and for active/active clusters, use ROW to ensure data integrity across masters. ROW will increase latency due to the sheer volume of data transmitted as compared to STATEMENT or MIXED.
Make sure there are no redundant indexes.
Also, check for the use of Unique indexes, which disable the InnoDB change buffering performance boost. InnoDB has change buffering (previously called the insert buffer), which is a feature to delay building secondary indexes that are not unique, and merge writes. It can boost insert performance by quite a lot, and it’s enabled by default.
Are the tables InnoDB? If so, do you have lots of memory? Can you load the tables into RAM and then sync to disk? This could help to avoid the physical SERIAL disk i/o that can create bottlenecks, even if the i/o channel itself is not full.
Finally, how large are the affected tables and schemas in terms of row size and byte size? Can the tables be split? Can the databases be split?

Key MySQL my.cnf Configuration Parameters to Check:

Storage

Disk Inside and Out

Disk I/O is normally the slowest part of any compute workflow because disk is often much slower than physical memory. No matter how fast the underlying SSD is, there are physical limitations to overcome, especially with SAN and Cloud-based storage.
For example, when using AWS EC2, consider using enhanced EBS I/O with provisioned IOPS if you are not already doing so, which would provide faster performance at an increased cost.
Not having separated volumes for different purposes will always degrade performance because of disk contention. For example, ensure separate volumes for (at least) root, mysql data, mysql binlogs and THL files.
Then stripe each filesystems for MySQL over multiple volumes using RAID of some sort (i.e. RAID-0), so that there are multiple I/O channels in use at the same time for the same filesystem.
This layered disk architecture provides for a large number of parallel disk I/O channels, giving a much higher throughput at a much lower latency.
What is the filesystem type? (i.e. xfs, ext4, etc...) because journaling filesystems are very slow. Consider using a non-journaling file system, or disabling journaling. We suggest using xfs.
Mount the filesystem using the -noatime option.

Network/Router/Firewall

The Network Really IS the Computer

Ensure sufficient bandwidth, and a very low error rate.
Confirm the various hops have sufficient cpu, ram and bandwidth.
Is the firewall able to keep up?

Operating System

SysAdmins to the Rescue, As Usual

First, verify the documented prerequisites for Tungsten hosts.

Check the number of open files (nofiles) and the max user processes (nproc):

tungsten@db1-demo:/home/tungsten # cat /etc/security/limits.conf
        *	-	nofile	65535
        *	-	nproc	8096
        tungsten@db1-demo:/home/tungsten # ulimit -n
        65535
        tungsten@db1-demo:/home/tungsten # ulimit -u
        8096
        tungsten@db1-demo:/home/tungsten # ulimit -a
        core file size          (blocks, -c) 0
        data seg size           (kbytes, -d) unlimited
        scheduling priority             (-e) 0
        file size               (blocks, -f) unlimited
        pending signals                 (-i) 31374
        max locked memory       (kbytes, -l) 64
        max memory size         (kbytes, -m) unlimited
        open files                      (-n) 65535
        pipe size            (512 bytes, -p) 8
        POSIX message queues     (bytes, -q) 819200
        real-time priority              (-r) 0
        stack size              (kbytes, -s) 8192
        cpu time               (seconds, -t) unlimited
        max user processes              (-u) 8096
        virtual memory          (kbytes, -v) unlimited
        file locks                      (-x) unlimited

Is there enough CPU?
Is there enough RAM?
Check the various system logs for any errors or warnings.
If using systemd, check the systemctl command for errors or warnings from installed components:
- systemctl status tconnector
- systemctl status tmanager
- systemctl status treplicator

Advanced Tungsten Replicator

The Parallel (Apply) Universe

After making sure that all reasonable efforts have been made to properly evaluate the above solutions, there is an advanced feature available to try: Parallel Apply.

Tungsten Replicator by default uses a single-threaded applier, so it can get about 10,000 updates per second maximum, depending on the round trip time, and so on. Since MySQL server is multi-threaded, the primary is able to write faster than the replicas.

To increase that limit, Tungsten Replicator offers Parallel Apply, which employs multiple replication apply threads, one per shard.

By default, we shard by database, with one shard created per database schema. If there is a cross-database query, all other threads block until that one completes, slowing performance down dramatically.

This means Parallel apply is best suited for environments that equally busy writes for every database. Having many databases, but only one or two as hot defeats the design and purpose of Parallel Apply.

Again, any cross-shard query will force the Replicator back into single-threaded mode to ensure data integrity, with the result of having no performance gain, or even degradation as now the Replicator has to keep switching modes.

Tungsten Replicator can also shard by table, with the same caveats, but this time cross-TABLE queries will block other threads, making this somewhat less useful than desired based on most query designs.

Important questions before implementing Parallel Apply:

Do you have just one database schema or multiple?
If single, do the bulk of your queries cross tables or use single tables?
If multiple databases, are they de-coupled or do you do a bunch of cross-database queries?

The Answer

What WAS the Solution After All?

For this customer, the following changes improved performance to acceptable levels:

The THL was being written to the same location as the MySQL binary logs, creating contention. Moving the THL directory to a different location (/logs) improved performance.
Because the nodes are AWS EC2 instances, converting volumes from GP2 to Provisioned IOPS allowed the replica to keep up with replication much better.

Several MySQL configuration values were modified, which also decreased the applied latency on the replica nodes:

innodb_io_capacity=1000 (Interestingly, lowering this from 10k to 1k led to a significant improvement in latency, despite the EBS volume being provisioned for 10k iops)
        innodb_io_capacity_max=2000
        innodb_adaptive_hash_index=0
        innodb_lru_scan_depth=2048
        performance_schema=0

Bonus Performance Tuning Tip

Tungsten versions >= 7.0.0 have a new tpm option replica-tx-commit-level which allows you to adjust the innodb_flush_log_at_trx_commit value of the replica nodes, potentially resulting in up to 75% gains. By setting the MySQL variable innodb_flush_log_at_trx_commit=2, MySQL does not need to flush to disk upon each write, instead passing that responsibility to the operating system sync operation. There is a small window for data loss in the event of a host crash, but if you deem that risk acceptable on a Replica, you can get a huge boost in performance. This feature will do this for you automatically if enabled, and ensure that the value is set to 1 on the Primary node.

Valid values are either 1 (default) or 2. Setting this value to 2 will change the underlying database property innodb_flush_log_at_trx_commit to 2 when the node becomes a Replica, resetting it back to 1 if the node is promoted to a Primary.

The Library

Please read the docs!

For more information about how to use the Tungsten Replicator, please see Mastering Tungsten Replicator Series: Command Line Superpowers.

For more in-depth discussion about MySQL database tuning, here are some excellent blog posts by Morgan Tocker:

For more technical information about Tungsten clusters, please visit our documentation.

Summary

The Wrap-Up

Clearly, there are many, many things to think about when it comes to MySQL performance tuning - this blog post barely scratches the surface of the subject. Remember, performance tuning is all about iterative effort over time!

Tungsten Clustering is the most flexible, performant global database layer available today – use it underlying your SaaS offering as a strong base upon which to grow your worldwide business!

For more information, please visit our products page.

Want to learn more or run a POC? Contact us.

Published In

Categories:

Database Administration, Performance

Series:

MySQL High Availability (HA) & Disaster Recovery (DR)

Tags:

cctrl, composite cluster, High Availability, MySQL Performance Tuning

Author

Eric M. Stone

COO and VP of Product Management

Eric is a veteran of fast-paced, large-scale enterprise environments with 40 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to world-wide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of disparate customers, from Fortune 500 financial institutions to SMB’s.

View All Eric M.’s Posts

Make It Faster: Improving MySQL Write Performance for Tungsten Cluster Replicas