Lights, Camera, Action!
Setting the Stage with Parallel Apply
Say we have 8 MySQL schemas in a 3-node Tungsten Cluster, and the Tungsten Replicator is configured for 8 apply channels; one schema is hot and slow, one is very fast and the rest are "normal". All queries are guaranteed to be in a single schema only.
Question: Do the rest of the channels on the Replicas block on the hot/slow thread? If an apply thread is non-blocking, how far behind the fastest channel is the slow channel allowed to get?
Answer: Threads are non-blocking, but have controls in place to ensure that they cannot diverge too much. This means that when one thread is too far ahead of the others, it will stop and wait for the other threads to catch up. If a thread waits more than 60 seconds, it will resume processing the next event, then it will wait again, improving the chances for the slow thread to catch-up.
The Good, The Bad and the Ugly
When Parallel Apply Works Well and When It Does Not
When does parallel replication work the best?
- Data is stored in independent schemas. If you have 100 customers per server with a separate schema for each customer, your application is a good candidate.
- Events do not span schemas.
- The workload is well-balanced across schemas.
- The Replica host(s) are capable and have free memory in the OS page cache.
- The host on which the Replica runs has a sufficient number of cores to operate a large number of Java threads.
Parallel replication does not work well on underpowered hosts, such as Amazon m1.small instances. In fact, any host that is already I/O bound under single-threaded replication will typically not show much improvement with parallel apply.
When does parallel apply really not work well at all?
- One hot schema and many not - because all but one thread will sit idle
- Events that span schemas - Tungsten serializes such queries, which means that parallel apply stops to process that single event. If more than 2-3% of transactions are serialized, there is no benefit from using parallel replication.
Parallel Points of Interest
What are the Must-Know Facts?
Below are the most critical points: (answers are from the documentation verbatim)
- How does it work? Parallel apply works by using multiple threads for the final stage of the replication pipeline so that there are many writes at once.
- What are channels? The multiple apply threads are known as channels.
- Where is it tracked? Restart points for each channel are stored as individual rows in table
- How is it enabled and disabled? Use the
channels=Nconfiguration option with a value greater than 1 to enable parallel apply. Disable parallel apply with a value of 1, resulting in single-threaded operation.
- What is a required step? Never change the number of channels without taking the replicator offline gracefully.
How Many Channels To Use?
The Golden Rules of parallel apply are:
- Use the smallest number of channels possible
- Only use enough channels to consume the available disk i/o
- More channels than shards leads to slower performance, not faster
- Evenly distributed workloads often work very well where the quantity of channels equals the quantity of shards
- Unevenly distributed workloads work best with fewer channels, usually just one or two more than the number of hot shards.
- In a cluster, all nodes must have the same number of channels
Parallel Apply Tuning Tips
Internally, there is a key mechanism to prevent individual threads from falling too far behind if they are slow to apply. If a thread waits more than 60 seconds, it will process the next event, then wait again, giving the slow thread a chance to catch up.
There are currently two configuration options that allow for tuning parallel replication:
- property=replicator.store.parallel-queue.maxOfflineInterval=N (where N is the number of seconds)
Sets the maximum number of seconds for a clean shutdown. This is maintained by keeping the THL read tasks from getting too far apart from each other. Default: 5
- property=replicator.store.parallel-queue.syncInterval=N (where N is the number of events)
Sets the number of events to process before generating an automatic control event if sync is enabled. Default: 10000
Normally, these should not need to be adjusted. They exist for a reason, and every deployment is different.
Should you be in a situation where you are tuning parallel apply, feel free to get Continuent Support involved - we are very happy to assist with these efforts.
Less Is More
When using Parallel Replication, use only enough channels to barely saturate your available disk i/o, and use fewer channels for uneven workloads. More channels does NOT mean more performance!
For more information, please visit our online documentation: