The Tungsten Replicator is an extraordinarily powerful and flexible tool capable of moving vast volumes of data from source to target.
In this blog post we will discuss the basics of how to implement and use filters in the Tungsten Replicator.
A Little Background
As a brief refresher, please recall that all work in the replicator is handled by Stages, and every stage's workflow consists of three (3) actions, which are:
- Extract: the source for extraction could be the MySQL server binary logs on a master, or the local THL on disk on a slave
- Filter: any configured filters are applied here
- Apply: the apply target could be the THL on disk on a master, or the database server on a slave
Since all Stages have a filter action, and filters are invoked on a per-stage basis, we can customize the data in-flight at any point in the pipeline just by specifying the Stage we desire to filter at.
Location, Location, Location
Filters can be enabled through the `tpm` command to target four different stages:
--repl-svc-extractor-filters (stage: binlog-to-q) - Apply the filter during the extraction stage, i.e. when the information is extracted from the binary log and written to the internal queue.
--repl-svc-thl-filters (stage: q-to-thl) - Apply the filter between reading from the internal queue and writing the transactions to the THL.
--repl-svc-remote-filters (stage: remote-to-thl) - Apply the filter between reading from the remote THL server and writing to the local THL files on the slave.
--repl-svc-applier-filters (stage: q-to-dbms) - Apply the filter between reading from the internal queue and applying to the destination database.
As always, remove the leading two hyphens for use within the `tungsten.ini` configuration file.
For example, a filter can be applied on a master so that a specific database schema is never extracted, because it contains only temporary data.
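A minimal sketch of what such a master-side configuration might look like in `tungsten.ini` form, with the leading hyphens dropped. It assumes the `replicate` filter and its `ignore` property; the service name `alpha` and the schema name `tmpdata` are hypothetical placeholders, not values from a real deployment:

```ini
# Hypothetical service name; use your own service section.
[alpha]
# Run the replicate filter in the binlog-to-q stage on the master.
repl-svc-extractor-filters=replicate
# Assumed schema name: events for tmpdata are dropped before they reach the THL.
property=replicator.filter.replicate.ignore=tmpdata
```

After editing the file, the change would typically be applied with `tpm update`.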
As another example, a filter can be applied on a slave to exclude a specific database table, because it contains only temporary data.
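A comparable sketch for the slave side, again in `tungsten.ini` form and again assuming the `replicate` filter with its `ignore` property. The service name `alpha`, the schema `sales`, and the table `scratch` are hypothetical names chosen only for illustration:

```ini
# Hypothetical service name; this file lives on the slave that should skip the table.
[alpha]
# Run the replicate filter in the q-to-dbms stage, just before applying to the database.
repl-svc-applier-filters=replicate
# Assumed schema.table name: the table stays in the THL but is not applied on this slave.
property=replicator.filter.replicate.ignore=sales.scratch
```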
A filter that removes specific tables from a particular database has different effects depending on the stage at which it is applied.
If the filter is applied on the master before the information is written into the THL (i.e. --repl-svc-extractor-filters), then no slave can ever access the table data, because the information is never stored in the THL on the master to be transferred to the slaves.
However, if the filter is applied on the slave (i.e. --repl-svc-applier-filters), then some slaves can replicate the table and database information, while other slaves can choose to ignore it.
The filtering process also has an impact on other elements of the system. For example, filtering on the master may reduce network overhead, but at the cost of flexibility, since the filtered data is no longer available to any slave.
Get In Line!
Where more than one filter is configured, each filter is executed in the order it appears in the configuration. For example, in some heterogeneous deployments you might see the filter list `settostring,colnames`, in which case `settostring` is executed first, followed by `colnames`. For certain filter combinations this order can be significant, because some filters rely on the information provided by earlier filters.
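As a configuration fragment, that ordering might be written as follows. This is only a sketch: the service name `alpha` is hypothetical, and placing the list on the extractor stage is an assumption made for illustration, since the same comma-separated form applies to any of the four filter options:

```ini
# Hypothetical service name.
[alpha]
# Filters run left to right: settostring first, then colnames.
repl-svc-extractor-filters=settostring,colnames
```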
For more details about filtering, please visit the docs page at http://docs.continuent.com/tungsten-clustering-6.0/filters.html
We will continue to cover topics of interest in our next "Mastering Tungsten Replicator Series" post...stay tuned!
Want to learn more or run a POC? Contact us.