Blog

Mastering Tungsten Replicator Series: Filtering the Data Fantastic

The Tungsten Replicator is an extraordinarily powerful and flexible tool capable of moving vast volumes of data from source to target.

In this blog post we will discuss the basics of how to implement and use filters in the Tungsten Replicator.

A Little Background

As a brief refresher, please recall that all work in the replicator is handled by Stages, and every stage's workflow consists of three actions:

  • Extract: the source for extraction could be the MySQL server binary logs on a master, or the local THL on disk for a slave
  • Filter: any configured filters are applied here
  • Apply: the apply target can be the THL on disk on a master, or the database server on a slave

Since all Stages have a filter action, and filters are invoked on a per-stage basis, we can customize the data in-flight at any point in the pipeline simply by specifying the Stage at which we wish to filter.

Location, Location, Location

Filters can be enabled through the `tpm` command to target four different stages:

Master Stages

--repl-svc-extractor-filters (stage: binlog-to-q) - Apply the filter during the extraction stage, i.e. when the information is extracted from the binary log and written to the internal queue.
--repl-svc-thl-filters (stage: q-to-thl) - Apply the filter between the internal queue and when the transactions are written to the THL.

Slave Stages

--repl-svc-remote-filters (stage: remote-to-thl) - Apply the filter between reading from the remote THL server and writing to the local THL files on the slave.
--repl-svc-applier-filters (stage: q-to-dbms) - Apply the filter between reading from the internal queue and applying to the destination database.

As always, remove the leading two hyphens when placing these options in the tungsten.ini configuration file.

For example, apply a filter on a master to skip extraction of a specific database schema because it contains only temporary data:

repl-svc-extractor-filters=replicate
property=replicator.filter.replicate.ignore=mytempdb

Here is another example, this time applying a filter on a slave to exclude a specific database table because it contains only temporary data:

repl-svc-remote-filters=replicate
property=replicator.filter.replicate.ignore=myappdb.mytemptable
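The replicate filter can also be used as a whitelist rather than a blacklist via its do property. As a minimal sketch (the schema names here are hypothetical), to replicate only two schemas and drop everything else at extraction time:

```ini
# tungsten.ini fragment: whitelist filtering with the replicate filter.
# Only events for the listed schemas pass through; all others are dropped.
repl-svc-extractor-filters=replicate
property=replicator.filter.replicate.do=app_db,reporting_db
```

Both do and ignore accept comma-separated lists of schema names or schema.table pairs.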

Perspective Counts

A filter that removes specific tables from a particular database would have different effects depending on the stage at which it is applied.

If the filter was applied on the master before writing the information into the THL (i.e. --repl-svc-extractor-filters), then no slave could ever access the table data, because the information would never be stored into the THL on the master to be transferred to the slaves.

However, if the filter was applied on the slave (i.e. --repl-svc-applier-filters), then some slaves could replicate the table and database information, while other slaves could choose to ignore them.

The filtering process also has an impact on other elements of the system. For example, filtering on the master may reduce network overhead, albeit at a reduction in the flexibility of the data transferred.
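To make the per-slave scenario concrete, here is a sketch of how one slave might drop a single high-churn table at apply time while the other slaves, which omit this setting, continue to apply everything (hostname, schema, and table names are hypothetical):

```ini
# tungsten.ini fragment on one specific slave only (e.g. a reporting host):
# filter at the q-to-dbms stage so the THL still contains the full data,
# but this slave never applies rows for the ignored table.
repl-svc-applier-filters=replicate
property=replicator.filter.replicate.ignore=myappdb.sessions
```

Because the filtering happens at the applier stage, the slave's local THL remains complete, so the filter can later be removed without re-provisioning the data from the master.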

Get In Line!

Where more than one filter is configured, each filter is executed in the order it appears in the configuration. For example, in some heterogeneous deployments you might see:

repl-svc-extractor-filters=settostring,enumtostring,pkey,colnames

settostring is executed first, followed by enumtostring, pkey and colnames. For certain filter combinations this order can be significant. Some filters rely on the information provided by earlier filters.
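As a hedged sketch of why this ordering matters, the same filter chain can be annotated with the typical role each stock filter plays (consult the filter reference for the authoritative behavior of each):

```ini
# tungsten.ini fragment: filter order is significant.
# settostring and enumtostring run first, rewriting SET and ENUM values
# as plain strings so that later filters and heterogeneous targets see
# ordinary text rather than MySQL-specific numeric encodings.
# pkey then attaches primary-key information, and colnames adds column
# names -- metadata that non-MySQL targets typically require to apply rows.
repl-svc-extractor-filters=settostring,enumtostring,pkey,colnames
```

Reversing such a chain (for example, running a filter that needs column names before colnames has added them) can cause downstream filters or appliers to fail or silently misbehave.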

Wrap-Up

For more details about filtering, please visit the docs page at http://docs.continuent.com/tungsten-clustering-6.0/filters.html

We will continue to cover topics of interest in our next "Mastering Tungsten Replicator Series" post...stay tuned!


Want to learn more or run a POC? Contact us.

About the Author

Eric M. Stone
COO

Eric is a veteran of fast-paced, large-scale enterprise environments with 35 years of Information Technology experience. With a focus on HA/DR, from building data centers and trading floors to world-wide deployments, Eric has architected, coded, deployed and administered systems for a wide variety of disparate customers, from Fortune 500 financial institutions to SMBs.
