
How to Deal with Triggers in Your MySQL Database When Using Tungsten Replicator

Overview

Over the past few days we have been working with a number of customers on the best way to handle Triggers within their MySQL environment when combined with Tungsten Replicator. We looked at situations where Tungsten Replicator was either part of a Tungsten Clustering installation or a standalone replication pipeline.

This blog dives head first into the minefield of Triggers and Replication.

Summary and Recommendations

The conclusion was that there is no easy one-answer-fits-all solution – it really depends on the complexity of your environment and on how much flexibility you have to adjust. Our top-level summary and recommendations are as follows:

If using Tungsten Clustering and you need to use Triggers:

  • Switch to ROW based binary logging, and either
    • Recode triggers to only fire when read_only=ON or based on user(), or
    • Use the replicate.ignore filter

If using Tungsten Replicator only, and you need to use Triggers:
If source instance is running in ROW based binary logging mode:

  • Drop triggers on target, or
  • Recode triggers to only fire when read_only=ON or based on user(), or
  • Use the replicate.ignore filter

If source instance is running in MIXED based binary logging mode:

  • Switch to ROW based binary logging and follow the recommendations above, or
  • Keep triggers enabled on the target and review every trigger carefully – statement-based events rely on the target trigger firing, while row-based events will double up

Read on for more information on why we made these recommendations…

Deep Analysis

Running with ROW Based Binary Logging

Let’s create two simple tables, one for employees and one as an audit table. We’ll then create a trigger that will fire after an INSERT. The trigger will write the id and employee name from the employees table into the audit table, along with the action ‘INSERT’ and a timestamp.

CREATE TABLE employees
    ( employee_id    INT(6) PRIMARY KEY
    , first_name     VARCHAR(20)
    , last_name      VARCHAR(25) NOT NULL
    , hire_date      DATE NOT NULL
     ) ;
 
CREATE TABLE employee_audit
   (id              INT(6) AUTO_INCREMENT PRIMARY KEY,
    employee_id     INT(6),
    employeename    VARCHAR(50),
    recstate        CHAR(6),
    auditdate       DATE);
 
DELIMITER //
CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees
FOR EACH ROW
BEGIN
 
    INSERT INTO employee_audit (employee_id, employeename, recstate, auditdate)
     VALUES (NEW.employee_id, CONCAT(NEW.first_name, ' ', NEW.last_name), 'INSERT', NOW());
 
END;
//
DELIMITER ;

Our source database is in ROW based logging mode, and the trigger exists and is active on the slave.

Let’s insert a record into the employees table

INSERT INTO employees VALUES 
        (100, 'Steven', 'King', '2019-06-01');

All good on our master, but nothing on our slave and our replicator is in an error state.

pendingError           : Event application failed: seqno=50 fragno=0 message=java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'PRIMARY'
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000004:0000000000017892;-1
pendingErrorSeqno      : 50
pendingExceptionMessage: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'PRIMARY' Failing statement : INSERT INTO `sample`.`employee_audit` ( `id` , `employee_id` , `employeename` , `recstate` , `auditdate` )  VALUES (  ?  ,  ?  ,  UNHEX( ? )  ,  UNHEX( ? )  ,  ?  )

If we look at the THL, we see we have extracted the INSERT on the employees table, but we have also extracted the INSERT on the audit table, and both have come through as a single complete transaction. When the replicator applies the INSERT on the employees table, the trigger fires and does the INSERT on the audit table for us, but then the replicator also tries to INSERT the same row.

SEQ# = 78 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2019-12-05 11:51:48.0
- EPOCH# = 0
- EVENTID = mysql-bin.000004:0000000000027015;-1
- SOURCEID = mysql01
- METADATA = [mysql_server_id=101;dbms_type=mysql;tz_aware=true;service=alpha;shard=sample]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00', ##charset = ISO-8859-1]
- SQL(0) =
 - ACTION = INSERT
 - SCHEMA = sample
 - TABLE = employees
 - ROW# = 0
  - COL(1: ) = 100
  - COL(2: ) = Steven
  - COL(3: ) = King
  - COL(4: ) = 2019-06-01
- OPTIONS = [foreign_key_checks = 1, unique_checks = 1, time_zone = '+00:00', ##charset = ISO-8859-1]
- SQL(1) =
 - ACTION = INSERT
 - SCHEMA = sample
 - TABLE = employee_audit
 - ROW# = 0
  - COL(1: ) = 1
  - COL(2: ) = 100
  - COL(3: ) = Steven King
  - COL(4: ) = INSERT
  - COL(5: ) = 2019-12-05

If we skip the error on the replicator, the whole event is rolled back and we lose the initial insert on the employees table too, so now we have a data discrepancy. If we DROP the trigger on the slave and bring the replicator online to retry, then everything goes through and the tables match.
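On a standalone replicator, that recovery can be as simple as dropping the trigger on the target before retrying — a minimal sketch, using the trigger and schema names from our example (trepctl is Tungsten’s standard replicator control tool):

```sql
-- Run on the slave: remove the trigger so that the replicated
-- row events are the only writer of employee_audit
DROP TRIGGER IF EXISTS sample.trgInsEmployees;

-- Then, from the shell, bring the replicator back online to retry
-- the failed event:
--   trepctl online
```

Once online, the replicator re-applies the failed transaction, and since the trigger no longer fires on the target, the audit row comes through exactly once.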

If, however, the audit table had no primary key, we wouldn’t have seen any error at all, and you would be fooled into thinking everything was OK. In fact you would end up with doubled-up data or, depending upon the complexity of your trigger, data corruption on an even greater scale!

What if you need to have triggers on the slaves because a slave could become a MASTER at some point, or perhaps you run consistency checks and need the entire structure to match? In this scenario the safest option is to add some simple checks to your triggers. Typically, your slave databases should be in read-only mode, so a simple test could execute the statements within the trigger only when the database is read/write.
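A sketch of such a check, assuming the slaves run with read_only=ON while the master does not (same tables as above):

```sql
DELIMITER //
CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees
FOR EACH ROW
BEGIN
    -- Only run the audit insert on a read/write (master) instance;
    -- on a read-only slave the replicated row events already carry
    -- the audit row
    IF @@global.read_only = 0 THEN
        INSERT INTO employee_audit (employee_id, employeename, recstate, auditdate)
         VALUES (NEW.employee_id, CONCAT(NEW.first_name, ' ', NEW.last_name), 'INSERT', NOW());
    END IF;
END;
//
DELIMITER ;
```

The same trigger definition can then be deployed everywhere, and the body simply becomes a no-op on any node that is in read-only mode.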

However, this could be flawed if you are only replicating a subset of data into your target, and the target in fact needs to be read/write for applications that work with other schemas. In this instance you could check, inside the trigger, which user caused it to fire. If the value of user() is the account the replicator is configured to use, i.e. tungsten, you know the statement came from the replicator applying data rather than from a genuine insert, and you can stop the trigger firing. This could look something like the following:

DELIMITER //
CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees
FOR EACH ROW
BEGIN
 
    IF user() != 'tungsten@db1' THEN
     INSERT INTO employee_audit (employee_id, employeename, recstate, auditdate)
     VALUES (NEW.employee_id, CONCAT(NEW.first_name, ' ', NEW.last_name), 'INSERT', '2019-12-05');
    END IF;
 
END;
//
DELIMITER ;

However, what happens if you have hundreds of tables or very complex triggers? That would be a lot of coding.

Sadly, there is no simple answer; the options above need to be assessed and the right course of action taken to suit your environment!

Running with MIXED Based Binary Logging

Let’s now look at what happens when we are in MIXED logging mode.

Using the same structure and the same trigger code, we see the same result: because we’re in MIXED mode, MySQL has decided to log the event as a ROW event, so the same situation arises as before.

MySQL logged the entire transaction as ROW because the statement was considered unsafe for statement-based logging: the trigger code uses the now() function, and the audit table has an AUTO_INCREMENT column.
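You can confirm what MySQL actually wrote to the binary log with SHOW BINLOG EVENTS (the log file name here is taken from the THL output in this post; yours will differ):

```sql
-- ROW events appear as Table_map/Write_rows entries,
-- whereas STATEMENT events show the original INSERT text
SHOW BINLOG EVENTS IN 'mysql-bin.000005';
```

This is a quick way to check, per transaction, which format MIXED mode has chosen.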

MySQL’s decision making on whether to switch between ROW or STATEMENT when in MIXED mode has a number of conditions; more detail on those rules is at the link below, but specifically I want to call out this line:

“When one or more tables with AUTO_INCREMENT columns are updated and a trigger or stored function is invoked..”

Taken from https://dev.mysql.com/doc/refman/8.0/en/binary-log-mixed.html

So let’s change our table to force MySQL NOT to switch to ROW based logging, by removing the AUTO_INCREMENT and removing now() from our trigger code:

CREATE TABLE employee_audit
   (id              INT(6) PRIMARY KEY,
    employee_id     INT(6),
    employeename    VARCHAR(50),
    recstate        CHAR(6),
    auditdate       DATE);
 
DELIMITER //
CREATE TRIGGER trgInsEmployees AFTER INSERT ON employees
FOR EACH ROW
BEGIN
 
    INSERT INTO employee_audit (id, employee_id, employeename, recstate, auditdate)
     VALUES (NEW.employee_id, NEW.employee_id, NEW.first_name, 'INSERT', '2019-12-05');
 
END;
//
DELIMITER ;

Now let’s run our insert and see what happens…

This time, the initial INSERT is logged as a statement, and in STATEMENT mode, MySQL does NOT log the data changes as a result of a trigger firing, therefore we don’t replicate them either.

The THL shows this:

SEQ# = 110 / FRAG# = 0 (last frag)
- FILE = thl.data.0000000001
- TIME = 2019-12-05 12:25:52.0
- EPOCH# = 79
- EVENTID = mysql-bin.000005:0000000000010296;17
- SOURCEID = mysql01
- METADATA = [mysql_server_id=101;dbms_type=mysql;tz_aware=true;service=alpha;shard=sample]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0, foreign_key_checks = 1, unique_checks = 1, sql_mode = 'NO_ENGINE_SUBSTITUTION,NO_AUTO_CREATE_USER,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_ZERO_DATE,NO_ZERO_IN_DATE', character_set_client = 33, collation_connection = 33, collation_server = 8]
- SCHEMA = sample
- SQL(0) = INSERT INTO employees VALUES
        ( 100
        , 'Steven'
        , 'King'
        , '2019-06-01'
        )

In this situation you need to have the trigger enabled on the target, otherwise you will end up missing data!

If you have a mix of triggers that will cause MySQL to behave differently each time, MIXED mode could cause a lot of confusion and data drift/corruption if not handled with care.

Using the check against user() or read_only won’t help in this case either, because you may need the trigger to fire as a result of the replicator’s preceding insert; sadly, there is no way to know whether a trigger fired as a result of a ROW or STATEMENT based action.

When in MIXED mode there really is no safe option unless you are 100% confident that all of your triggers would be non-deterministic and result in a ROW based binlog entry.
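Given that, the safest course in most environments is to force ROW logging on the source, per the recommendations at the top of this post — a minimal sketch:

```sql
-- Applies to new sessions only; existing connections keep
-- their current format
SET GLOBAL binlog_format = 'ROW';

-- Persist the change across restarts in my.cnf, under [mysqld]:
--   binlog_format = ROW
```

With ROW logging in place, every trigger side-effect is logged explicitly, and you are back to the (manageable) ROW-mode options discussed earlier.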

Can the replicator help?

Sadly not! Because MySQL doesn't flag in the binlog whether the DML is the result of a trigger firing, and doesn't log it at all in some situations, we have no way of making a judgement on what to do.

However, one final option that I haven’t covered could be to use the replicator filters. This would only help if the tables affected by your triggers are maintained solely by the triggers’ code. In these cases you could consider excluding them from replication by using the replicate.ignore filter.

This would ensure no changes associated with these tables are applied, and you would rely 100% on the triggers in the target database to maintain them. However, for non-deterministic statements you could still end up with data differences between these tables in your source and target!
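As a sketch, in a tungsten.ini-style installation the filter could be enabled on the applier along these lines (the service name alpha and the audit table sample.employee_audit are from our example; adjust both to your own environment):

```ini
[alpha]
# Exclude the trigger-maintained audit table from replication;
# the target-side trigger remains its sole writer
svc-applier-filters=replicate
property=replicate.ignore=sample.employee_audit
```

Re-run tpm to apply the configuration change; from then on, events touching the ignored table are dropped by the applier.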

The Wrap-Up

In this blog post we discussed how to handle Triggers with Tungsten Replicator between MySQL databases.

Tungsten Clustering is the most flexible, performant global database layer available today – use it underlying your SaaS offering as a strong base upon which to grow your worldwide business!

For more information, please visit https://www.continuent.com/solutions

Want to learn more or run a POC? Contact us.

About the Author

Chris Parker
Customer Success Director, EMEA & APAC

Chris is based in the UK, and has over 20 years of experience working as a database administrator. Prior to joining Continuent, Chris managed large-scale Oracle and MySQL deployments at Warner Bros., BBC, and prior to joining the Continuent Team, he worked at the online fashion company, Net-A-Porter.
