Moving data in real-time into Amazon Redshift follow-up Questions

We had a really great session yesterday during the webinar on Amazon Redshift replication, but sadly ran out of time for questions. So as promised, let's try and answer the questions asked now!

Are you going to support SAP IQ as a target ?

This is the first request we've ever received, and so therefore there are no firm plans for supporting it. As a target, SAP IQ supports a JDBC interface, and therefore is not hugely complicated to achieve compared to more complex, customer appliers.

Can you replicate from SAP ASE ?

No, and no current plans.

What about latin1 character ? Is it supported ?

Yes, although the default mode is for us to use UTF-8 for all extract and apply jobs. You can certainly change the character set, although leaving UTF-8 should also work.

Is it a problem not to have primary keys on Redshift ?

Yes, and no. The replicator needs some way to identify a unique row within the target database so that we don't inadvertently corrupt the data. Because of the way the data is loaded, we also need to have enough information to be able to run a suitable materialise process, and that relies on having that primary key information to use to identify those rows. Obviously, primary keys provide us with the information we need to do that.

In a future release, we will support the ability to add an explicit custom primary key to tables, even if the source/target table doesn't have explicit primary keys. This means that if you can use multiple columns to uniquely identify a row, we can use this to perform the materialisation.

What if someone runs a truncate on the source ?

Currently, TRUNCATE is not supported as it is a DDL operation that is normally filtered. With the new DDL translation functionality however we have the ability to support this, and decide how it is treated. Support for TRUNCATE is not yet supported even in the new DDL translation support. In a future release we will process a TRUNCATE operation as we would any other DDL, and therefore choose whether it deletes or truncates the table, or creates a copy or archive version of the table.

Is it possible to filter out DML operations per table ? Forbid DELETEs on a given table.

Yes. We have a filter called SkipEventByType, which allows you to select any table and decide whether operations apply to it. That means that you can configure by table and/or schema whether you want to allow INSERT, UPDATE or DELETE operations.

Is replication multi-threaded ?

Yes. The replicator already handles multiple threads internally (so for example the reading of the THL and the applying are two separate threads). You can also configure the applier to handle data by multiple threads and this works with our 'sharding' system. This is not sharding of the data, but instead sharding of the THL stream within the replication process. To ensure we don't corrupt data that crosses multiple tables within transactions (the replicator is always transactionally consistent) the sharding is handled on a per-schema basis. So if you are replicating 10 schemas, you can configure 10 threads, and each thread will be handled and applied separately to the target. Also keep in mind that batch-based appliers like Redshift automatically handle multiple simultaneous targets and threads for each table, since each table is loaded individually into the target.

If you are condensing multiple streams (i.e. multiple MySQL or Oracle sources), then each stream also has it's own set of threads.

However, even with multiple threads, it's worth remembering that both the MySQL binary log and the Oracle Redo logs are sequential transaction logs, and so we don't extract with multiple threads because there is only one source of transaction changes, and therefore we only ever extract complete applied and committed transactions.

Hopefully this has answered all the questions (and some other information). But if you have more, please feel free to ask.

About the Author

MC Brown
Former VP of Products

MC Brown is a professional writer and technologist for over 25 years, an author and contributor to over 26 books covering a wide array of topics, and technical and architectural advisor on databases, cloud and grid computing, and operating system development. During his time he has worked for and with Sun, MySQL, Oracle, Couchbase, VMware, Microsoft and IBM, and written for O'Reilly, IBM, Computerworld, IBM developerWorks and many others.

Add new comment