MySQL Binary Log Connector works in the Change Data Capture

In the previous article, I talked about Change Data Capture as a Gateway to the Big Data and Streaming Platforms. What is the CDC? What is the impact on business metrics? How does our team decide to build in-house our CDC project? How does our team migrate our CDC from Debezium to Gargantua?

In this article, I want to talk more about a few insights into Gargantua. I will use one of the Gargantua connectors for MySQL and explain some of the components. Gargantua MySQL connector is a connector to capture any data changes from the MySQL database by reading Binlog files (the MySQL transaction logs). Gargantua not only captures DML statements — insert, update, delete, commits, and rollbacks, but also afford to monitor DDL statements — create, alter, delete a table.

How does Binlog work in MySQL?

Binlog or binary log is a set of log files that contain information about data modifications made to a MySQL server instance. Basically, the Binlog file is used to enable the MySQL replication feature that allows a server — the master — to send all changes to another server — the slave — and the slave tries to apply all changes to keep up-to-date with the master. If you want to know more about the binary log works in MySQL, you may want to look at this documentation. In short, the replication works as follows:

  1. Whenever the master’s database is modified, the change is written to a binlog. The process is done by the client thread that executed the query that modified the database.

Please take note if you want to work with the CDC & Binlog files, you need to set up your MySQL server to enable the replication stream. You may want to look to this documentation for configuring the replication stream.

In the Binlog file, there are several event types. Each type has a different meaning & purposes. I will not talk about the detail of each event type, but instead, you could read more about Binlog event types through this official binlog-event-type. Each transaction log in MySQL is represented several events in a Binlog file, and I will talk more about it.

How are DML statements written in the Binlog file?

Figure 1. Write Rows Event in the Binlog file

In figure 1, you could see how does a typical transaction look like for a single write rows event. I will give you an example so you could understand it more easily. Example: There is a transaction that inserts new data into a particular table (Let’s say member table), then the following events will be created in the Binlog file.

  1. The GTID event. The beginning of a transaction.

Therefore, a single transaction with a single write rows event will produce five events in the Binlog file.

Figure 2. Update Rows Event in the Binlog file

In figure 2, you could see how does a typical transaction looks like for a single update rows event. Example: There is a transaction that updates some data in a member table, then the following events will be created in the Binlog file.

  1. The GTID event. The beginning of a transaction.

Therefore, a single transaction with a single update rows event will produce five events in the Binlog file.

Figure 3. Delete Rows Event in the Binlog file

In figure 3, you could see how does a typical transaction look like for a single delete rows event. Example: There is a transaction that deletes some data in a member table, then the following events will be created in the Binlog file.

  1. The GTID event. The beginning of a transaction.

Therefore, a single transaction with a single delete rows event will produce five events in the Binlog file.

Figure 4. Multiple Data Changes in the Binlog file

In figure 4, you could see how does a typical transaction looks like for multiple data changes. Example: There is a transaction that inserts new data into a cart table and updates some data in a transaction table, then the following events will be created in the Binlog file.

  1. The GTID event. The beginning of a transaction.

Therefore, a single transaction with multiple data changes will produce seven events in the Binlog file.

How are DDL statements written in the Binlog file?

Figure 5. Alter Table in the Binlog file

In figure 5, you could see how does a typical transaction looks like for altering a particular table. Example: There is a transaction that modifies schema for adding new columns in a member table, then the following events will be created in the Binlog file.

  1. The GTID event. The beginning of a transaction.

Therefore, a single transaction with an altered query will produce four events in the Binlog file.

How does Gargantua work with Binlog file?

In the previous section, we already talked about some of the behavior of Binlog events for each transaction type that you must capture in the CDC app. Next, I want to talk about how Gargantua handles those Binlog events. There are three main components in Gargantua.

  1. BinlogReader is an entry-point for accessing and managing the connector. The BinlogReader captures all monitored table information and updates the monitored table schema when there is a DDL statement. The BinlogReader store records in a queue and handles how to put & poll the records synchronously when there is a request both from BinlogClient & BinlogPusher.
Figure 6. Gargantua Workflow

In figure 6, you could see how the three Gargantua components work together in the big picture.

Figure 7. BinlogClient Reads & Handles Events Flowchart

In figure 7, you could see how does Gargantua read the Binlog file. The whole process runs in a Thread. The key points you should know are below.

  • The Gargantua records a GTID set and other information so that we could use those checkpoint information to rewind the Binlog events to where to the specific transaction later.
Figure 8. BinlogPusher Sends Data Changes Flowchart

In figure 8, you could see how does Gargantua sends the data changes into another store. The Gargantua spawns another Thread to schedule & run the above process for every period of time depends on the configuration. The key points you should know are below.

  • The Gargantua uses an event listener when sending all data changes so that you can implement your own code and target store such as Kafka, Google PubSub, Azure Event Hubs, etc. You also could decide how to send data on your own. It could be synchronous or asynchronous.

Conclusion

I already talked about how does Binlog works in MySQL and Gargantua work with the Binlog events. I also gave more details in the flowchart about how the Binlog events behavior for some transactions. I hope you could understand how the CDC works in MySQL and get more insight about it now.

Thank you for reading my article.

Data Engineering at Bukalapak, have the interest to become a solution architect