Hi
Actually, I already listed the problems we faced in this article, so I'll copy them into this conversation.
These are the problems we found while using Debezium:
1. Snapshotting a table that holds a lot of data took longer and longer as the data grew; the initial snapshot of a large table could take 2 or 3 days to finish (see the incremental snapshot sketch after this list).
2. The connectors kept restarting whenever we deployed a new connector or changed a connector config. This was caused by the Kafka Connect rebalancing mechanism: on any change, Kafka Connect would rebalance the cluster by stopping all tasks, recomputing the assignments, and then starting everything again. This stop-the-world behavior applies to Kafka Connect versions below 2.3 (see the worker config sketch after this list).
3. The connector failed due to table schema mismatches. This happened when another team made changes to a table, such as running an ALTER TABLE (see the DDL handling sketch after this list).
4. Serialization was inconsistent: for example, a boolean could arrive as TRUE/FALSE or as 1/0. This happened after we changed the DDL parser mode config (also covered in the DDL handling sketch below).
5. The checkpoint mechanism was unreliable for us: we often found data loss after the connector failed or restarted. Data loss is a big problem when we talk about Big Data (see the worker config sketch below).
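
For problem 1, Debezium 1.6 and later support incremental snapshots: you create a signal table in the source database, point the connector at it, and trigger ad-hoc snapshots that run in chunks alongside streaming instead of blocking for days. Here is a minimal sketch; the connector name, database/table names, signal table, and the Connect REST endpoint are all assumptions for illustration, not something from our setup:

```python
# Hypothetical: register a Debezium MySQL connector with incremental
# snapshot support via the Kafka Connect REST API (Debezium 1.6+).
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # assumed Connect endpoint

config = {
    "name": "inventory-connector",  # assumed connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
        # Start streaming right away; snapshot big tables later, in chunks.
        "snapshot.mode": "schema_only",
        # Table Debezium watches for snapshot signals (you create it yourself).
        "signal.data.collection": "inventory.debezium_signal",
    },
}

resp = requests.post(CONNECT_URL, json=config, timeout=30)
resp.raise_for_status()

# To snapshot one large table without stopping streaming, insert a signal row:
#   INSERT INTO inventory.debezium_signal (id, type, data)
#   VALUES ('adhoc-1', 'execute-snapshot',
#           '{"data-collections": ["inventory.big_table"], "type": "incremental"}');
```

Because the snapshot is driven by signal rows, you can also re-snapshot a single table after an incident instead of redoing the whole database.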
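
For problems 3 and 4, there are a couple of connector-level knobs worth knowing about. This is a hedged DDL handling sketch, not a fix we verified:

```properties
# Hypothetical fragment of a Debezium MySQL connector config.

# Log and skip DDL statements the parser cannot handle instead of failing
# the connector. Trade-off: a skipped statement can itself cause a later
# schema mismatch, so treat this as a mitigation, not a cure.
database.history.skip.unparseable.ddl=true

# Older Debezium versions (~0.8-0.10) shipped two DDL parsers and let you
# choose between them. Switching parsers is exactly the kind of change that
# can alter how values such as booleans are serialized, so pick one parser
# and keep it fixed across all connectors.
ddl.parser.mode=antlr
```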
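
For problems 2 and 5, the relevant knobs mostly live in the Kafka Connect worker config rather than in Debezium itself. A sketch of connect-distributed.properties, assuming Kafka 2.3 or newer; the values are illustrative:

```properties
# Kafka 2.3+ (KIP-415): incremental cooperative rebalancing. Only the
# affected tasks are stopped on a config change, not the whole cluster.
# "compatible" works on 2.3/2.4; "sessioned" requires 2.5+.
connect.protocol=compatible

# Keep a departed worker's tasks unassigned for a grace period instead of
# rebalancing immediately; this smooths rolling restarts and deploys.
scheduled.rebalance.max.delay.ms=300000

# Commit source offsets more often (the default is 60000 ms) so a crash or
# restart leaves a smaller window between the last committed offset and
# the last record actually delivered.
offset.flush.interval.ms=10000
offset.flush.timeout.ms=5000
```

Note that this only narrows the at-risk window; source connectors on these Kafka versions are at-least-once at best (exactly-once source support only arrived with KIP-618 in Kafka 3.3), so you still need downstream checks for loss and duplicates.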