David Christianto
1 min read · Dec 4, 2020


It depends on the database. For the MySQL binlog, for example, you will see the following information:
- binlog filename
- binlog position
- GTID set

You may need this information if you want to move your current checkpoint to a different one.
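To make the three values above concrete, here is a minimal Python sketch of how such a checkpoint could be represented and compared before moving it. The `BinlogCheckpoint` class, the `is_rewind` helper, and the sample file names and GTID values are all hypothetical and only for illustration; they are not taken from any specific ingestion tool. The only MySQL behavior it relies on is that binlog file names end in a numeric sequence.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BinlogCheckpoint:
    """Hypothetical container for the three checkpoint values."""
    filename: str  # binlog filename, e.g. "mysql-bin.000042"
    position: int  # byte offset inside that binlog file
    gtid_set: str  # GTID set associated with this checkpoint

    def sort_key(self) -> Tuple[int, int]:
        # Order checkpoints by the numeric suffix of the file name,
        # then by the position inside that file.
        sequence = int(self.filename.rsplit(".", 1)[1])
        return (sequence, self.position)


def is_rewind(current: BinlogCheckpoint, target: BinlogCheckpoint) -> bool:
    """Return True if moving to `target` would replay already-read events."""
    return target.sort_key() < current.sort_key()


# Made-up sample values.
current = BinlogCheckpoint(
    "mysql-bin.000042", 1547, "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-77"
)
target = BinlogCheckpoint(
    "mysql-bin.000041", 4, "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-50"
)

if is_rewind(current, target):
    print("Target is older than the current checkpoint; events will be replayed.")
```

A check like this is mainly useful as a guard: moving backwards re-reads events, so the downstream side has to tolerate duplicates.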

Is it the timestamp of the data so far synced to your data warehouse?
Yes, it is.

As you may know, the Data Lake uses the ingestion timestamp as its folder structure. Previously, we took this timestamp from the data itself, as provided by other teams' services. However, in some cases those services don't set the timestamp at the exact time of the write; they may buffer the data before inserting it into their database. As a result, there is a chance that the data arrives late in the Data Lake.

We could not control this behavior across all teams, and other teams may not know or be aware of it. Therefore, we needed a way to set the ingestion timestamp that does not require any intervention from other teams. In the end, we decided to use the database's native transaction time as our ingestion timestamp for the Data Lake.
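The difference between the two timestamps can be illustrated with a short Python sketch. It assumes an hourly `dt=/hour=` folder layout and a table called `orders`; both are made up for this example, as are the `partition_path` helper and the sample times. The real folder layout may differ.

```python
from datetime import datetime, timezone


def partition_path(table: str, ts: datetime) -> str:
    """Build an hourly folder path from a timestamp (hypothetical layout)."""
    ts = ts.astimezone(timezone.utc)
    return f"{table}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


# The owning service stamped the row, buffered it, and wrote it 15 minutes later.
service_timestamp = datetime(2020, 12, 3, 23, 50, tzinfo=timezone.utc)
transaction_time = datetime(2020, 12, 4, 0, 5, tzinfo=timezone.utc)

# Partitioning by the service timestamp targets a folder that may already be closed.
late_path = partition_path("orders", service_timestamp)

# Partitioning by the transaction time targets the folder for when the write happened.
on_time_path = partition_path("orders", transaction_time)

print(late_path)
print(on_time_path)
```

With the service timestamp, the row lands in the folder for the previous hour after that hour has passed, which is the late-data case described above. With the transaction time, it lands in the folder for the hour in which the database actually recorded it.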


Written by David Christianto

Data Engineer, Mentor, and Data Ingestion Expertise | T-Shaped skills @ Bukalapak — Helping company grow their businesses & improve cost efficiency
