Resumability

Dsync supports resumability by default, provided that connectors are implemented correctly. Because dsync uses multiple writers, it tracks write completion with a special message that every writer must see: once the last writer has seen that message, all writes issued before it are known to have completed. Resumability state is stored in a metadata database (currently only MongoDB is supported), which can be specified with the -m option.

                   METADATA DB
                        ^
                        |
                        v
SOURCE CONNECTOR <--> DSYNC <--> SINK CONNECTOR
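
The write-completion tracking described above can be illustrated with a minimal Python sketch. It is not dsync's actual implementation: the class name `BarrierTracker`, the method `writer_saw`, and the barrier id `"b1"` are hypothetical, and the sketch assumes each writer processes its own queue in order.

```python
import threading


class BarrierTracker:
    """Counts how many writers have seen a given barrier message (hypothetical helper)."""

    def __init__(self, num_writers):
        self.num_writers = num_writers
        self._seen = {}  # barrier id -> number of writers that have seen it
        self._lock = threading.Lock()

    def writer_saw(self, barrier_id):
        """Record that one writer reached the barrier.

        Returns True only for the last writer to see it. Assuming writers
        process their queues in order, every write queued before the
        barrier has been applied at that point.
        """
        with self._lock:
            count = self._seen.get(barrier_id, 0) + 1
            if count == self.num_writers:
                self._seen.pop(barrier_id, None)
                return True
            self._seen[barrier_id] = count
            return False


# Three writers each see the same barrier; only the last one completes it.
tracker = BarrierTracker(num_writers=3)
results = [tracker.writer_saw("b1") for _ in range(3)]
```

Only the final call reports completion, which is the point at which it would be safe to record progress in the metadata database.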

Initial sync resumability works at the partition level. For example, suppose GeneratePlan returned 20 partitions, 12 of them have completed, and 4 more are nearly complete when the process is interrupted. On resume, the 4 nearly complete partitions have to restart from the beginning, so 8 tasks remain. A partition is marked complete only after the last write (not the last read) of its contents.
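
The arithmetic in this example can be checked with a short Python sketch. The function `remaining_partitions` and the integer partition ids are hypothetical stand-ins; the sketch only assumes that the metadata database records which partitions completed and nothing about partitions still in progress.

```python
def remaining_partitions(plan, completed):
    """Return the partitions that must run after a restart.

    Anything not recorded as completed is restarted from the beginning,
    including partitions that were in progress when the process stopped.
    """
    return [p for p in plan if p not in completed]


plan = list(range(20))           # 20 partitions in the plan
completed = set(range(12))       # 12 fully written and recorded as complete
in_progress = {12, 13, 14, 15}   # nearly done, but never recorded

todo = remaining_partitions(plan, completed)
restarted = [p for p in todo if p in in_progress]
```

Here `todo` holds the 8 partitions left to process, and `restarted` holds the 4 of them that repeat work already partly done before the interruption.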

Resumability for updates (CDC) is based on the last saved next cursor and a periodic event. Every so often, dsync emits an event, waits for all writers to see it, and only then persists the last known next cursor. After an interruption, dsync resumes from that persisted cursor.
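
A minimal Python sketch of this checkpointing rule follows. It is an illustration under stated assumptions, not dsync's code: `CursorCheckpointer`, its methods, the `resume_cursor` key, and the cursor values `"c5"` and `"c9"` are all hypothetical, and a plain dict stands in for the metadata database.

```python
class CursorCheckpointer:
    """Persists a cursor only after every writer has seen its event (hypothetical)."""

    def __init__(self, num_writers, store):
        self.num_writers = num_writers
        self.store = store    # dict standing in for the metadata database
        self._pending = {}    # cursor -> number of writers that have seen the event

    def emit(self, cursor):
        """Start a checkpoint event for the last known next cursor."""
        self._pending[cursor] = 0

    def ack(self, cursor):
        """Record that one writer saw the event; persist on the last one."""
        self._pending[cursor] += 1
        if self._pending[cursor] == self.num_writers:
            del self._pending[cursor]
            self.store["resume_cursor"] = cursor
            return True
        return False

    def resume_point(self):
        """Cursor to resume from after an interruption, or None if none was saved."""
        return self.store.get("resume_cursor")


store = {}
cp = CursorCheckpointer(num_writers=2, store=store)

cp.emit("c5")
cp.ack("c5")
before_all_acks = cp.resume_point()   # one writer has not seen the event yet
cp.ack("c5")
after_all_acks = cp.resume_point()

cp.emit("c9")
cp.ack("c9")                          # interrupted before the second writer acks
after_interrupt = cp.resume_point()
```

If the process stops while the `"c9"` event is still outstanding, the resume point stays at the earlier cursor, so updates between the two cursors are read again on resume rather than skipped.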
