Sync

A look behind the scenes of the sync process:

Planning

Before a brand-new sync process starts, dsync creates a read plan. A read plan is a set of tasks to execute, along with some additional metadata, and it contains one or more tasks per namespace. Dsync tracks the completion of individual tasks to calculate progress and to provide a resume point in case the process is interrupted.

Typically, a read plan contains a set of tasks for the initial data copy as well as the resume token for change data capture (CDC).
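
To make the idea concrete, here is a minimal sketch of what a read plan could look like as a data structure. It is not dsync's actual implementation: the `ReadTask` and `ReadPlan` classes, their fields, the namespace names, and the token value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReadTask:
    # Hypothetical task shape: one partition of one namespace.
    task_id: int
    namespace: str
    done: bool = False


@dataclass
class ReadPlan:
    # Hypothetical plan shape: copy tasks plus the CDC starting point.
    tasks: list[ReadTask] = field(default_factory=list)
    # Opaque change stream position captured at planning time.
    cdc_resume_token: Optional[str] = None

    def mark_done(self, task_id: int) -> None:
        for task in self.tasks:
            if task.task_id == task_id:
                task.done = True
                return
        raise KeyError(f"unknown task id: {task_id}")

    def progress(self) -> float:
        # Fraction of completed tasks; an empty plan counts as complete.
        if not self.tasks:
            return 1.0
        return sum(t.done for t in self.tasks) / len(self.tasks)

    def pending(self) -> list[ReadTask]:
        # After an interruption, only these tasks need to run again.
        return [t for t in self.tasks if not t.done]


plan = ReadPlan(
    tasks=[
        ReadTask(1, "shop.orders"),
        ReadTask(2, "shop.orders"),
        ReadTask(3, "shop.customers"),
    ],
    cdc_resume_token="token-0",
)
plan.mark_done(1)
print(plan.progress())
print([t.task_id for t in plan.pending()])
```

Tracking completion per task rather than per namespace is what makes both the progress figure and the resume point fall out of the same bookkeeping.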

Initial data copy

During the initial data copy stage, dsync bulk-loads data from the source to the destination. Depending on how the source data is partitioned (how many tasks per namespace) and on the configured level of parallelism (how many tasks are copied in parallel), this stage can be very fast.
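
The relationship between tasks and parallelism can be sketched as follows. This is an illustrative model only, not dsync's code: the in-memory `SOURCE_TASKS` dictionary, the `run_initial_copy` function, and the `parallelism` parameter are assumptions standing in for real connectors and configuration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory source: each key is (namespace, partition number),
# and each value is the batch of documents that one task would copy.
SOURCE_TASKS = {
    ("shop.orders", 0): [{"_id": 1}, {"_id": 2}],
    ("shop.orders", 1): [{"_id": 3}],
    ("shop.customers", 0): [{"_id": 10}, {"_id": 11}],
}


def run_initial_copy(source_tasks, parallelism):
    """Copy every task's documents, running at most `parallelism` at once."""
    destination = {}
    lock = threading.Lock()
    completed = []

    def copy_task(key):
        namespace, _partition = key
        docs = source_tasks[key]
        # Guard the shared destination, since several tasks may write to
        # the same namespace concurrently.
        with lock:
            destination.setdefault(namespace, []).extend(docs)
        return key

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(copy_task, key) for key in source_tasks]
        for future in as_completed(futures):
            # Record each finished task so progress can be reported.
            completed.append(future.result())
    return destination, completed


destination, completed = run_initial_copy(SOURCE_TASKS, parallelism=2)
print(len(completed), "tasks copied")
```

In this model, a namespace split into more tasks gives the pool more units of work to run side by side, while `parallelism` caps how many run at the same time.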

Change stream

To capture data modifications made during and after the initial data copy, dsync uses a change stream mechanism. Resume tokens serve as checkpoints marking the last processed position, which allows for continuous synchronization and incremental data migration.
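
The checkpointing idea can be illustrated with a small sketch. It is a simplified model, not dsync's implementation: the event dictionaries and the `apply_changes` function are hypothetical, and the tokens are plain increasing integers here, whereas real resume tokens are usually opaque values issued by the source database.

```python
def apply_changes(events, destination, resume_token=None):
    """Apply events newer than `resume_token` and return the new token."""
    for event in events:
        if resume_token is not None and event["token"] <= resume_token:
            continue  # already applied before the interruption
        if event["op"] == "delete":
            destination.pop(event["_id"], None)
        else:
            # Inserts and updates are both treated as upserts.
            destination[event["_id"]] = event["doc"]
        # Checkpoint after every applied event.
        resume_token = event["token"]
    return resume_token


events = [
    {"token": 1, "op": "insert", "_id": "a", "doc": {"v": 1}},
    {"token": 2, "op": "update", "_id": "a", "doc": {"v": 2}},
    {"token": 3, "op": "insert", "_id": "b", "doc": {"v": 1}},
    {"token": 4, "op": "delete", "_id": "b"},
]

destination = {}
# First run is interrupted after the first two events.
token = apply_changes(events[:2], destination)
# The second run replays the stream but skips everything up to the token.
token = apply_changes(events, destination, resume_token=token)
print(token, destination)
```

Because the token is stored after each applied event, a restarted process continues from where the previous one stopped instead of reapplying the whole stream.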

Verification

After the data migration, a data integrity check ensures that the data has been accurately and completely transferred.
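
One common way to perform such a check is to compare document counts and order-independent content hashes per namespace. The sketch below shows that general technique; it is an assumption for illustration and does not claim to be the method dsync uses. The `namespace_digest` and `verify` functions and the sample data are hypothetical.

```python
import hashlib
import json


def namespace_digest(docs):
    """Return (document count, order-independent content hash)."""
    # Hash each document in canonical JSON form, then combine the sorted
    # hashes so that the result does not depend on document order.
    doc_hashes = sorted(
        hashlib.sha256(json.dumps(d, sort_keys=True).encode("utf-8")).hexdigest()
        for d in docs
    )
    combined = hashlib.sha256("".join(doc_hashes).encode("utf-8")).hexdigest()
    return len(docs), combined


def verify(source, destination):
    """Return the namespaces whose count or content digest differs."""
    mismatched = []
    for ns in sorted(set(source) | set(destination)):
        src = namespace_digest(source.get(ns, []))
        dst = namespace_digest(destination.get(ns, []))
        if src != dst:
            mismatched.append(ns)
    return mismatched


source = {
    "shop.orders": [{"_id": 1, "total": 5}, {"_id": 2, "total": 7}],
    "shop.customers": [{"_id": 10, "name": "Ada"}],
}
destination = {
    # Same documents in a different order: still a match.
    "shop.orders": [{"_id": 2, "total": 7}, {"_id": 1, "total": 5}],
    # A changed field: reported as a mismatch.
    "shop.customers": [{"_id": 10, "name": "Bob"}],
}
print(verify(source, destination))
```

Comparing counts alone would miss documents that were copied with wrong content, which is why this sketch also hashes the document bodies.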
