Verification
Dsync provides multiple verification options.
The dsync verify command starts verification mode. By default it uses a Merkle Search Tree to load all items from both connectors, compares their xxhash values, and prints any differences it finds. Because this can be memory intensive, there is an option to specify a number of partitions n, which restricts the comparison to 1/n of the items. Loading the initial items takes time, and afterwards the verifier runs indefinitely, tailing the CDC updates.
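The idea of comparing per-item hashes on a 1/n slice of the data can be illustrated with a small sketch. It is not dsync's implementation: it uses plain dictionaries instead of a Merkle Search Tree, substitutes blake2b from the Python standard library for xxhash, and the names item_hash, in_partition, and diff_items are hypothetical.

```python
import hashlib


def item_hash(data: bytes) -> str:
    # Stand-in for xxhash (assumption): blake2b from the standard library.
    return hashlib.blake2b(data, digest_size=8).hexdigest()


def in_partition(item_id: str, n: int, k: int = 0) -> bool:
    # Assign each id to one of n buckets in a stable way and keep bucket k only.
    digest = hashlib.blake2b(item_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % n == k


def diff_items(source: dict, sink: dict, n: int = 1, k: int = 0) -> dict:
    """Return {id: reason} for ids in bucket k that differ between the two sides."""
    diffs = {}
    for item_id in sorted(set(source) | set(sink)):
        if not in_partition(item_id, n, k):
            continue  # outside the 1/n slice being verified
        if item_id not in sink:
            diffs[item_id] = "missing in sink"
        elif item_id not in source:
            diffs[item_id] = "missing in source"
        elif item_hash(source[item_id]) != item_hash(sink[item_id]):
            diffs[item_id] = "hash mismatch"
    return diffs
```

With n=1 every item is compared. With a larger n each call looks at roughly 1/n of the ids, which bounds memory at the cost of leaving the other buckets unverified in that run.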
The --simple option uses a simple verifier that only tails the CDC updates of the two connectors. It periodically checks for "stale" updates and compares them. This relies on both data sources starting in an already stable state, and essentially ensures that updates on one side also show up in the updates of the other. The first few differences may be discarded, since the verifier may start just after an update from one connector and therefore see only the other connector's copy of it. Data that updates quickly enough is never compared, because only stale updates are compared.
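The stale-update check can be sketched roughly as follows. This is an illustration under assumptions, not dsync's code: "stale" is taken to mean that an item has received no update from either side for stale_after seconds, timestamps are passed in explicitly, and the SimpleVerifier class and its methods are hypothetical.

```python
class SimpleVerifier:
    def __init__(self, stale_after: float):
        self.stale_after = stale_after
        # id -> {"source": hash, "sink": hash, "ts": time of the latest update}
        self.latest = {}

    def on_update(self, side: str, item_id: str, digest: str, now: float) -> None:
        # Record the most recent hash seen from one side's CDC stream.
        entry = self.latest.setdefault(item_id, {})
        entry[side] = digest
        entry["ts"] = now

    def check(self, now: float) -> list:
        """Compare items that have gone quiet; return ids whose sides disagree."""
        mismatches = []
        for item_id in sorted(self.latest):  # sorted() copies the keys, so deleting below is safe
            entry = self.latest[item_id]
            if now - entry["ts"] < self.stale_after:
                continue  # still being updated, so it is not compared yet
            if entry.get("source") != entry.get("sink"):
                mismatches.append(item_id)
            del self.latest[item_id]  # compared once, then forgotten
        return mismatches
```

An item that keeps changing faster than stale_after never reaches the comparison, which matches the limitation described above.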
Legacy Verification
The --verify option is a legacy feature that only works for some pairs of connectors. Two modes are supported. By default it uses the source plan and assumes that the sink is also a source that can understand the source plan (this is not guaranteed and is actually dangerous). It computes an xxhash of every item in a partition, XORs the hashes together, and compares the results.
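A minimal sketch of the XOR-of-hashes idea is shown below. It is an assumption-laden illustration rather than the legacy verifier itself: blake2b from the standard library stands in for xxhash, and the function names are hypothetical.

```python
import hashlib


def item_hash64(data: bytes) -> int:
    # Stand-in for a 64-bit xxhash (assumption): 8-byte blake2b digest as an integer.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def partition_checksum(items) -> int:
    # XOR is commutative and associative, so the order in which items are read does not matter.
    checksum = 0
    for data in items:
        checksum ^= item_hash64(data)
    return checksum
```

Two partitions holding the same items produce the same checksum regardless of read order. One consequence of XOR is that an item appearing twice cancels itself out, so this kind of checksum cannot detect paired duplicates.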
The --verify-quick-count option simplifies the above: it queries each connector for the approximate number of documents and checks whether the counts match. This works as long as both connectors support GetNamespaceMetadata.