# Verification

Dsynct provides several commands for verifying that data matches between source and destination. All verification commands run in simple mode (`DSYNCT_MODE=simple`). These features are currently experimental.

## verify

The `verify` command performs a full verification by reading all data from both the source and destination connectors in parallel and comparing them. It supports both initial sync verification and ongoing change stream verification.

```bash
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct verify \
--namespace <NAMESPACE> \
<SOURCE> <DESTINATION>
```

With a source-side transformation (a destination-side transformer is passed the same way, using `--dst-transform`):

```bash
docker run -e 'DSYNCT_MODE=simple' \
-v "./transform.yaml:/transform.yaml" \
markadiom/dsynct verify \
--namespace <NAMESPACE> \
--src-transform \
<SOURCE> <DESTINATION> dsync-transform://transform.yaml
```
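Going by the `--src-transform` and `--dst-transform` descriptions in the flag table below, transformers follow the two connectors as positional arguments, with the source-side transformer first. The helper below is a hypothetical dry-run sketch: the `verify_args` function and the `src.yaml`/`dst.yaml` file names are illustrative and not part of Dsynct. It only prints the argument order and runs nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the flags and positional arguments for `verify`
# in the order the flag descriptions suggest. Nothing is executed.
verify_args() {
  local src="$1" dst="$2" src_tf="${3:-}" dst_tf="${4:-}"
  local -a args=()
  # Transformer flags are only set when the matching transformer is given.
  [ -n "$src_tf" ] && args+=(--src-transform)
  [ -n "$dst_tf" ] && args+=(--dst-transform)
  # Connectors come first, then the source-side and destination-side transformers.
  args+=("$src" "$dst")
  [ -n "$src_tf" ] && args+=("$src_tf")
  [ -n "$dst_tf" ] && args+=("$dst_tf")
  printf '%s\n' "${args[*]}"
}

# Both transformers (file names are placeholders):
verify_args '<SOURCE>' '<DESTINATION>' dsync-transform://src.yaml dsync-transform://dst.yaml
# Destination-side transformer only:
verify_args '<SOURCE>' '<DESTINATION>' '' dsync-transform://dst.yaml
```

The printed arguments would be appended to `markadiom/dsynct verify --namespace <NAMESPACE>`; each transformer file also has to be mounted into the container, as in the example above.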

| Flag                    | Required | Description                                                                                                     |
| ----------------------- | -------- | --------------------------------------------------------------------------------------------------------------- |
| `--namespace`           | No       | Source namespace(s). Can be specified multiple times.                                                           |
| `--dst-namespace`       | No       | Destination namespace(s). Defaults to source namespaces (or mapped namespaces).                                 |
| `--namespace-mapping`   | No       | Namespace mapping from source to destination.                                                                   |
| `--parallelism`         | No       | Number of parallel workers. Default: `1`.                                                                       |
| `--src-transform`       | No       | Set if a source-side transformer is provided after the two connectors.                                          |
| `--dst-transform`       | No       | Set if a destination-side transformer is provided (after the source transformer if present).                    |
| `--src-data-type`       | No       | Source data type. Inferred if not set.                                                                          |
| `--dst-data-type`       | No       | Destination data type. Inferred if not set.                                                                     |
| `--transform-data-type` | No       | Intermediate comparison data type. Default: `DATA_TYPE_MONGO_BSON`.                                             |
| `--skip-initial-sync`   | No       | Skip initial sync verification.                                                                                 |
| `--skip-change-stream`  | No       | Skip change stream verification.                                                                                |
| `--latency`             | No       | Only compare documents that have not been updated for this duration during change stream mode. Default: `20s`.  |
| `--report-interval`     | No       | How often to print progress reports. Default: `1s`.                                                             |
| `--report-limit`        | No       | Maximum number of mismatches to report per interval. Default: `5`.                                              |
| `--report-all`          | No       | Report all mismatches instead of limiting.                                                                      |
| `--projection`          | No       | JSON describing which fields to include in comparisons (e.g. `{"field": {"inner_field": true}}`).               |
| `--id-key`              | No       | Field name(s) that make up the document ID. Can be specified multiple times for composite keys. Default: `_id`. |
| `--partition`           | No       | Partition number (0-indexed) for distributed verification. Default: `0`.                                        |
| `--total-partitions`    | No       | Total number of partitions for distributed verification. Default: `1`.                                          |
| `--mapping-delimiter`   | No       | Delimiter for namespace mappings. Default: `:`.                                                                 |
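For distributed verification, one plausible setup is to start one `verify` process per partition, each with the same `--total-partitions` and a distinct 0-indexed `--partition`. The sketch below is a dry run under that assumption: `print_partition_commands` is a hypothetical helper that prints the commands rather than executing them, and the namespace and connector values are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one `verify` invocation per partition.
# Partitions are 0-indexed, so they run from 0 to total-1.
print_partition_commands() {
  local total="$1" namespace="$2" src="$3" dst="$4"
  local p
  for ((p = 0; p < total; p++)); do
    printf "docker run -e 'DSYNCT_MODE=simple' markadiom/dsynct verify --namespace %s --partition %d --total-partitions %d %s %s\n" \
      "$namespace" "$p" "$total" "$src" "$dst"
  done
}

# Print the commands for three partitions.
print_partition_commands 3 mydb.mycollection '<SOURCE>' '<DESTINATION>'
```

How the printed commands are scheduled (separate hosts, containers, or background jobs) is left open here; the only requirement taken from the flag table is that every partition number from `0` to `total - 1` is covered once.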

## sample-ids

The `sample-ids` command samples document IDs from a source namespace using reservoir sampling. The output can be written to a file for later use with `verify-ids --id-file` or `testsync --id-file`.

```bash
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct sample-ids \
--namespace <SOURCE_NAMESPACE> \
--count 100 \
--output ids.jsonl \
<SOURCE>
```

To sample IDs after a transformation (so the IDs reflect the transformed data):

```bash
docker run -e 'DSYNCT_MODE=simple' \
-v "./transform.yaml:/transform.yaml" \
markadiom/dsynct sample-ids \
--namespace <SOURCE_NAMESPACE> \
--count 100 \
--output ids.jsonl \
--transform \
<SOURCE> dsync-transform://transform.yaml
```

| Flag                       | Required | Description                                                                                                                             |
| -------------------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `--namespace`              | Yes      | The source namespace to sample from.                                                                                                    |
| `--count`                  | No       | Number of IDs to sample. Default: `100`.                                                                                                |
| `--output`                 | No       | Output file path. Defaults to stdout.                                                                                                   |
| `--max-iter-per-partition` | No       | Maximum number of ListData iterations per partition. `0` for unlimited.                                                                 |
| `--transform`              | No       | Set if a transformer is provided after the source connector.                                                                            |
| `--src-data-type`          | No       | Source data type. Inferred if not set.                                                                                                  |
| `--dst-data-type`          | No       | Data type after transform. Inferred if not set.                                                                                         |
| `--id-key`                 | No       | Field name(s) that make up the document ID. Can be specified multiple times for composite keys. Default: `_id` for BSON, `id` for JSON. |

The output contains one extended JSON ID per line, which is the format that `verify-ids --id-file` and `testsync --id-file` accept.
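For readers unfamiliar with reservoir sampling: it draws a uniform random sample of fixed size from a stream in a single pass, without knowing the stream length in advance. The sketch below shows the textbook variant (often called Algorithm R) using awk. It illustrates the technique only and is not Dsynct's implementation; the `reservoir_sample` name and its arguments are made up for this example.

```bash
#!/usr/bin/env bash
# Illustrative reservoir sampling (Algorithm R): keep k lines from stdin.
# Usage: reservoir_sample K [SEED]
reservoir_sample() {
  local k="$1" seed="${2:-0}"
  awk -v k="$k" -v seed="$seed" '
    BEGIN { srand(seed) }
    # Fill the reservoir with the first k lines.
    NR <= k { r[NR] = $0; next }
    # For line number NR, pick a slot in 1..NR; replace only if it lands in 1..k.
    { j = int(rand() * NR) + 1; if (j <= k) r[j] = $0 }
    # Print the reservoir (all lines if the input had fewer than k).
    END { n = (NR < k) ? NR : k; for (i = 1; i <= n; i++) print r[i] }
  '
}

# Sample 5 of 1000 lines, seeded from the shell for a different result per run.
seq 1 1000 | reservoir_sample 5 "$RANDOM"
```

Each input line ends up in the sample with equal probability, which is why a sampled ID file is a reasonable basis for spot checks.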

## verify-ids

The `verify-ids` command fetches specific documents by ID from both the source and destination, optionally transforms the source documents, and compares them. It reports whether each document matches. Both connectors must support `GetByIds`.

```bash
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct verify-ids \
--namespace <SOURCE_NAMESPACE> \
--id-file ids.jsonl \
<SOURCE> <DESTINATION>
```

To verify with a transformation applied to the source data before comparison:

```bash
docker run -e 'DSYNCT_MODE=simple' \
-v "./transform.yaml:/transform.yaml" \
markadiom/dsynct verify-ids \
--namespace <SOURCE_NAMESPACE> \
--id-file ids.jsonl \
--transform \
<SOURCE> <DESTINATION> dsync-transform://transform.yaml
```

| Flag                  | Required | Description                                                                                     |
| --------------------- | -------- | ----------------------------------------------------------------------------------------------- |
| `--namespace`         | Yes      | The source namespace.                                                                           |
| `--dst-namespace`     | No       | The destination namespace. Defaults to the source namespace or the mapped namespace.            |
| `--id`                | No       | Document ID (string). Can be specified multiple times. For composite keys, use `--id-size`.     |
| `--jsonext-id`        | No       | Document ID in extended JSON format.                                                            |
| `--id-file`           | No       | Path to a file containing extended JSON IDs, one per line. Compatible with `sample-ids` output. |
| `--id-size`           | No       | Number of `--id` entries that form a single composite key. Default: `1`.                        |
| `--transform`         | No       | Set if a transformer is provided after the two connectors.                                      |
| `--src-data-type`     | No       | Source data type. Inferred if not set.                                                          |
| `--dst-data-type`     | No       | Destination data type. Inferred if not set.                                                     |
| `--namespace-mapping` | No       | Namespace mapping from source to destination.                                                   |
| `--mapping-delimiter` | No       | Delimiter for namespace mappings. Default: `:`.                                                 |

At least one of `--id`, `--jsonext-id`, or `--id-file` must be provided.
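For composite keys, the `--id-size` description implies that several `--id` values are grouped into one key. The sketch below assumes they are grouped in the order given (an assumption, not confirmed by this page) and uses a hypothetical `build_id_flags` helper that expands comma-separated key parts into repeated `--id` flags. It prints the flags only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "part1,part2" composite keys into repeated --id flags.
# Assumes key parts contain no commas or whitespace.
build_id_flags() {
  local key part
  local -a parts flags=()
  for key in "$@"; do
    # Split one composite key into its parts.
    IFS=',' read -r -a parts <<< "$key"
    for part in "${parts[@]}"; do
      flags+=(--id "$part")
    done
  done
  printf '%s\n' "${flags[*]}"
}

# Two composite keys with two parts each; pair with --id-size 2.
build_id_flags 'tenant-1,order-42' 'tenant-2,order-7'
# prints: --id tenant-1 --id order-42 --id tenant-2 --id order-7
```

The printed flags would be passed to `verify-ids` together with `--id-size 2`, so that every two consecutive `--id` values form one key.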

## Typical Workflow

Use `sample-ids` to collect IDs, then `verify-ids` to spot-check them:

```bash
# 1. Sample IDs from source
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct sample-ids \
--namespace mydb.mycollection \
--count 500 --output ids.jsonl \
<SOURCE>

# 2. Verify those IDs match between source and destination
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct verify-ids \
--namespace mydb.mycollection \
--id-file ids.jsonl \
<SOURCE> <DESTINATION>
```
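Between the two steps it can be worth confirming that the sampled file actually contains IDs before starting `verify-ids`. The sketch below is a minimal pre-flight check; `count_ids` is a hypothetical helper, and the two sample IDs are made-up values in MongoDB extended JSON style, used only to show the one-ID-per-line layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count non-blank lines in an ID file
# (one extended JSON ID per line).
count_ids() {
  # grep exits non-zero when the count is 0, so mask that exit status.
  grep -c -v '^[[:space:]]*$' "$1" || true
}

# Demo with a throwaway file: two IDs and one blank line.
ids_file="$(mktemp)"
printf '%s\n' '{"$oid":"64b7f0c2a1b2c3d4e5f60718"}' '' '"user-42"' > "$ids_file"
echo "IDs to verify: $(count_ids "$ids_file")"   # prints: IDs to verify: 2
rm -f "$ids_file"
```

A wrapper script could stop early when the count is `0` instead of running `verify-ids` against an empty list.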

For a full verification of all data, use the `verify` command instead:

```bash
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct verify \
--namespace mydb.mycollection \
--parallelism 4 \
<SOURCE> <DESTINATION>
```

## Sync Tester (web-api)

The web-api ships an interactive UI for spot-checking a sync without launching a full job. Start it the same way as the Transform Studio:

```bash
docker run -e DSYNCT_MODE=simple -p 8080:8080 markadiom/dsynct --host-port 0.0.0.0:8080 web-api --simple-only
```

Open the UI in a browser and navigate to `Tools > Sync Tester`. The tab exposes `Verify IDs` (a read-only comparison equivalent to the `verify-ids` command), `Test Sync`, and several preview options.

### Connector requirements

Not every action in this tab works for every connector. The constraints are:

* **Sampling IDs** (the `Sample` menu) requires the connector being sampled to also be usable as a source.
* **Verify IDs** requires both the source and destination connectors to be usable as sources and to support `GetByIds`.
* **Test Sync** requires the source connector to support `GetByIds`. The destination connector only needs to be a valid sink.
* Some preview/compare options also require the **Source Type** and **Destination Type** to be set explicitly (not left on `auto`); the UI will surface an error when this applies.

### Test Sync

`Test Sync` runs a short, initial-sync-style copy of a specific set of document IDs from source to destination through the regular worker pipeline. It is meant for end-to-end testing of a connector/transformer combination against a small, known set of documents before starting a real sync.

To use it:

1. Pick a **Source** and **Destination** connector (and optionally a **Transformer**).
2. Fill in the **Namespace** (and **Destination/Transform Namespace** if it differs).
3. Optionally set **Source Type** and **Destination Type**; leave them on `auto` to infer.
4. Paste one extended JSON ID per line into the **IDs** textarea, or use the **Sample** menu to populate them from the source or destination.
5. Click **Test Sync**.

The pipeline reads each ID from the source, applies the transformer (if enabled), and writes the resulting documents to the destination using the same code path as a real sync. The pipeline does not surface per-ID write results.

When a transformer is in use, namespace mapping is applied **before** the transform. This means the **Destination/Transform Namespace** field is the namespace handed to the transformer, and any namespace referenced inside the transformer's mapping config should be that post-mapping namespace, not the original source namespace.
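The ordering can be pictured with a small sketch. It assumes a mapping written as `source<delimiter>destination` with `:` as the default delimiter, in line with the `--mapping-delimiter` flag of the CLI commands; the `map_namespace` helper and the namespace names are hypothetical and only show which namespace a transformer would see.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply one "source:destination" mapping to a namespace.
# Usage: map_namespace NAMESPACE MAPPING [DELIMITER]
map_namespace() {
  local ns="$1" mapping="$2" delim="${3:-:}"
  local from to
  from=${mapping%%"$delim"*}   # text before the first delimiter
  to=${mapping#*"$delim"}      # text after the first delimiter
  if [ "$ns" = "$from" ]; then
    printf '%s\n' "$to"
  else
    printf '%s\n' "$ns"        # unmapped namespaces pass through unchanged
  fi
}

# The mapping is applied first, so the transformer is handed the mapped namespace.
map_namespace mydb.orders 'mydb.orders:warehouse.orders'
# prints: warehouse.orders
```

In this example a transformer config would need to reference `warehouse.orders`, not `mydb.orders`.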

Afterwards, use `Verify IDs` in the same tab to confirm that the documents reached the destination and match the expected payload. The connector requirements above still apply to `Verify IDs`; if your connectors do not meet them, check the destination database directly.

