# Verification

Dsynct provides several commands for verifying that data matches between source and destination. All verification commands run in simple mode (`DSYNCT_MODE=simple`). These features are currently experimental.

## verify

The `verify` command performs a full verification: it reads all data from both the source and destination connectors in parallel and compares them. It covers both the initial sync and the ongoing change stream; either phase can be skipped with `--skip-initial-sync` or `--skip-change-stream`.

```bash
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct verify \
--namespace <NAMESPACE> \
<SOURCE> <DESTINATION>
```
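Flags from the table below can be combined in a single run. The following is a hypothetical shell wrapper (`verify_projected`) that limits the comparison to a few fields with `--projection` and identifies documents by a two-field composite key through repeated `--id-key`. The field names (`status`, `customer.email`, `tenant_id`, `order_id`) are placeholders. The `DOCKER` variable is an assumption added only so the command can be dry-run with `DOCKER=echo`; it is not a dsynct feature.

```bash
# Hypothetical wrapper around `verify`: compare only selected fields and key
# documents by a composite ID. Set DOCKER=echo to print the command instead.
verify_projected() {
  local namespace="$1" source="$2" destination="$3"
  # Single quotes keep the JSON as one literal argument (no shell expansion).
  local projection='{"status": true, "customer": {"email": true}}'
  "${DOCKER:-docker}" run -e 'DSYNCT_MODE=simple' \
    markadiom/dsynct verify \
    --namespace "$namespace" \
    --projection "$projection" \
    --id-key tenant_id \
    --id-key order_id \
    "$source" "$destination"
}
```

Usage: `verify_projected mydb.orders <SOURCE> <DESTINATION>`. Quoting the projection in single quotes is what keeps the JSON intact when it reaches the container.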

With a source-side transformation (a destination-side transformer can be added with `--dst-transform`; see the table below):

```bash
docker run -e 'DSYNCT_MODE=simple' \
-v "./transform.yaml:/transform.yaml" \
markadiom/dsynct verify \
--namespace <NAMESPACE> \
--src-transform \
<SOURCE> <DESTINATION> dsync-transform://transform.yaml
```
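Going by the flag descriptions in the table below, each transformer must be announced with its own flag, and a destination-side transformer URI comes after the source-side one. A minimal sketch under that reading, assuming transform files named `src-transform.yaml` and `dst-transform.yaml` in the current directory, mounted the same way as in the example above. The wrapper name and the `DOCKER` dry-run override are hypothetical.

```bash
# Hypothetical wrapper: transform documents on both sides before comparing.
# Expects src-transform.yaml and dst-transform.yaml in the current directory.
verify_with_both_transforms() {
  local namespace="$1" source="$2" destination="$3"
  "${DOCKER:-docker}" run -e 'DSYNCT_MODE=simple' \
    -v "$PWD/src-transform.yaml:/src-transform.yaml" \
    -v "$PWD/dst-transform.yaml:/dst-transform.yaml" \
    markadiom/dsynct verify \
    --namespace "$namespace" \
    --src-transform \
    --dst-transform \
    "$source" "$destination" \
    dsync-transform://src-transform.yaml \
    dsync-transform://dst-transform.yaml
}
```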

| Flag                    | Required | Description                                                                                                     |
| ----------------------- | -------- | --------------------------------------------------------------------------------------------------------------- |
| `--namespace`           | No       | Source namespace(s). Can be specified multiple times.                                                           |
| `--dst-namespace`       | No       | Destination namespace(s). Defaults to source namespaces (or mapped namespaces).                                 |
| `--namespace-mapping`   | No       | Namespace mapping from source to destination.                                                                   |
| `--parallelism`         | No       | Number of parallel workers. Default: `1`.                                                                       |
| `--src-transform`       | No       | Set if a source-side transformer is provided after the two connectors.                                          |
| `--dst-transform`       | No       | Set if a destination-side transformer is provided (after the source transformer if present).                    |
| `--src-data-type`       | No       | Source data type. Inferred if not set.                                                                          |
| `--dst-data-type`       | No       | Destination data type. Inferred if not set.                                                                     |
| `--transform-data-type` | No       | Intermediate comparison data type. Default: `DATA_TYPE_MONGO_BSON`.                                             |
| `--skip-initial-sync`   | No       | Skip initial sync verification.                                                                                 |
| `--skip-change-stream`  | No       | Skip change stream verification.                                                                                |
| `--latency`             | No       | In change stream mode, only compare documents that have gone unmodified for at least this long. Default: `20s`. |
| `--report-interval`     | No       | How often to print progress reports. Default: `1s`.                                                             |
| `--report-limit`        | No       | Maximum number of mismatches to report per interval. Default: `5`.                                              |
| `--report-all`          | No       | Report all mismatches instead of limiting.                                                                      |
| `--projection`          | No       | JSON describing which fields to include in comparisons (e.g. `{"field": {"inner_field": true}}`).               |
| `--id-key`              | No       | Field name(s) that make up the document ID. Can be specified multiple times for composite keys. Default: `_id`. |
| `--partition`           | No       | Partition number (0-indexed) for distributed verification. Default: `0`.                                        |
| `--total-partitions`    | No       | Total number of partitions for distributed verification. Default: `1`.                                          |
| `--mapping-delimiter`   | No       | Delimiter for namespace mappings. Default: `:`.                                                                 |
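`--partition` and `--total-partitions` allow one verification to be split across several processes. The sketch below is a hypothetical wrapper that starts one container per partition (numbered `0` to `total - 1`), runs them concurrently, and waits for all of them. It only checks whether each `docker run` exited successfully; whether dsynct signals mismatches through its exit code is not stated here, so read each container's report as well. If change stream verification keeps `verify` running, the wrapper waits until the containers are stopped. `DOCKER` is an assumed dry-run override.

```bash
# Hypothetical wrapper: split one verification across several containers.
verify_partitioned() {
  local total="$1" namespace="$2" source="$3" destination="$4"
  local p=0 pids="" pid status=0
  while [ "$p" -lt "$total" ]; do
    # Each container verifies only partition p of total (0-indexed).
    "${DOCKER:-docker}" run --rm -e 'DSYNCT_MODE=simple' \
      markadiom/dsynct verify \
      --namespace "$namespace" \
      --partition "$p" \
      --total-partitions "$total" \
      "$source" "$destination" &
    pids="$pids $!"
    p=$((p + 1))
  done
  # Wait for every container; remember whether any of them exited non-zero.
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `verify_partitioned 4 mydb.mycollection <SOURCE> <DESTINATION>` launches four containers. Partitions spread the work across containers (or machines, if each runs a single partition), while `--parallelism` sets the worker count within each run.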

## sample-ids

The `sample-ids` command samples document IDs from a source namespace using reservoir sampling. The output can be written to a file for later use with `verify-ids --id-file` or `testsync --id-file`.
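Reservoir sampling selects a fixed number of items uniformly at random in a single pass, without knowing in advance how many items there are. To illustrate the technique, the textbook version (Algorithm R) is sketched below in `awk` over lines of input. It is not dsynct's implementation; `reservoir_sample` and the `SEED` variable are hypothetical names.

```bash
# Reservoir sampling (Algorithm R) over lines of stdin: keeps k lines chosen
# uniformly at random in one pass. SEED is optional; the PID is used otherwise.
reservoir_sample() {
  awk -v k="$1" -v seed="${SEED:-$$}" '
    BEGIN { k = k + 0; srand(seed) }
    # Fill the reservoir with the first k lines.
    NR <= k { r[NR] = $0; next }
    # Afterwards, line NR replaces a random slot with probability k/NR.
    {
      j = int(rand() * NR) + 1   # uniform integer in 1..NR
      if (j <= k) r[j] = $0
    }
    END {
      n = (NR < k) ? NR : k
      for (i = 1; i <= n; i++) print r[i]
    }'
}
```

For example, `seq 1 1000 | reservoir_sample 10` prints ten distinct numbers. Each line `n` past the first `k` enters the reservoir with probability `k/n`, which is what keeps every input line equally likely to appear in the final sample; if the input has fewer than `k` lines, all of them are returned.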

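```bash
# Mount the current directory at /data so the output file persists on the host
docker run -e 'DSYNCT_MODE=simple' \
-v "$PWD:/data" \
markadiom/dsynct sample-ids \
--namespace <SOURCE_NAMESPACE> \
--count 100 \
--output /data/ids.jsonl \
<SOURCE>
```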

To sample IDs after a transformation (so the IDs reflect the transformed data):

```bash
docker run -e 'DSYNCT_MODE=simple' \
-v "./transform.yaml:/transform.yaml" \
-v "$PWD:/data" \
markadiom/dsynct sample-ids \
--namespace <SOURCE_NAMESPACE> \
--count 100 \
--output /data/ids.jsonl \
--transform \
<SOURCE> dsync-transform://transform.yaml
```

| Flag                       | Required | Description                                                                                                                             |
| -------------------------- | -------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `--namespace`              | Yes      | The source namespace to sample from.                                                                                                    |
| `--count`                  | No       | Number of IDs to sample. Default: `100`.                                                                                                |
| `--output`                 | No       | Output file path. Defaults to stdout.                                                                                                   |
| `--max-iter-per-partition` | No       | Maximum number of ListData iterations per partition. `0` for unlimited.                                                                 |
| `--transform`              | No       | Set if a transformer is provided after the source connector.                                                                            |
| `--src-data-type`          | No       | Source data type. Inferred if not set.                                                                                                  |
| `--dst-data-type`          | No       | Data type after transform. Inferred if not set.                                                                                         |
| `--id-key`                 | No       | Field name(s) that make up the document ID. Can be specified multiple times for composite keys. Default: `_id` for BSON, `id` for JSON. |

The output format is one extended JSON ID per line, which can be fed directly into `verify-ids --id-file` or `testsync --id-file`.

## verify-ids

The `verify-ids` command fetches specific documents by ID from both the source and destination, optionally transforms the source documents, and compares them. It reports whether each document matches. Both connectors must support `GetByIds`.

```bash
# Mount the directory containing ids.jsonl so the container can read it
docker run -e 'DSYNCT_MODE=simple' \
-v "$PWD:/data" \
markadiom/dsynct verify-ids \
--namespace <SOURCE_NAMESPACE> \
--id-file /data/ids.jsonl \
<SOURCE> <DESTINATION>
```

To verify with a transformation applied to the source data before comparison:

```bash
docker run -e 'DSYNCT_MODE=simple' \
-v "./transform.yaml:/transform.yaml" \
-v "$PWD:/data" \
markadiom/dsynct verify-ids \
--namespace <SOURCE_NAMESPACE> \
--id-file /data/ids.jsonl \
--transform \
<SOURCE> <DESTINATION> dsync-transform://transform.yaml
```

| Flag                  | Required | Description                                                                                     |
| --------------------- | -------- | ----------------------------------------------------------------------------------------------- |
| `--namespace`         | Yes      | The source namespace.                                                                           |
| `--dst-namespace`     | No       | The destination namespace. Defaults to the source namespace or the mapped namespace.            |
| `--id`                | No       | Document ID (string). Can be specified multiple times. For composite keys, use `--id-size`.     |
| `--jsonext-id`        | No       | Document ID in extended JSON format.                                                            |
| `--id-file`           | No       | Path to a file containing extended JSON IDs, one per line. Compatible with `sample-ids` output. |
| `--id-size`           | No       | Number of `--id` entries that form a single composite key. Default: `1`.                        |
| `--transform`         | No       | Set if a transformer is provided after the two connectors.                                      |
| `--src-data-type`     | No       | Source data type. Inferred if not set.                                                          |
| `--dst-data-type`     | No       | Destination data type. Inferred if not set.                                                     |
| `--namespace-mapping` | No       | Namespace mapping from source to destination.                                                   |
| `--mapping-delimiter` | No       | Delimiter for namespace mappings. Default: `:`.                                                 |

At least one of `--id`, `--jsonext-id`, or `--id-file` must be provided.
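With `--id`, the `--id-size` flag says how many `--id` entries make up one composite key. The sketch below assumes the grouping is positional, so entries 1 and 2 form the first key, entries 3 and 4 the second, and so on. `verify_composite_ids`, the example key values, and the `DOCKER` dry-run override are hypothetical.

```bash
# Hypothetical wrapper: verify documents whose ID is a two-part composite key.
# Arguments after the connectors are key parts, given in pairs.
verify_composite_ids() {
  local namespace="$1" source="$2" destination="$3"
  shift 3
  if [ "$#" -eq 0 ] || [ $(($# % 2)) -ne 0 ]; then
    echo "verify_composite_ids: expected key parts in pairs" >&2
    return 2
  fi
  # Turn each key part into "--id <part>", preserving order. With --id-size 2,
  # consecutive pairs are assumed to form one composite key.
  local count=$# i=0
  while [ "$i" -lt "$count" ]; do
    set -- "$@" --id "$1"
    shift
    i=$((i + 1))
  done
  "${DOCKER:-docker}" run -e 'DSYNCT_MODE=simple' \
    markadiom/dsynct verify-ids \
    --namespace "$namespace" \
    --id-size 2 \
    "$@" \
    "$source" "$destination"
}
```

For example, `verify_composite_ids mydb.orders <SOURCE> <DESTINATION> acme order-1001 globex order-7` checks the keys (`acme`, `order-1001`) and (`globex`, `order-7`). Since `--id` values are strings, keys with non-string parts may be better expressed with `--jsonext-id` or `--id-file`.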

## Typical Workflow

Use `sample-ids` to collect IDs, then `verify-ids` to spot-check them:

```bash
# 1. Sample IDs from source (the current directory is mounted at /data
#    so ids.jsonl is written to the host)
docker run -e 'DSYNCT_MODE=simple' \
-v "$PWD:/data" \
markadiom/dsynct sample-ids \
--namespace mydb.mycollection \
--count 500 --output /data/ids.jsonl \
<SOURCE>

# 2. Verify those IDs match between source and destination
docker run -e 'DSYNCT_MODE=simple' \
-v "$PWD:/data" \
markadiom/dsynct verify-ids \
--namespace mydb.mycollection \
--id-file /data/ids.jsonl \
<SOURCE> <DESTINATION>
```
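The `verify-ids` flag table lists no parallelism option, so for a large ID file one option is to split the file and run one container per chunk. The sketch below does that with `split -l`, assuming chunks can be verified independently. `verify_ids_chunked` and the `DOCKER` dry-run override are hypothetical, and the chunk directory is created under the current directory so Docker can bind-mount it.

```bash
# Hypothetical wrapper: split an ID file into chunks and verify each chunk in
# its own container, all running concurrently.
verify_ids_chunked() {
  local id_file="$1" lines="$2" namespace="$3" source="$4" destination="$5"
  local workdir chunk pid pids="" status=0
  # Keep chunks under the current directory so Docker can bind-mount them.
  workdir=$(mktemp -d "$PWD/dsynct-chunks.XXXXXX") || return 1
  # split names the pieces chunk-aa, chunk-ab, ... (at most $lines IDs each).
  split -l "$lines" "$id_file" "$workdir/chunk-"
  for chunk in "$workdir"/chunk-*; do
    [ -e "$chunk" ] || continue   # glob did not match: the ID file was empty
    "${DOCKER:-docker}" run --rm -e 'DSYNCT_MODE=simple' \
      -v "$workdir:/data" \
      markadiom/dsynct verify-ids \
      --namespace "$namespace" \
      --id-file "/data/$(basename "$chunk")" \
      "$source" "$destination" &
    pids="$pids $!"
  done
  # Wait for every container, then remove the temporary chunks.
  for pid in $pids; do
    wait "$pid" || status=1
  done
  rm -rf "$workdir"
  return "$status"
}
```

For example, `verify_ids_chunked ids.jsonl 100 mydb.mycollection <SOURCE> <DESTINATION>` runs one `verify-ids` container per 100 IDs.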

For a full verification of all data, use the `verify` command instead:

```bash
docker run -e 'DSYNCT_MODE=simple' \
markadiom/dsynct verify \
--namespace mydb.mycollection \
--parallelism 4 \
<SOURCE> <DESTINATION>
```
