# Observability

Dsynct exports logs and metrics via OpenTelemetry (OTel). This allows you to monitor migration progress, diagnose performance bottlenecks, and track change stream lag using tools like SigNoz, Grafana, or any OTel-compatible backend.

## Configuration

Enable OpenTelemetry by passing the `--otel` flag to the `app` command and setting the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable to the address of your OTel gRPC collector:

```bash
docker run \
-e 'OTEL_EXPORTER_OTLP_ENDPOINT=http://<COLLECTOR_HOSTNAME>:4317' \
markadiom/dsynct app \
--otel \
<OTHER_COMMANDS_WITH_THEIR_OPTIONS>
```

In simple mode (no Temporal), the OTel options go before `sync`:

```bash
docker run \
-e 'DSYNCT_MODE=simple' \
-e 'OTEL_EXPORTER_OTLP_ENDPOINT=http://<COLLECTOR_HOSTNAME>:4317' \
markadiom/dsynct \
--otel \
sync \
<OPTIONAL PARAMETERS> \
<SOURCE> <DESTINATION>
```

### OTel Flags

| Flag                     | Description                                             |
| ------------------------ | ------------------------------------------------------- |
| `--otel`                 | Enable exporting logs and metrics to an OTel collector. |
| `--otel-metric-interval` | Interval between metric pushes. Default: `10s`.         |
| `--otel-service-name`    | Service name reported to OTel. Default: `dsynct`.       |
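
As an illustration, the flags above might be combined as follows in simple mode. This is a sketch, not a verified invocation: it assumes the additional OTel flags sit next to `--otel` (before `sync`), and the values `30s` and `dsynct-orders` are hypothetical.

```bash
# Assumption: --otel-metric-interval and --otel-service-name are placed
# alongside --otel, before the `sync` subcommand.
# `30s` and `dsynct-orders` are example values only.
docker run \
-e 'DSYNCT_MODE=simple' \
-e 'OTEL_EXPORTER_OTLP_ENDPOINT=http://<COLLECTOR_HOSTNAME>:4317' \
markadiom/dsynct \
--otel \
--otel-metric-interval 30s \
--otel-service-name dsynct-orders \
sync \
<OPTIONAL PARAMETERS> \
<SOURCE> <DESTINATION>
```

A distinct service name per migration can make it easier to tell several Dsynct instances apart in the same backend.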

### Logs

When `--otel` is enabled, structured logs (JSON) are emitted both to stderr and to the OTel log collector. The log level can be controlled with `--log-level` (default: `INFO`).

## Metrics

All metrics are emitted under the `dsync-flow` OTel meter. Metrics are labeled with attributes such as `namespace`, `success`, and `worker` to allow filtering and grouping.

### Common Attributes

| Attribute   | Description                                                                                                 |
| ----------- | ----------------------------------------------------------------------------------------------------------- |
| `namespace` | The namespace (collection/table) being processed.                                                           |
| `success`   | `true` if the operation succeeded, `false` if it failed.                                                    |
| `worker`    | Identifies the worker type (e.g. `initial-sync`, `stream-changes`, `writer-0`, `transform-0`, `updates-0`). |
| `index`     | Stream partition index (for change stream gauges).                                                          |

### Initial Sync Metrics

| Metric                 | Type      | Unit      | Description                                                                                            |
| ---------------------- | --------- | --------- | ------------------------------------------------------------------------------------------------------ |
| `dsynct.read`          | Counter   | documents | Total number of documents read from the source.                                                        |
| `dsynct.written`       | Counter   | documents | Total number of documents written to the destination.                                                  |
| `dsynct.list_data`     | Histogram | ms        | Latency of each `ListData` call to the source connector.                                               |
| `dsynct.write_data`    | Histogram | ms        | Latency of each `WriteData` call to the destination connector.                                         |
| `dsynct.get_transform` | Histogram | ms        | Latency of each `GetTransform` call to the transformer. Only emitted when a transformer is configured. |

### Change Stream (CDC) Metrics

| Metric                         | Type      | Unit       | Description                                                                                                  |
| ------------------------------ | --------- | ---------- | ------------------------------------------------------------------------------------------------------------ |
| `dsynct.read`                  | Counter   | events     | Total number of change events read from the source. Shares the same counter as initial sync reads.           |
| `dsynct.written`               | Counter   | events     | Total number of change events written to the destination.                                                    |
| `dsynct.write_updates`         | Histogram | ms         | Latency of each `WriteUpdates` call to the destination connector.                                            |
| `dsynct.get_transform`         | Histogram | ms         | Latency of each `GetTransform` call during change stream processing.                                         |
| `dsynct.stream_read_gauge`     | Gauge     | events     | Running total of change events read for a given stream partition.                                            |
| `dsynct.stream_written_gauge`  | Gauge     | events     | Running total of change events written for a given stream partition.                                         |
| `dsynct.read_ahead_gauge`      | Gauge     |            | The LSN (log sequence number) value reported by the source. Useful for tracking how far ahead the source is. |
| `dsynct.last_event_time`       | Gauge     | ms (epoch) | Timestamp of the last change event processed, in milliseconds since epoch.                                   |
| `dsynct.since_last_event_time` | Gauge     | ms         | Time elapsed since the last change event was processed. Useful for detecting change stream lag.              |
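
Because `dsynct.last_event_time` is reported in milliseconds since epoch, it can help to convert a sampled value to a readable timestamp when investigating lag. The helper below is a minimal sketch: the function name `epoch_ms_to_utc` is hypothetical, and it assumes GNU `date` (the `-d @SECONDS` form is not available in BSD/macOS `date`).

```bash
# Hypothetical helper: convert a dsynct.last_event_time value
# (milliseconds since epoch) to a UTC timestamp.
# Assumes GNU date; sub-second precision is dropped by the integer division.
epoch_ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" +%Y-%m-%dT%H:%M:%SZ
}

epoch_ms_to_utc 1700000000000   # prints 2023-11-14T22:13:20Z
```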

## Dashboards

Pre-configured SigNoz dashboards are available in the [public repository](https://github.com/adiom-data/public/tree/main/kubernetes/system/signoz_dashboards). You can import them by following the [SigNoz import instructions](https://signoz.io/docs/dashboards/import-dashboard/).

### Key Things to Monitor

* **Throughput**: Track `dsynct.read` and `dsynct.written` counters to monitor documents/events per second.
* **Latency**: Use the `dsynct.list_data`, `dsynct.write_data`, and `dsynct.write_updates` histograms to identify slow operations.
* **Change stream lag**: Monitor `dsynct.since_last_event_time` to detect whether the destination is falling behind the source. A steadily growing value indicates the CDC pipeline is not keeping up.
* **Read-ahead**: The difference between `dsynct.stream_read_gauge` and `dsynct.stream_written_gauge` shows how many events have been read but not yet written, indicating backpressure.
* **Errors**: Filter by `success=false` to isolate failed operations.
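
The throughput, read-ahead, and lag checks above reduce to simple arithmetic on sampled metric values. The following is a rough sketch in shell, assuming you have already fetched the values from your OTel backend. The function names and the sample numbers are hypothetical and are not part of Dsynct.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; input values are assumed to come from your backend.

# Events read but not yet written for one stream partition.
# $1 = dsynct.stream_read_gauge, $2 = dsynct.stream_written_gauge
backlog() {
  echo $(( $1 - $2 ))
}

# Average items per second between two samples of a counter
# (dsynct.read or dsynct.written). Uses integer division.
# $1 = earlier value, $2 = later value, $3 = seconds between samples
throughput() {
  echo $(( ($2 - $1) / $3 ))
}

# Succeeds when dsynct.since_last_event_time exceeds a threshold.
# $1 = gauge value in ms, $2 = threshold in ms
is_lagging() {
  [ "$1" -gt "$2" ]
}

backlog 1200 1150           # prints 50
throughput 10000 16000 60   # prints 100
if is_lagging 45000 30000; then
  echo "change stream lagging"
fi
```

In practice these calculations are usually expressed as queries or alert rules in the monitoring backend. The shell version only makes the arithmetic explicit.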
