# Cosmos DB NoSQL to MongoDB

### Prerequisites

1. **Set up Temporal and SigNoz** (or another OTEL collector). You can follow the instructions in [Running Dsynct](https://docs.adiom.io/enterprise/running-dsynct).
2. **Obtain Cosmos DB credentials** - URI and Primary Key ("Settings" -> "Keys" in the Azure Portal)
3. **Obtain MongoDB connection string** - for the destination cluster
4. **Enable** [**"All Versions and Deletes"**](https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-modes?tabs=latest-version#all-versions-and-deletes-change-feed-mode-preview) for your Cosmos DB container ("Settings" -> "Features" in the Azure Portal)
   * If you can't turn on the feature, you can run the Cosmos DB NoSQL connector with the `COSMOS_DISABLE_ALL_VERSIONS_AND_DELETES=true` environment variable. In that case Dsync will not be able to replicate delete events from Cosmos DB.

### Data type and ID considerations

Cosmos DB NoSQL uses **JSON** while MongoDB uses **BSON**, so a [data transformation](https://docs.adiom.io/enterprise/running-dsynct/data-transformations) is usually required to map IDs and data types between the two.

The Cosmos DB NoSQL [ID format](https://docs.adiom.io/reference/connectors/cosmos-db-nosql#id-format) is composed of the shard key (called the partition key in Cosmos DB) followed by the `id` field. MongoDB uses a single `_id` field, so the transform config must map between these two ID formats.

See [Transform Data Types](https://docs.adiom.io/enterprise/running-dsynct/data-types) for full details on JSON to BSON mappings.

#### Simple case: shard key is `/id`

When the shard key is `/id`, the Cosmos DB ID contains only the `id` field. The transform maps `id` to `_id`:

```yaml
# transform.yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
```

#### With a shard key prefix

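When the shard key is a separate field (e.g. `/region`), the Cosmos DB ID is multi-part (e.g. `["us-east", "123"]`). Use `idkeys` to declare the source ID fields, then collapse them into a single `_id`. In the example below, `id[1]` selects the second element of the multi-part ID (CEL lists are zero-indexed), which is the original `id` value: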

```yaml
# transform.yaml
defaultmapping: default
idlist: true
mappings:
  - namespace: default
    idkeys: ["region", "id"]
    delete: ["region", "id"]
    add: ["_id"]
    mapid: id[1]
    cel:
      _id: id[1]
```

{% hint style="info" %}
Adjust `idkeys` and the `cel` expressions to match your Cosmos DB container's shard key configuration. See the [multi-part ID examples](https://docs.adiom.io/enterprise/data-types#multi-part-id-examples) for more patterns.
{% endhint %}

### Worker VM setup

```bash
## Set variables
export SIGNOZ=http://<SIGNOZ_HOST>:4317
export TEMPORAL=<TEMPORAL_HOST>:7233
export COSMOS_URI=<...> #e.g. https://cosmos-nosql-west.documents.azure.com:443/
export COSMOS_KEY=<...>
export MONGODB_URI=<...> #e.g. mongodb+srv://user:pass@cluster.mongodb.net

## Create internal network for Docker
docker network create mynet

## Cosmos DB NoSQL Connector (source)
nohup docker run \
--network mynet --name cosmosnosqlconnector \
-e OTEL_EXPORTER_OTLP_ENDPOINT=$SIGNOZ \
markadiom/cosmosnosqlconnector 8089 "$COSMOS_URI" "$COSMOS_KEY" > /tmp/java.log 2>&1 &

## Worker
nohup docker run \
--network mynet --name dsyncworker \
-e OTEL_EXPORTER_OTLP_ENDPOINT=$SIGNOZ \
-v "$(pwd)/transform.yaml:/transform.yaml" \
markadiom/dsynct worker \
--namespace-mapping "cosmos_db.container:mongo_db.collection" \
--concurrent-activities 4 --sync-writer-workers 8 \
--transform \
grpc://cosmosnosqlconnector:8089 --insecure \
"$MONGODB_URI" \
dsync-transform:///transform.yaml \
temporal --host-port $TEMPORAL \
app --otel > /tmp/dsynct-worker.log 2>&1 &
```

{% hint style="info" %}
If no ID or data type transformation is needed, you can omit the `--transform` flag, the `dsync-transform://` argument, and the `-v` volume mount.
{% endhint %}
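
Because the connector and worker are started in the background, an empty variable or an unedited `<...>` placeholder only shows up later in the log files. The following is a minimal preflight sketch, assuming bash and the variable names exported above. `require_vars` is a hypothetical helper written for this guide, not part of Dsync:

```bash
# Hypothetical helper (not part of Dsync): fail early if a variable is
# unset, empty, or still contains a "<...>" placeholder.
require_vars() {
  local name value rc=0
  for name in "$@"; do
    value="${!name:-}"   # bash indirect expansion: value of the variable named by $name
    case "$value" in
      "")    echo "missing: $name" >&2; rc=1 ;;
      *"<"*) echo "placeholder not replaced: $name" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}

# Run this after the exports and before the docker commands.
require_vars SIGNOZ TEMPORAL COSMOS_URI COSMOS_KEY MONGODB_URI \
  || echo "Fix the variables listed above before starting the containers." >&2
```

The check only looks at the shape of the values. It does not verify that the endpoints are reachable or that the credentials are valid.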

### Running the workflow

```bash
## Runner
docker run \
--name dsyncrunner \
-p 8080:8080 \
-e OTEL_EXPORTER_OTLP_ENDPOINT=$SIGNOZ \
markadiom/dsynct run \
--namespace "cosmos_db.container" \
temporal --host-port $TEMPORAL \
app --otel --host-port 0.0.0.0:8080
```
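
If the workflow does not make progress, a reasonable first check is whether the two background containers from the worker setup are still running. The sketch below assumes the container names used above (`cosmosnosqlconnector` and `dsyncworker`) and relies only on the standard `docker inspect` command. `check_containers` is a hypothetical helper, not a Dsync command:

```bash
# Hypothetical helper (not part of Dsync): report whether each named
# container exists and is currently running.
check_containers() {
  local name state rc=0
  for name in "$@"; do
    # Prints "true" or "false"; prints nothing if the container does not exist.
    state="$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null || true)"
    if [ "$state" = "true" ]; then
      echo "running: $name"
    else
      echo "NOT running: $name"
      rc=1
    fi
  done
  return "$rc"
}

check_containers cosmosnosqlconnector dsyncworker \
  || echo "Check /tmp/java.log and /tmp/dsynct-worker.log for errors." >&2
```

A container reported as not running has either exited or was never created. The log files written by the `nohup` commands are the first place to look in both cases.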
