# From Cosmos DB NoSQL to MongoDB API

### Prerequisites

1. **Obtain Cosmos DB credentials** - URI and Primary Key ("Settings" -> "Keys" in the Azure Portal)
2. **Obtain MongoDB connection string** - for the destination cluster
3. **Enable** [**"All Versions and Deletes"**](https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-modes?tabs=latest-version#all-versions-and-deletes-change-feed-mode-preview) for your Cosmos DB container ("Settings" -> "Features" in the Azure Portal)
   * If you can't enable the feature, run the Cosmos DB NoSQL connector with the `COSMOS_DISABLE_ALL_VERSIONS_AND_DELETES=true` environment variable (with Docker, add `-e COSMOS_DISABLE_ALL_VERSIONS_AND_DELETES=true` to the connector's `docker run` command in Step 3). In that case, Dsync cannot replicate delete events from Cosmos DB.

### Data types and ID considerations

Cosmos DB NoSQL uses **JSON** while MongoDB uses **BSON**, so a [data transformation](https://docs.adiom.io/enterprise/running-dsynct/data-transformations) is required. As part of that transformation you will usually want to convert some JSON values into richer BSON types, such as timestamp strings into BSON Dates. You may also want to carry over internal Cosmos DB NoSQL fields such as `_ts`, the last-modified timestamp that Cosmos DB uses for TTL.
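Cosmos DB stores `_ts` as Unix epoch seconds, while a MongoDB TTL index only expires documents whose indexed field holds a BSON Date, so a copied `_ts` generally needs converting before it can drive TTL. When deciding what your transform should produce, a small helper like the one below can show the UTC time a given `_ts` corresponds to. `ts_to_iso` is a hypothetical convenience for inspection, not part of dsync or any transformer:

```bash
# Hypothetical helper (not part of dsync): print a Cosmos DB `_ts` value
# (Unix epoch seconds) as an ISO-8601 UTC timestamp.
ts_to_iso() {
  # GNU date (Linux) accepts "-d @<seconds>"; BSD/macOS date uses "-r <seconds>".
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$1" +%Y-%m-%dT%H:%M:%SZ
}

ts_to_iso 1700000000   # prints 2023-11-14T22:13:20Z
```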

For the Open Source dsync, you can build your own custom transformer following this [example in GitHub](https://github.com/adiom-data/dsync/blob/main/transform/identity.go). You can implement it in the language of your choice (e.g. Java or Python), as long as it supports gRPC and implements the required [Transform interface](https://docs.adiom.io/implementation-details/architecture#transform-interface).

{% hint style="info" %}
The [Enterprise](https://docs.adiom.io/enterprise) version of Dsync has a CEL-based transformer that you can try [here](https://github.com/adiom-data/public/tree/main/dsync-transform). In the instructions below we will be using its format as an example given how intuitive it is.
{% endhint %}

The Cosmos DB NoSQL [ID format](https://docs.adiom.io/reference/connectors/cosmos-db-nosql#id-format) is composed of the shard key followed by the `id` field. MongoDB uses a single `_id` field. A transform config must map between these ID formats.

See [Transform Data Types](https://docs.adiom.io/enterprise/running-dsynct/data-types) for full details on JSON to BSON mappings.

#### Simple case: shard key is `/id`

When the shard key is `/id`, the Cosmos DB ID contains only the `id` field. The transform maps `id` to `_id`:

```yaml
# transform.yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
```

#### With a shard key prefix

When the shard key is a separate field (e.g. `/region`), the Cosmos DB ID is multi-part (e.g. `["us-east", "123"]`). You need to use `idkeys` to declare the source ID fields and collapse them into a single `_id`:

```yaml
# transform.yaml
defaultmapping: default
idlist: true
mappings:
  - namespace: default
    idkeys: ["region", "id"]
    delete: ["region", "id"]
    add: ["_id"]
    mapid: id[1]
    cel:
      _id: id[1]
```

{% hint style="info" %}
Adjust `idkeys` and the `cel` expressions to match your Cosmos DB container's shard key configuration. See the [multi-part ID examples](https://docs.adiom.io/enterprise/running-dsynct/data-types#multi-part-id-examples) for more patterns.
{% endhint %}

***

### Step 1: Download dsync

{% hint style="info" %}
Working on a large-scale production environment? Use our horizontally scalable [Enterprise offering](https://docs.adiom.io/enterprise/scalable-deployment).
{% endhint %}

Use Docker (`markadiom/dsync`) or download the latest release from the [GitHub Releases](https://github.com/adiom-data/dsync/releases/latest) page. Note that on Mac devices you may need to configure a security exception to execute the binary by following [these steps](https://support.apple.com/en-ca/guide/mac-help/mh40616/mac).

Alternatively, you can build dsync from source:

```bash
git clone https://github.com/adiom-data/dsync.git
cd dsync
go build
```

{% hint style="info" %}
You can use Homebrew to easily install Dsync on your Mac:

```
brew install adiom-data/homebrew-tap/dsync
```

{% endhint %}

{% hint style="info" %}
We recommend using Docker for this tutorial.
{% endhint %}

#### Cosmos DB NoSQL Connector

The Cosmos DB NoSQL connector runs as a separate process because it uses the optimized Java SDK. You can run it as a Docker container.

If you'd rather build it from source and run it as a regular process, clone the git repository, `cd` into the `java` directory, and run `mvn clean install`. This requires Java JDK 21 or newer. The build creates a jar in the `java/target` directory; for convenience, you can set up an alias like the following (replace `/path/to/dsync` with the path to your checkout):

```
alias cosmos-connector='OTEL_SDK_DISABLED=true java -jar /path/to/dsync/java/target/cosmos-connector-1-jar-with-dependencies.jar'
```

See the README in the `java` directory for the most up-to-date setup instructions.

***

### Step 2: Prepare the destination MongoDB instance

{% hint style="warning" %}
If you already have the desired destination MongoDB instance up and running, you can skip this step.
{% endhint %}

1. Install [MongoDB](https://www.mongodb.com/docs/manual/administration/install-community/)
2. Start a local MongoDB instance:

```bash
mkdir -p ~/temp/data_d
cd ~/temp
mongod --dbpath data_d --logpath mongod_d.log --fork --port 27017
```

{% hint style="info" %}
For faster performance, we recommend creating any required secondary indexes *after* the initial data copy has completed.
{% endhint %}

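Export the MongoDB URI as a shell variable. If dsync will run in Docker (Step 5), use an address the container can reach: inside a container, `localhost` refers to the container itself, not your machine.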

{% code overflow="wrap" %}

```bash
export MONGODB_URI="<...>" # e.g. mongodb://localhost:27017 or mongodb+srv://user:pass@cluster.mongodb.net
```

{% endcode %}

***

### Step 3: Start the Cosmos NoSQL connector

Set the `URL` and `KEY` environment variables to your Cosmos DB account's URI and primary key. See [here](https://learn.microsoft.com/en-us/answers/questions/1056745/where-to-find-cosmos-db-endpoint-and-key) for where to find them.

```bash
export URL="..."
export KEY="..."
```
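Steps 3 to 5 all read shell variables, so it can help to fail fast when one is missing. The `require_env` function below is a hypothetical convenience, not part of dsync: it reports every unset or empty variable among the names it is given and returns non-zero if any are missing. Pass it variable names only, since it looks them up with `eval`:

```bash
# Hypothetical preflight helper: return non-zero (and name the culprits on
# stderr) if any of the given variables is unset or empty.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: require_env URL KEY MONGODB_URI && echo "all set"
```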

If you built the connector from source and set up the `cosmos-connector` alias, run `cosmos-connector 8089 "$URL" "$KEY" &` to start it in the background. This starts a gRPC service on port 8089, without TLS, that talks to Cosmos DB NoSQL.
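Because the connector starts in the background, a script may need to wait for it before moving on. One option is to poll the port until it accepts TCP connections; the hypothetical `wait_for_port` below does this through bash's `/dev/tcp` pseudo-device, so bash must be installed. It only suits a locally run connector: the Docker commands in this guide don't publish ports to the host, so for those check `/tmp/java.log` instead.

```bash
# Hypothetical helper: wait until host $1 accepts TCP connections on port $2,
# checking once per second for up to $3 attempts (default 30).
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    # In bash, redirecting to /dev/tcp/<host>/<port> attempts a TCP connection.
    if bash -c ": > /dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}

# Usage: wait_for_port localhost 8089 && echo "connector is up"
```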

If you build dsync from source, follow the instructions [here](https://github.com/adiom-data/dsync/blob/main/java/README.md) to build the connector.

For Cloud Marketplace images and Docker:

```bash
sudo docker network create mynet

nohup sudo docker run \
--network mynet --name cosmosnosqlconnector \
-e OTEL_SDK_DISABLED=true \
markadiom/cosmosnosqlconnector 8089 $URL $KEY > /tmp/java.log 2>&1 &
```

***

### Step 4: Start the transformer

Start your transformer's gRPC server listening on a port, for example 8085.

When using the Enterprise CEL-based transformer, prepare the config file as described in [#data-types-and-id-considerations](#data-types-and-id-considerations "mention"), save it as `config.yml` in the current directory, and run the transformer as a Docker container:

{% code overflow="wrap" %}

```bash
nohup sudo docker run \
--network mynet --name dsync-transform \
-v "$(pwd)/config.yml:/config.yml" \
-e "DSYNCT_MODE=simple" \
markadiom/dsynct --host-port=0.0.0.0:8085 transformer > /tmp/transform.log 2>&1 &
```

{% endcode %}

***

### Step 5: Start dsync

Run `dsync --namespace <DB>.<CONTAINER> $COSMOS_NOSQL_GRPC_URI --insecure $MONGODB_URI $TRANSFORMER_GRPC_URI --insecure`, where `COSMOS_NOSQL_GRPC_URI` and `TRANSFORMER_GRPC_URI` are the addresses of the connector and the transformer in the form `grpc://localhost:<port>` (for example, `grpc://localhost:8089` and `grpc://localhost:8085`).

Replace `<DB>.<CONTAINER>` with the Cosmos DB NoSQL database and container names. The `--insecure` flag after each gRPC address makes dsync connect without TLS, because neither the connector nor the transformer is running with TLS.
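If you run dsync as a local binary, the Step 5 command can be wired together in a few lines of shell. This is a minimal sketch that assumes the connector and transformer listen on localhost at the ports used in Steps 3 and 4; `run_dsync` is a hypothetical wrapper, not a dsync command:

```bash
# Addresses of the connector (Step 3) and transformer (Step 4), assuming both
# listen on localhost at the ports used in this guide.
export COSMOS_NOSQL_GRPC_URI="grpc://localhost:8089"
export TRANSFORMER_GRPC_URI="grpc://localhost:8085"

# Hypothetical wrapper (not a dsync command): run dsync for the given
# namespace list, e.g. run_dsync "mydb.mycontainer".
run_dsync() {
  if [ -z "${MONGODB_URI:-}" ]; then
    echo "MONGODB_URI is not set (see Step 2)" >&2
    return 1
  fi
  dsync --namespace "$1" \
    "$COSMOS_NOSQL_GRPC_URI" --insecure \
    "$MONGODB_URI" \
    "$TRANSFORMER_GRPC_URI" --insecure
}
```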

You can migrate several containers at once by passing a comma-separated list of namespaces to `--namespace` (see Limitations below for how this affects CDC):

`dsync --namespace "<DB1>.<CONTAINER1>,<DB2>.<CONTAINER2>"`
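When the containers all live in one database, the comma-separated value can be generated instead of typed by hand. A small sketch with a hypothetical helper name:

```bash
# Hypothetical helper: print "<DB>.<C1>,<DB>.<C2>,..." for the given database
# and container names, ready to pass to --namespace.
namespaces_for_db() {
  local db="$1" list="" container
  shift
  for container in "$@"; do
    # ${list:+$list,} adds a separating comma only after the first entry.
    list="${list:+$list,}$db.$container"
  done
  printf '%s\n' "$list"
}

namespaces_for_db mydb orders customers   # prints mydb.orders,mydb.customers
```

You would then pass its output as `--namespace "$(namespaces_for_db mydb orders customers)"`.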

Full command for Docker:

```bash
sudo docker run \
--network mynet --name dsync \
-p 8080:8080 \
markadiom/dsync \
--web-host 0.0.0.0 \
--namespace "<DB>.<CONTAINER>" \
grpc://cosmosnosqlconnector:8089 --insecure \
$MONGODB_URI \
grpc://dsync-transform:8085 --insecure
```

The progress web UI will be available at [localhost:8080](http://localhost:8080).

### Limitations

For Cosmos DB NoSQL sources, the Open Source version of Dsync supports CDC for only a single namespace. For multiple namespaces, you can run the initial sync only (`--mode InitialSync`), run one Dsync process per namespace, or use the [Enterprise version](https://docs.adiom.io/enterprise/running-dsynct/cosmos-db-nosql-to-mongo).
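If you take the initial-sync-only route, a wrapper can pick the mode from the namespace list. This is a hypothetical sketch, not part of dsync: it assumes `COSMOS_NOSQL_GRPC_URI`, `MONGODB_URI`, and `TRANSFORMER_GRPC_URI` are set as in Step 5 and that `--mode` can be passed alongside `--namespace`. A single namespace runs with dsync's default mode, exactly as in Step 5.

```bash
# Hypothetical wrapper for open-source dsync with a Cosmos DB NoSQL source:
# CDC works for one namespace only, so a comma-separated list of several
# namespaces falls back to an initial-sync-only run.
run_dsync_oss() {
  local ns="$1"
  case "$ns" in
    *,*) set -- --mode InitialSync ;;  # several namespaces: copy once, no CDC
    *)   set -- ;;                     # one namespace: same command as Step 5
  esac
  dsync "$@" --namespace "$ns" \
    "$COSMOS_NOSQL_GRPC_URI" --insecure \
    "$MONGODB_URI" \
    "$TRANSFORMER_GRPC_URI" --insecure
}
```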
