From Cosmos DB NoSQL to MongoDB API

Near-zero downtime migration from Cosmos DB NoSQL to MongoDB API

Prerequisites

  1. Obtain Cosmos DB credentials - URI and Primary Key ("Settings" -> "Keys" in the Azure Portal)

  2. Obtain MongoDB connection string - for the destination cluster

  3. Enable "All Versions and Deletes" for your Cosmos DB container ("Settings" -> "Features" in the Azure Portal)

Data types and ID considerations

Cosmos DB NoSQL uses JSON while MongoDB uses BSON, so a data transformation is required. As part of that transformation, you will commonly want to convert some JSON types into BSON types, such as timestamp strings into Dates. You may also want to carry over some internal Cosmos DB NoSQL fields, such as _ts, which is used for TTL.
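As a rough illustration of the kind of type conversion described above, the Python sketch below turns an ISO 8601 string and the _ts system field (epoch seconds) into timezone-aware datetime values, which MongoDB drivers such as PyMongo store as BSON Dates. The createdAt field name and the to_mongo_types helper are hypothetical. This is not the dsync Transform interface, only the conversion logic a custom transformer might apply.

```python
from datetime import datetime, timezone


def to_mongo_types(doc: dict) -> dict:
    """Return a copy of a Cosmos DB JSON document with timestamps as datetimes."""
    out = dict(doc)
    # Hypothetical application field that holds an ISO 8601 string.
    if isinstance(out.get("createdAt"), str):
        out["createdAt"] = datetime.fromisoformat(out["createdAt"])
    # _ts is the Cosmos DB system timestamp, expressed in epoch seconds.
    if "_ts" in out:
        out["_ts"] = datetime.fromtimestamp(out["_ts"], tz=timezone.utc)
    return out


cosmos_doc = {"id": "123", "createdAt": "2024-05-01T12:00:00+00:00", "_ts": 1714564800}
mongo_doc = to_mongo_types(cosmos_doc)
print(mongo_doc["createdAt"].isoformat())
print(mongo_doc["_ts"].isoformat())
```

Keeping _ts as a date rather than a number matters if you plan to use it with a MongoDB TTL index, since TTL indexes only act on date fields.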

For the open source dsync, you can build your own custom transformer following this example on GitHub. You can implement it in the language of your choice (e.g. Java or Python), as long as it supports gRPC and implements the required Transform interface.

The Enterprise version of Dsync has a CEL-based transformer that you can try here. The instructions below use its config format as an example because it is easy to read.

In Cosmos DB NoSQL, a document's ID is composed of the shard key (called the partition key in Cosmos DB) followed by the id field. MongoDB uses a single _id field, so the transform config must map between the two ID formats.

See Transform Data Types for full details on JSON to BSON mappings.

Simple case: shard key is /id

When the shard key is /id, the Cosmos DB ID contains only the id field. The transform maps id to _id:

# transform.yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
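
Read as plain Python, the mapping above appears to amount to dropping id and adding _id with the same value. The sketch below is an illustrative reading of the config, not the transformer's actual implementation; map_id is a hypothetical helper.

```python
def map_id(doc: dict) -> dict:
    """Mimic the config above: remove id and expose its value as _id."""
    # delete: ["id"]
    out = {key: value for key, value in doc.items() if key != "id"}
    # add: ["_id"] with the CEL expression `_id: id`
    out["_id"] = doc["id"]
    return out


print(map_id({"id": "123", "name": "example"}))
```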

With a shard key prefix

When the shard key is a separate field (e.g. /region), the Cosmos DB ID is multi-part (e.g. ["us-east", "123"]). In that case, use idkeys to declare the source ID fields and collapse them into a single _id.

Adjust idkeys and the cel expressions to match your Cosmos DB container's shard key configuration. See the multi-part ID examples for more patterns.
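
To show what collapsing a multi-part ID means in practice, here is a minimal Python sketch that joins the shard key value and the id into one string. It illustrates the logic only and is not idkeys or CEL syntax. The collapse_id and to_mongo_doc helpers, the ":" separator, and the choice of a string (rather than, say, an embedded document) are all assumptions made for the example. Some collapsing is needed because MongoDB does not accept an array as an _id value.

```python
from typing import Sequence


def collapse_id(id_parts: Sequence[str], separator: str = ":") -> str:
    """Join the shard key value(s) and the id into one string usable as _id."""
    return separator.join(id_parts)


def to_mongo_doc(doc: dict, shard_key_field: str = "region") -> dict:
    """Replace id with a single _id built from [shard key value, id]."""
    out = {key: value for key, value in doc.items() if key != "id"}
    out["_id"] = collapse_id([doc[shard_key_field], doc["id"]])
    return out


cosmos_doc = {"region": "us-east", "id": "123", "total": 42}
mongo_doc = to_mongo_doc(cosmos_doc)
print(mongo_doc)
```

If you join parts into a string like this, pick a separator that cannot occur in the shard key or id values, otherwise two different source IDs could collapse to the same _id.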


Step 1: Download dsync

Working on a large-scale production environment? Use our horizontally scalable Enterprise offering.

Use Docker (markadiom/dsync) or download the latest release from the GitHub Releases page. On Mac devices, you may need to configure a security exception to execute the binary by following these steps.

Alternatively, you can build dsync from the source code.

On a Mac, you can also install Dsync with Homebrew.

We recommend using Docker for this tutorial.

CosmosDB NoSQL Connector

The connector for CosmosDB NoSQL runs as a separate process because it uses the optimized Java SDK. You can run it as a Docker container.

If you'd rather build it from source and run it as a regular process, check out the git repository, cd into the java directory, and run mvn clean install. You will need Java JDK 21 or newer. This creates a jar in the java/target directory. For convenience, you can set up a shell alias for running that jar (replacing path/to/dsync with the appropriate location).

See the README in the java directory for the most up-to-date setup instructions.


Step 2: Prepare the destination MongoDB instance

  1. Start a local MongoDB instance:

For faster performance, we recommend creating any required secondary indexes after the initial data copy has completed.

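Export the MongoDB URI as the MONGODB_URI shell variable, which the dsync command in Step 5 references. For a local instance on the default port, that would be export MONGODB_URI="mongodb://localhost:27017".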


Step 3: Start the Cosmos NoSQL connector

Set the URL and KEY environment variables to the values for your Cosmos DB account. See here for where to find them.

Then run cosmos-connector 8089 $URL $KEY & to start the connector in the background. This starts a gRPC service (running without TLS) that talks to Cosmos DB NoSQL.

If you're building dsync from source, follow the instructions here to build the connector.

For Cloud Marketplace images and Docker:


Step 4: Start the transformer

You can start your transformer gRPC server listening on a port like 8085.

When using the Enterprise CEL-based transformer, prepare the config file as described in Data types and ID considerations, save it as config.yml, and run the process as a Docker container:


Step 5: Start dsync

Run dsync --namespace <DB>.<CONTAINER> $COSMOS_NOSQL_GRPC_URI --insecure $MONGODB_URI $TRANSFORMER_GRPC_URI --insecure. Substitute $COSMOS_NOSQL_GRPC_URI and $TRANSFORMER_GRPC_URI with the addresses of the connector and the transformer, in the format grpc://localhost:port.

Replace <DB>.<CONTAINER> with the desired Cosmos DB NoSQL database and container names. The --insecure flag is needed because the connection to the Cosmos DB NoSQL connector does not use TLS.

You can migrate multiple containers at the same time by specifying multiple mappings in the --namespace parameter:

dsync --namespace "<DB1>.<CONTAINER1>,<DB2>.<CONTAINER2>"
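
To make the argument order concrete, the sketch below assembles the command from its parts in Python and prints it. Ports 8089 and 8085 follow the examples in Steps 3 and 4; the mydb.orders namespace and the local MongoDB URI are placeholder assumptions, and dsync_command is a hypothetical helper. Only flags that appear in this guide are used.

```python
import shlex


def dsync_command(namespaces, cosmos_grpc_uri, mongodb_uri, transformer_grpc_uri):
    """Build the dsync argument list in the order used in this guide."""
    return [
        "dsync",
        "--namespace", ",".join(namespaces),  # comma-separated <DB>.<CONTAINER> pairs
        cosmos_grpc_uri, "--insecure",        # source: Cosmos DB NoSQL connector
        mongodb_uri,                          # destination: MongoDB
        transformer_grpc_uri, "--insecure",   # transformer gRPC server
    ]


cmd = dsync_command(
    ["mydb.orders"],              # placeholder namespace
    "grpc://localhost:8089",      # connector port from Step 3
    "mongodb://localhost:27017",  # assumed local destination
    "grpc://localhost:8085",      # transformer port from Step 4
)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd) would launch dsync without going through a shell, which avoids quoting problems if a URI contains special characters.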

Full command for Docker:

The web progress page will be available at localhost:8080.

Limitations

For Cosmos DB NoSQL sources, the open source version of Dsync supports CDC for a single namespace only. For multiple namespaces, you can either run the initial sync only (--mode InitialSync), run multiple Dsync processes (one per namespace), or use the Enterprise version.
