From DynamoDB to Cosmos DB NoSQL

Near-zero downtime migration from DynamoDB to Cosmos DB NoSQL with dsync

Prerequisites

1) DynamoDB table with DynamoDB Streams enabled. Make sure the stream view type includes at least "New Image" (NEW_IMAGE or NEW_AND_OLD_IMAGES).

2) Cosmos DB NoSQL Account with the pre-created destination database(s) and container(s)

3) AWS credentials with proper permissions for the source

4) Cosmos DB NoSQL account URL and read-write key (can be obtained from the Azure Portal)

5) The aws-cli installed
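If streams are not yet enabled on the source table, they can be turned on with the AWS CLI. The table name below is a placeholder:

```shell
# Enable DynamoDB Streams with the NEW_IMAGE view type on the source table
# ("MyTable" is a placeholder -- substitute your actual table name)
aws dynamodb update-table \
  --table-name MyTable \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_IMAGE
```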

Step 1: Download dsync

Download the latest release from the GitHub Releases page. Note that on Mac devices you may need to configure a security exception (Gatekeeper) to execute the binary.

You can also build dsync directly from the source code using go build.

If you're using a Cloud Provider marketplace image (e.g. from Azure Marketplace), then the binaries have already been preinstalled, and you just need to ssh into your provisioned instance.

If you want to access the Web UI progress feature (default port 8080), you can port-forward it, e.g.:

ssh -L 8080:localhost:8080 myuser@my.instance.ip

CosmosDB NoSQL Sink Binary

The connector for Cosmos DB NoSQL runs as a separate binary, so you will need to set it up as well.

If you're using a Cloud Provider marketplace image (e.g. from Azure Marketplace), this will be the preinstalled binary cosmos-sink.

If not, you can check out the git repository, cd into the java directory, and run mvn clean install. You will need Java JDK 21 or newer. This creates a jar in the java/target directory, and for convenience you can set up an alias like so (replacing /path/to/dsync with the appropriate path):

alias cosmos-sink='java -jar /path/to/dsync/java/target/cosmos-sink-1-jar-with-dependencies.jar'

You can look at the README in the java directory for the most up-to-date setup instructions.

Step 2: Set up environment variables

  1. Look up the target Cosmos DB NoSQL account URL and key, and export them into the environment variables $URL and $KEY respectively. Ensure you have created the database and container you want to move data into.

  2. Ensure you set your AWS credentials properly, such as by setting the AWS environment variables:

    export AWS_ACCESS_KEY_ID="..."
    export AWS_SECRET_ACCESS_KEY="..."
    export AWS_SESSION_TOKEN="..."

    Alternatively, you can use the aws configure sso (if you're doing it for the first time or using a VM from the marketplace) and aws sso login shell commands to securely log in to the AWS account for aws-cli. Test that you can see your DynamoDB tables with aws dynamodb list-tables.
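Putting the Cosmos DB details into environment variables might look like this (the account name and key here are placeholders):

```shell
# Placeholder values for illustration -- substitute your real Cosmos DB
# NoSQL account URL and read-write key from the Azure Portal
export URL="https://<account-name>.documents.azure.com:443/"
export KEY="<read-write-key>"
```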

Step 3: Start the Cosmos NoSQL connector

Run cosmos-sink 8089 $URL $KEY & in the background. This starts a gRPC service (running without TLS) that will write to the specified Cosmos DB NoSQL destination.
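For example, with $URL and $KEY exported as in Step 2, starting the sink in the background looks like:

```shell
# Start the Cosmos DB NoSQL sink connector on port 8089 (no TLS)
# in the background; assumes the cosmos-sink binary/alias from Step 1
cosmos-sink 8089 "$URL" "$KEY" &
```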

If you're building dsync from source, follow the instructions in the repository to build the connector.

Step 4: Start dsync

Run dsync --namespace <TABLENAME>:<DB>.<CONTAINER> dynamodb grpc://localhost:8089 --insecure. Replace <TABLENAME> with the DynamoDB table name. Replace <DB>.<CONTAINER> with the desired Cosmos DB NoSQL database and container names. We use the --insecure flag since we are not using TLS for our connection to the Cosmos DB NoSQL connector.
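As a concrete sketch, migrating a hypothetical DynamoDB table Orders into a database appdb and container orders would look like:

```shell
# Hypothetical names: table "Orders" -> database "appdb", container "orders"
dsync --namespace Orders:appdb.orders dynamodb grpc://localhost:8089 --insecure
```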

Limitations

  1. This data flow is currently not resumable.

  2. Embedded validation checks may not function for this data flow.
