From DynamoDB to Cosmos DB NoSQL
Near-zero downtime migration from DynamoDB to Cosmos DB NoSQL with dsync
Prerequisites
1) DynamoDB table with DynamoDB Streams enabled. Make sure the stream view type includes at least the new image ("New image" or "New and old images").
2) Cosmos DB NoSQL Account with the pre-created destination database(s) and container(s)
3) AWS credentials with proper permissions for the source
4) Cosmos DB NoSQL account URL and read-write key (can be obtained using the Azure Portal)
5) Installed aws-cli
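As a rough sketch of how prerequisite 1 can be checked from the command line, the helpers below wrap two standard aws-cli calls. The function names stream_view_type and enable_new_image_stream are hypothetical, and the sketch assumes aws-cli is already authenticated against the account that owns the table.

```shell
# Hypothetical helpers (names are illustrative, not part of dsync).

# Print the stream view type of a table, e.g. NEW_IMAGE or NEW_AND_OLD_IMAGES.
# $1 = DynamoDB table name
stream_view_type() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --query 'Table.StreamSpecification.StreamViewType' \
    --output text
}

# Enable a stream that carries the new image on a table without a stream.
# $1 = DynamoDB table name
enable_new_image_stream() {
  aws dynamodb update-table \
    --table-name "$1" \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_IMAGE
}

# Usage (replace <TABLENAME> with your table):
#   stream_view_type "<TABLENAME>"
#   enable_new_image_stream "<TABLENAME>"
```

If the first helper prints anything other than NEW_IMAGE or NEW_AND_OLD_IMAGES, the stream likely needs to be enabled or reconfigured. Note that DynamoDB rejects an attempt to enable a stream on a table that already has one.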
Step 1: Download dsync
Download the latest release from the GitHub Releases page. Note that on Mac devices you may need to configure a security exception to execute the binary by following these steps.
You can also build dsync directly from the source code using go build.
If you're using a cloud provider marketplace image (e.g. from Azure Marketplace), the binaries are preinstalled, and you just need to SSH into your provisioned instance.
If you want to access the Web UI progress feature (default port 8080), you can forward that port from the instance to your local machine over SSH.
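One way to do the port forwarding is with standard SSH local forwarding. This is a minimal sketch: the function name forward_web_ui is hypothetical, and it assumes the Web UI listens on port 8080 on the instance itself.

```shell
# Hypothetical helper: forward local port 8080 to port 8080 on the instance.
# $1 = SSH destination, e.g. <USER>@<INSTANCE_ADDRESS>
forward_web_ui() {
  # -N: do not run a remote command, only forward ports
  # -L: local_port:target_host:target_port (target is resolved on the instance)
  ssh -N -L 8080:localhost:8080 "$1"
}

# Usage (replace the placeholders with your SSH user and instance address):
#   forward_web_ui "<USER>@<INSTANCE_ADDRESS>"
```

While the command is running, the Web UI should be reachable at http://localhost:8080 on your local machine.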
Cosmos DB NoSQL Sink Binary
The connector for Cosmos DB NoSQL runs as a separate binary, so you will need to set this up as well.
If you're using a cloud provider marketplace image (e.g. from Azure Marketplace), this will be the preinstalled binary cosmos-sink.
If not, you can check out the git repository, cd into the java directory, and run mvn clean install. You will need Java JDK 21 or newer. This creates a jar in the java/target directory. For convenience, you can set up a shell alias such as cosmos-sink that runs this jar (replacing path/to/dsync with the path to your checkout).
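A sketch of what such an alias might look like is shown below. The jar file name is left as the placeholder <JAR_FILE> because it depends on the build; use whatever jar mvn clean install produced in java/target. Running the jar with java -jar is an assumption here, so check the README in the java directory if it does not start.

```shell
# Assumed form of the alias; replace path/to/dsync and <JAR_FILE> with real values.
alias cosmos-sink='java -jar path/to/dsync/java/target/<JAR_FILE>.jar'
```

Adding this line to your shell profile keeps the alias available in new sessions.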
You can look at the README in the java directory for the most up-to-date setup instructions.
Step 2: Set up environment variables
Look up the target Cosmos DB NoSQL account URL and key, and export them into the environment variables $URL and $KEY respectively. Ensure you have created the database and container you want to migrate data into.
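For illustration, the exports might look like the following. The values are placeholders: <ACCOUNT_NAME> and <READ_WRITE_KEY> stand for your own account name and key, and the URL shape shown is the usual Cosmos DB endpoint format, which you should confirm against the value shown in the Azure Portal.

```shell
# Placeholders only: copy the real URL and key from the Azure Portal.
export URL="https://<ACCOUNT_NAME>.documents.azure.com:443/"
export KEY="<READ_WRITE_KEY>"
```

Quoting the values avoids problems with characters such as = or / that commonly appear in keys.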
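Ensure your AWS credentials are set properly, for example by exporting the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (plus AWS_SESSION_TOKEN when using temporary credentials).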
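A minimal sketch of the environment-variable approach follows. All values are placeholders. Setting the region this way is an assumption that applies only if it is not already configured elsewhere, such as in an AWS profile.

```shell
# Placeholders only: substitute your own credentials.
export AWS_ACCESS_KEY_ID="<ACCESS_KEY_ID>"
export AWS_SECRET_ACCESS_KEY="<SECRET_ACCESS_KEY>"
# Needed only for temporary credentials:
export AWS_SESSION_TOKEN="<SESSION_TOKEN>"
# Assumption: region of the source table, if not configured elsewhere.
export AWS_REGION="<REGION>"
export AWS_DEFAULT_REGION="<REGION>"
```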
Alternatively, you can use the aws configure sso (if you're doing it for the first time or using a VM from the marketplace) and aws sso login shell commands to securely log in to the AWS account for aws-cli. Test that you can see your DynamoDB tables with aws dynamodb list-tables.
Step 3: Start the Cosmos NoSQL connector
Run cosmos-sink 8089 $URL $KEY & in the background. This starts a gRPC service (running without TLS) that will write to the specified Cosmos DB NoSQL destination.
If you're building dsync from the source, follow the instructions here to build the connector.
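The step above can be wrapped in a small helper that refuses to start when the variables from Step 2 are missing and remembers the process ID so the connector can be stopped later. This is only a sketch: start_sink and SINK_PID are hypothetical names, and it assumes cosmos-sink is an executable on your PATH, as on the marketplace image.

```shell
# Hypothetical helper: start the Cosmos DB NoSQL connector in the background.
start_sink() {
  # Abort with a message if URL or KEY is unset or empty.
  : "${URL:?set URL to the Cosmos DB NoSQL account URL}"
  : "${KEY:?set KEY to the Cosmos DB NoSQL read-write key}"
  # Same invocation as in Step 3: port 8089, then URL and key.
  cosmos-sink 8089 "$URL" "$KEY" &
  # Remember the background process ID for later cleanup.
  SINK_PID=$!
}

# Usage:
#   start_sink
#   ... run dsync (Step 4) ...
#   kill "$SINK_PID"   # stop the connector when the migration is finished
```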
Step 4: Start dsync
Run dsync --namespace <TABLENAME>:<DB>.<CONTAINER> dynamodb grpc://localhost:8089 --insecure. Replace <TABLENAME> with the DynamoDB table name. Replace <DB>.<CONTAINER> with the desired Cosmos DB NoSQL database and container names. The --insecure flag is needed because the connection to the Cosmos DB NoSQL connector does not use TLS.
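To make the namespace mapping concrete, the sketch below builds the same command from three arguments. The function name run_migration and the example names Orders, shop, and orders are hypothetical. The dsync invocation itself is the one given in this step.

```shell
# Hypothetical helper: run dsync for one table-to-container mapping.
# $1 = DynamoDB table name
# $2 = Cosmos DB NoSQL database name
# $3 = Cosmos DB NoSQL container name
run_migration() {
  # The namespace has the form <TABLENAME>:<DB>.<CONTAINER>.
  dsync --namespace "$1:$2.$3" dynamodb grpc://localhost:8089 --insecure
}

# Usage: migrate the table "Orders" into container "orders" of database "shop".
#   run_migration Orders shop orders
```

This assumes the connector from Step 3 is already listening on port 8089.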
Limitations
This data flow is currently not resumable.
Embedded validation checks may not function for this data flow.