From DynamoDB to Cosmos DB NoSQL
Near-zero downtime migration from DynamoDB to Cosmos DB NoSQL with dsync
Prerequisites
1) DynamoDB instance with DynamoDB Streams enabled. Make sure the stream view type includes at least "New Image".
2) Cosmos DB NoSQL Account with the pre-created destination database(s) and container(s)
3) AWS credentials with proper permissions for the source
4) Cosmos DB NoSQL account URL and read-write key (can be obtained from the Azure Portal)
5) aws-cli installed
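For prerequisite 1, a stream can be enabled from aws-cli. The following is a minimal sketch, not the only way to do it: the helper names enable_stream and show_stream are hypothetical, and NEW_IMAGE is the smallest view type that satisfies the requirement (NEW_AND_OLD_IMAGES also includes the new image). Note that update-table rejects the request if a stream is already enabled on the table.

```shell
# Hypothetical helper: enable a DynamoDB stream that carries the new item image.
# Usage: enable_stream <TABLENAME>
enable_stream() {
  aws dynamodb update-table \
    --table-name "$1" \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_IMAGE
}

# Hypothetical helper: print the table's current stream settings.
# Usage: show_stream <TABLENAME>
show_stream() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --query 'Table.StreamSpecification'
}
```

Run show_stream first to check whether the table already has a suitable stream before changing anything.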
Additional Considerations
Indexes
For faster migration performance, we recommend disabling indexes on the Cosmos DB NoSQL containers that will be used as the destination for the migration.
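One possible way to do this is to apply an indexing policy with indexing turned off through the Azure CLI. This is a sketch under assumptions: it assumes the az CLI is installed and logged in, the helper name disable_indexing and the file name no-index-policy.json are hypothetical, and all four arguments are placeholders for your own resource names. Confirm the policy against the current Cosmos DB documentation before applying it.

```shell
# Indexing policy that switches indexing off for a container.
# With indexingMode "none", automatic indexing must also be false.
cat > no-index-policy.json <<'EOF'
{ "indexingMode": "none", "automatic": false }
EOF

# Hypothetical helper: apply the policy to one destination container.
# Usage: disable_indexing <RESOURCE_GROUP> <ACCOUNT_NAME> <DB> <CONTAINER>
disable_indexing() {
  az cosmosdb sql container update \
    --resource-group "$1" \
    --account-name "$2" \
    --database-name "$3" \
    --name "$4" \
    --idx @no-index-policy.json
}
```

Keep a copy of the original indexing policy so it can be restored after the migration.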
Networking
The host running dsync should have network access to both DynamoDB and Cosmos DB for NoSQL instances. When both DynamoDB and Cosmos DB are only accessible from within a VPC/VNET, you need to establish connectivity between the cloud providers by following the instructions here.
Provisioning additional capacity on the source
The initial data copy stage of the migration is equivalent to a parallel table scan in DynamoDB, and it will consume additional read units. When your DynamoDB source is already serving live production traffic and the table(s) are configured with provisioned capacity, we recommend temporarily increasing read capacity to avoid impacting production traffic. The exact capacity increase varies on a case-by-case basis, but as a rule of thumb we recommend adding at least 10,000 RCU.
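The capacity increase can be sketched with aws-cli as follows. The helper name bump_read_capacity is hypothetical, and the current read and write capacity values are passed in by hand (you can look them up with aws dynamodb describe-table). Write capacity is passed through unchanged.

```shell
# Rule-of-thumb headroom from the text above.
EXTRA_RCU=10000

# Hypothetical helper: raise read capacity by EXTRA_RCU, keep write capacity as is.
# Usage: bump_read_capacity <TABLENAME> <CURRENT_RCU> <CURRENT_WCU>
bump_read_capacity() {
  aws dynamodb update-table \
    --table-name "$1" \
    --provisioned-throughput "ReadCapacityUnits=$(( $2 + EXTRA_RCU )),WriteCapacityUnits=$3"
}
```

For example, bump_read_capacity orders 500 100 would request 10500 read units and 100 write units for a table named orders. Remember to lower the capacity again once the initial copy has finished.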
Step 1: Download dsync
Download the latest release from the GitHub Releases page. Note that on Mac devices you may need to configure a security exception to execute the binary by following these steps.
You can also build dsync directly from the source code using go build.
If you're using a Cloud Provider marketplace image (e.g. from Azure Marketplace), then the binaries have already been preinstalled, and you just need to ssh into your provisioned instance.
If you want to access the Web UI progress feature (default port 8080) from your own machine, you can forward the port over ssh.
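A minimal sketch of such a port forward, assuming the provisioned instance is reachable over ssh and the Web UI listens on the default port 8080. The helper name forward_web_ui is hypothetical, and the user and host are placeholders.

```shell
# Hypothetical helper: forward local port 8080 to port 8080 on the dsync host.
# -N keeps the tunnel open without running a remote command.
# Usage: forward_web_ui <USER>@<HOST>
forward_web_ui() {
  ssh -N -L 8080:localhost:8080 "$1"
}
```

While the tunnel is open, the Web UI should be reachable at http://localhost:8080 on your local machine.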
CosmosDB NoSQL Sink Binary
The connector for CosmosDB NoSQL runs as a separate binary, so you will need to set this up as well.
If you're using a Cloud Provider marketplace image (e.g. from Azure Marketplace), this will be the preinstalled binary cosmos-sink.
If not, you can check out the git repository, cd into the java directory and run mvn clean install. You will need Java JDK 21 or newer. This will create a jar in the java/target directory, and for convenience you can set up an alias for it (replacing path/to/dsync with the appropriate path).
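The alias might look like the sketch below. The jar file name is not stated here, so REPLACE_ME.jar is a placeholder for whatever file mvn produced in java/target, and the variable name DSYNC_SINK_JAR is hypothetical.

```shell
# Placeholder path: point this at the jar that mvn produced in java/target.
DSYNC_SINK_JAR="path/to/dsync/java/target/REPLACE_ME.jar"

# After this, "cosmos-sink 8089 $URL $KEY" runs the jar with those arguments.
alias cosmos-sink="java -jar '$DSYNC_SINK_JAR'"
```

Add these lines to your shell profile if you want the alias to be available in new sessions.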
You can look at the README in the java directory for the most up-to-date setup instructions.
Step 2: Set up environment variables
Look up the target Cosmos DB NoSQL account URL and key, and export them as the environment variables $URL and $KEY respectively. Make sure the database and container you want to migrate data into already exist.
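As a sketch, the two exports look like this. Both values are placeholders: copy the real URI and read-write key from the Azure Portal. The URL shape shown is the usual Cosmos DB endpoint format, but use whatever the portal displays for your account.

```shell
# Placeholder values: replace with the URI and read-write key of your account.
export URL="https://<ACCOUNT_NAME>.documents.azure.com:443/"
export KEY="<READ_WRITE_KEY>"
```

Since the key grants write access, avoid committing it to scripts or leaving it in shared shell history.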
Ensure you set your AWS credentials properly, such as by setting the AWS environment variables:
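A minimal sketch using the standard AWS environment variables. All values are placeholders, and us-east-1 is only an example region. Setting both region variables is an assumption made for safety, since different AWS tools read one or the other.

```shell
# Placeholder credentials: replace with your own values.
export AWS_ACCESS_KEY_ID="<ACCESS_KEY_ID>"
export AWS_SECRET_ACCESS_KEY="<SECRET_ACCESS_KEY>"

# Example region: use the region that hosts your DynamoDB tables.
export AWS_REGION="us-east-1"
export AWS_DEFAULT_REGION="$AWS_REGION"

# Only needed for temporary credentials:
# export AWS_SESSION_TOKEN="<SESSION_TOKEN>"
```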
Alternatively, you can use the aws configure sso (if you're doing it for the first time or using a VM from the marketplace) and aws sso login shell commands to securely log in to the AWS account for aws-cli. Test that you can see your DynamoDB tables with aws dynamodb list-tables.
Step 3: Start the Cosmos NoSQL connector
Run cosmos-sink 8089 $URL $KEY & to start the connector in the background. This starts a gRPC service (running without TLS) that will write to the specified Cosmos DB NoSQL destination.
If you're building dsync from the source, follow the instructions here to build the connector.
Step 4: Start dsync
Run dsync --namespace <TABLENAME>:<DB>.<CONTAINER> dynamodb grpc://localhost:8089 --insecure. Replace <TABLENAME> with the DynamoDB table name. Replace <DB>.<CONTAINER> with the desired Cosmos DB NoSQL database and container names. We use the --insecure flag because the connection to the Cosmos DB NoSQL connector does not use TLS.
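To make the namespace mapping concrete, the sketch below builds it from three variables and prints the resulting command instead of running it. The table, database and container names are hypothetical examples.

```shell
# Hypothetical names: a DynamoDB table "orders" mapped to container "orders"
# in the Cosmos DB NoSQL database "shop".
TABLE="orders"
DB="shop"
CONTAINER="orders"

# The mapping has the form <TABLENAME>:<DB>.<CONTAINER>.
NAMESPACE="${TABLE}:${DB}.${CONTAINER}"

# Print the command to review it before running it.
echo "dsync --namespace $NAMESPACE dynamodb grpc://localhost:8089 --insecure"
```

The port in grpc://localhost:8089 must match the port passed to cosmos-sink in Step 3.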
Step 5: Post-migration configuration
Indexes
Create/validate necessary indexes on Cosmos DB for NoSQL.
Global tables
If you had global tables configured in DynamoDB, you may want to configure global distribution for those tables in Cosmos DB for NoSQL.
Reporting
Follow these instructions to configure Analytics and BI in Azure for data stored in Cosmos DB for NoSQL.
Monitoring
Familiarize yourself with Cosmos DB monitoring dashboards and metrics. You can configure the necessary alerts by following the instructions here.
Backups
Review and adjust configuration for Cosmos DB backups.
Limitations
The DynamoDB to Cosmos DB for NoSQL data flow is currently not resumable.
Embedded validation checks may not function for this data flow.