From DynamoDB to MongoDB

Near-zero downtime migration from DynamoDB to MongoDB with dsync

Prerequisites

1) DynamoDB table(s) with DynamoDB Streams enabled (if CDC is needed). Make sure the stream view type includes at least "New Image".

2) MongoDB cluster. Any sharded databases and collections need to be pre-created.

3) AWS credentials with proper permissions for the source

4) Installed aws-cli, or the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables

Additional Considerations

Indexes

For faster migration performance, we recommend creating secondary indexes on the destination MongoDB cluster after the initial sync is done. All modern MongoDB versions support building indexes in the background, so the indexes can be created during the CDC / catch-up phase.
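As a sketch, a secondary index can be created from the shell with mongosh once the initial sync completes. The database, collection, and field names below (appdb, users, email) are placeholders for illustration:

```shell
# Create a secondary index on the destination after the initial sync finishes.
# "appdb", "users", and "email" are placeholder names -- substitute your own.
mongosh "$MONGODB_URI" --eval 'db.getSiblingDB("appdb").users.createIndex({ email: 1 })'
```

MongoDB builds the index in the background by default, so writes arriving from the CDC phase are not blocked.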

Networking

The host running dsync should have network access to both DynamoDB and MongoDB instances.

Provisioning additional capacity on the source

The initial data copy stage of the migration is equivalent to a parallel table scan in DynamoDB, and it will consume additional read units. If your DynamoDB source is already serving live production traffic and the table(s) are configured with provisioned capacity, we recommend temporarily increasing read capacity to avoid impacting production traffic. The exact increase varies on a case-by-case basis, but as a rule of thumb we recommend adding at least 10,000 RCUs.

Step 1: Download dsync

Adiom on Azure Marketplace


Working on a large-scale production environment? Use our horizontally scalable Enterprise offering.

Use Docker (markadiom/dsync) or download the latest release from the GitHub Releases page. Note that on Mac devices you may need to configure a security exception to execute the binary by following these steps.

You can also build dsync directly from the source code using go build.

If you're using a Cloud Provider marketplace image (e.g. from Azure Marketplace), then the binaries have already been preinstalled, and you just need to ssh into your provisioned instance.

If you want to access the Web UI progress feature (served on port 8080 by default), you can port forward from your local machine.
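One way to do this, assuming you connect to the instance over SSH (the user and hostname below are placeholders), is an SSH local port forward:

```shell
# Forward local port 8080 to the dsync Web UI on the remote instance.
# "azureuser" and "your-vm-address" are placeholders for your instance's login and address.
ssh -L 8080:localhost:8080 azureuser@your-vm-address
# Then open http://localhost:8080 in your local browser.
```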

Step 2: Set up environment variables

Ensure your AWS credentials are set properly, for example by setting the AWS environment variables.
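For example (all values below are placeholders; AWS_REGION is an assumption you should adjust to wherever your tables live):

```shell
# Placeholder credentials -- substitute your own values.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
# Assumed region -- set this to the region hosting your DynamoDB tables.
export AWS_REGION="us-east-1"
```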

Alternatively, you can use the aws configure sso (for first-time setup or on a marketplace VM) and aws sso login shell commands to securely log in to your AWS account for aws-cli. Verify access by listing your DynamoDB tables with aws dynamodb list-tables.

(Optional) Step 3: Start the transformer

When data transformations are required, you can connect a transformer to dsync via the gRPC extension interface. You can write your own in the language of your choice, or use our YAML-based declarative transformer (available only for Enterprise customers).

When running the transformer in Docker, make sure the container is on the same Docker network as the dsync container (use the --network option of docker run).
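A sketch of this setup, assuming a user-defined bridge network and a hypothetical transformer image (the names dsync-net, my-transformer, and transformer are placeholders):

```shell
# Create a shared network and attach both containers to it.
# "dsync-net", "my-transformer", and the container name "transformer" are placeholders.
docker network create dsync-net
docker run -d --network dsync-net --name transformer my-transformer:latest
# Note: on a shared Docker network, dsync reaches the transformer by its
# container name rather than localhost.
docker run --network dsync-net markadiom/dsync \
  --namespace <TABLENAME>:<DB>.<COL> dynamodb "$MONGODB_URI" \
  grpc://transformer:8085 --insecure
```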

Step 4: Start dsync

Run dsync --namespace <TABLENAME>:<DB>.<COL> dynamodb $MONGODB_URI. Replace <TABLENAME> with the DynamoDB table name, and <DB>.<COL> with the destination MongoDB database and collection. Replace $MONGODB_URI with the desired MongoDB connection URI.
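For example, to migrate a DynamoDB table named users into the appdb.users collection (the table, database, collection names, and URI below are placeholders):

```shell
# Placeholder names: "users" table, "appdb" database, "users" collection.
export MONGODB_URI="mongodb://localhost:27017"   # adjust to your cluster
dsync --namespace users:appdb.users dynamodb "$MONGODB_URI"
```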

When a transformer is required, use dsync --namespace <TABLENAME>:<DB>.<COL> dynamodb $MONGODB_URI grpc://localhost:8085 --insecure

We use --insecure because the connection to the transformer service does not use TLS, and we assume the service is running on the same host on port 8085.

You can migrate multiple tables at the same time by specifying multiple mappings in the --namespace parameter:

dsync --namespace "<TABLE1>:<DB>.<COL1>,<TABLE2>:<DB>.<COL2>" dynamodb $MONGODB_URI grpc://localhost:8085 --insecure

For Cloud Marketplace images and Docker:
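On a marketplace VM, dsync is preinstalled and the same commands apply directly. With Docker, a sketch using the markadiom/dsync image mentioned above (the namespace mapping is a placeholder):

```shell
# Run dsync from the Docker image; AWS credentials are passed through
# from the host environment as container env vars.
docker run --rm \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
  markadiom/dsync --namespace <TABLENAME>:<DB>.<COL> dynamodb "$MONGODB_URI"
```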

Step 5: Post-migration configuration

Indexes

Create/validate the necessary indexes on MongoDB.
