Data Transformations

Overview

Enterprise Dsync features a prebuilt YAML-based transformer that allows you to add, remove, and modify data elements using mappings and the Common Expression Language (CEL).

Transformations are applied on the fly during both the initial sync and CDC. The transformer itself can run as a standalone process that Dsynct connects to over gRPC. Detailed instructions can be found in our public repository.

For convenience, Dsynct workers can also run the transformer as an embedded process. Pass the --transform option and give the path to the config file, with the dsync-transform:// prefix, as the third positional argument:

dsynct worker <OTHER_OPTIONS> --transform $SOURCE $DESTINATION dsync-transform://transform.yaml

When running Dsynct with Docker, the transformer config file must be mounted into the container. For example:

docker run \
-v "./transform.yaml:/transform.yaml" \
markadiom/dsynct worker <OTHER_OPTIONS> \
--transform \
$SOURCE $DESTINATION dsync-transform://transform.yaml

Writing Config Files

Dsync Transform runs off a YAML configuration file where the mappings are specified. Each source document is converted into an internal format, subjected to the mappings, and then converted back to the output document type.

Duplicate mappings are allowed and fan out: each matching mapping produces its own output. Use the filter option to avoid fan-out where necessary. If you map IDs, you should also map the ID keys. ID mappings have access only to the original ID so that they can also be applied to deletes, which do not carry the full document. If you need to convert ID types between systems (e.g., string to BSON ObjectID), see the Transform Data Types page for detailed guidance and examples.
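The fan-out and filter behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not the transformer's implementation. The route function and the namespaces are hypothetical, and plain Python callables stand in for CEL filter expressions.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical model of a mapping entry: only the keys needed to show
# fan-out and filtering. Python callables stand in for CEL expressions.
Mapping = Dict[str, Any]


def route(namespace: str, doc_id: Any, mappings: List[Mapping],
          env: Dict[str, Any]) -> List[str]:
    """Return the output namespaces a document would be written to."""
    outputs = []
    for m in mappings:
        if m["namespace"] != namespace:
            continue  # this mapping does not apply to the source namespace
        keep: Optional[Callable[[Any, Dict[str, Any]], bool]] = m.get("filter")
        # Like the documented filter, this only sees the original ID and env.
        if keep is not None and not keep(doc_id, env):
            continue
        # Without mapnamespace, the output keeps the source namespace.
        outputs.append(m.get("mapnamespace", namespace))
    return outputs


# Two mappings for the same namespace: every document fans out to both.
fanout = [
    {"namespace": "shop.orders", "mapnamespace": "archive.orders"},
    {"namespace": "shop.orders", "mapnamespace": "search.orders"},
]
print(route("shop.orders", 7, fanout, {}))  # ['archive.orders', 'search.orders']

# Complementary filters send each document to exactly one output.
split = [
    {"namespace": "shop.orders", "mapnamespace": "shop.orders_even",
     "filter": lambda i, env: i % 2 == 0},
    {"namespace": "shop.orders", "mapnamespace": "shop.orders_odd",
     "filter": lambda i, env: i % 2 == 1},
]
print(route("shop.orders", 7, split, {}))  # ['shop.orders_odd']
```

Because the filter sees only the ID and env, a partitioning rule like this one routes deletes the same way as the inserts and updates that preceded them.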

Example Config File

mappings:
  - namespace: srcnamespace
    mapnamespace: dstnamespace
    map:
      should_be_int32: int32
    cel:
      name: self + "!"
      newfield: '"abcd"'
      should_be_int32: self + 5
    add: ["newfield"]
    delete: ["existingfield"]

Each mapping must specify the source namespace. If the destination namespace is different, use mapnamespace. The key fields work as follows:

  • cel -- Specify a CEL expression for mapping each field. The variable self refers to the current value of that field. Note that CEL only supports a limited set of types (e.g., 64-bit integers but not 32-bit integers).

  • map -- Apply a special type mapping after the cel expression. Use this when you need a type that CEL cannot represent directly, such as int32 for a 32-bit integer.

  • add -- List fields that should be created in the output even if they were not present in the source document.

  • delete -- List fields that should be removed from the output.
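To make the four keys concrete, here is a hand translation of the example config's srcnamespace mapping into plain Python. It is a sketch under assumptions, not how the transformer evaluates CEL. The order add → cel → map → delete is inferred (only "map after cel" is documented). Applying cel only to fields that exist is also assumed, as is treating int32 overflow as an error.

```python
from typing import Any, Dict

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_int32(value: int) -> int:
    """Stand-in for the int32 map; raising on overflow is an assumption."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in an int32")
    return value


def apply_example_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hand translation of the srcnamespace mapping from the example config."""
    out = dict(doc)  # leave the input document untouched
    # add: create "newfield" even if the source document lacks it.
    out.setdefault("newfield", None)
    # cel: each expression sees the field's current value as `self`.
    if "name" in out:
        out["name"] = out["name"] + "!"          # name: self + "!"
    out["newfield"] = "abcd"                      # newfield: '"abcd"'
    if "should_be_int32" in out:
        out["should_be_int32"] += 5               # should_be_int32: self + 5
        # map: runs after cel, narrowing the 64-bit CEL result to int32.
        out["should_be_int32"] = to_int32(out["should_be_int32"])
    # delete: drop "existingfield" if it exists.
    out.pop("existingfield", None)
    return out


source = {"_id": 1, "name": "Ada", "should_be_int32": 37, "existingfield": "x"}
print(apply_example_mapping(source))
# {'_id': 1, 'name': 'Ada!', 'should_be_int32': 42, 'newfield': 'abcd'}
```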

Configuration Reference

Top-Level Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| wild | string | * | Wildcard that matches anything when specifying a path. |
| delimiter | string | . | Delimiter between path components when specifying a path. |
| env | map[string, any] | | Variables available under env in CEL expression mappings. |
| unwrapbson | boolean | false | If true, automatically converts various BSON types to more native types (e.g., ObjectIDs to strings). |
| filtererrors | boolean | false | If true, conversion errors do not cause a failure; they are skipped and logged as warnings. Errors encountered while retrieving the original ID are still reported as errors. |
| defaultmapping | string | | Name (namespace) of the entry in mappings to use as a fallback. |
| namespacemapper | CEL string | | Default expression for automatically mapping all namespaces. Has env and self (the namespace) available. |
| idlist | boolean | false | If true, the id variable is always a list. If false and the ID has only one value, the id variable contains that value directly. |
| mappings | list[mapping] | | List of mapping definitions (see below). |
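The idlist option changes the shape of the id variable that CEL expressions see. Below is a minimal Python model of the documented behavior. id_variable is a hypothetical helper. Returning a list for multi-part IDs when idlist is false is an assumption, since the table only describes the single-value case.

```python
from typing import Any, List, Union


def id_variable(id_values: List[Any], idlist: bool = False) -> Union[Any, List[Any]]:
    """Model of the `id` variable exposed to CEL, per the idlist option.

    With idlist false, a single-part ID is unwrapped to its only value.
    Returning a list for multi-part IDs is an assumption.
    """
    if not idlist and len(id_values) == 1:
        return id_values[0]
    return list(id_values)


print(id_variable(["abc"]))               # abc
print(id_variable(["abc"], idlist=True))  # ['abc']
print(id_variable(["tenant-1", 42]))      # ['tenant-1', 42]
```

Setting idlist to true gives every expression the same shape to work with. This can be simpler when some namespaces have single-part IDs and others have composite ones.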

Mapping Options

Each entry in mappings supports the following fields:

| Option | Type | Description |
| --- | --- | --- |
| namespace | string | Namespace this mapping applies to. |
| mapnamespace | string | New namespace name for the output. |
| mapid | CEL string | Expression to map the ID for updates. Only the original id field and env are available. |
| filter | CEL string | Expression that returns a boolean; if true, the document is retained. Only the original id field and env are available. |
| idkeys | list[string] | Original names of each part of the ID. |
| finalidkeys | list[string] | Names of each part of the ID after the mapping. |
| add | list[string] | Paths to add in the mapping if their parent exists. |
| delete | list[string] | Paths to delete if they exist. |
| cel | map[string, CEL string] | For each defined path, a CEL expression that performs the mapping. env, id, doc, parent, and self are available as variables. |
| map | map[string, string] | For each defined path, a special mapping function to apply. Applied after cel. |
| self | CEL string | An expression that serves as a mapping for the whole document. |
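One way to picture how mapid, idkeys, and finalidkeys relate is the hypothetical Python model below. It is a sketch, not the transformer's logic. The map_id helper, the composite key, and the pairing of mapped ID parts with finalidkeys by position are all assumptions. A Python callable stands in for the CEL mapid expression. Like the real expression, it receives only the original ID.

```python
from typing import Any, Callable, Dict, List


def map_id(id_parts: List[Any], idkeys: List[str], finalidkeys: List[str],
           mapid: Callable[[Any], Any]) -> Dict[str, Any]:
    """Hypothetical model of how mapid, idkeys, and finalidkeys fit together.

    `mapid` stands in for the CEL expression. It only receives the original
    ID (never the full document), which is what lets it run on deletes too.
    """
    if len(id_parts) != len(idkeys):
        raise ValueError("idkeys must name every part of the original ID")
    # Mirror the idlist=false default: a single-part ID is passed unwrapped.
    original = id_parts[0] if len(id_parts) == 1 else list(id_parts)
    new_parts = mapid(original)
    if not isinstance(new_parts, list):
        new_parts = [new_parts]
    if len(new_parts) != len(finalidkeys):
        raise ValueError("finalidkeys must name every part of the mapped ID")
    # Assumption: mapped parts pair with finalidkeys by position.
    return dict(zip(finalidkeys, new_parts))


# Collapse a two-part key (tenant, order_id) into a single string _id.
print(map_id(["acme", 7], ["tenant", "order_id"], ["_id"],
             lambda i: f"{i[0]}:{i[1]}"))
# {'_id': 'acme:7'}
```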

Available Mappings

These mappings can be used in the map configuration or as functions inside cel expressions. Use map when you need to force a type that CEL cannot represent directly.

Type Conversions

| Mapping | Description |
| --- | --- |
| int32 | Converts to an int32. Use in map only. |
| float | Converts to a float32. Use in map only. |
| json_number | Converts to a JSON Number. |
| json_decode | Decodes a JSON string or bytes into an object. |
| json_encode | Encodes an object as a JSON string. |
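The JSON mappings and the float narrowing can be approximated with Python's standard library. This is a sketch under assumptions. The function names mirror the mapping names only for readability. The compact output format of json_encode is assumed. struct stands in for the float mapping to show how much precision a float32 keeps.

```python
import json
import struct
from typing import Any, Union


def json_decode(data: Union[str, bytes]) -> Any:
    """Decode a JSON string or bytes into an object."""
    return json.loads(data)  # json.loads accepts both str and UTF-8 bytes


def json_encode(obj: Any) -> str:
    """Encode an object as a compact JSON string (formatting is assumed)."""
    return json.dumps(obj, separators=(",", ":"))


def to_float32(x: float) -> float:
    """Round-trip a value through 4 bytes to show float32 precision loss."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(json_encode(json_decode(b'{"tags": ["a", "b"], "n": 1}')))
# {"tags":["a","b"],"n":1}
print(to_float32(0.1))
# 0.10000000149011612
```

The float32 round trip shows why float belongs in map rather than in CEL: CEL works with 64-bit doubles, and the narrowing to 32 bits is a separate, lossy step.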

BSON Conversions

| Mapping | Description |
| --- | --- |
| bson_decimal128 | Converts a string to a BSON Decimal128. |
| bson_decimal128_string | Converts a BSON Decimal128 to a string. |
| bson_object_id | Converts a string to a BSON ObjectID. Use in map only. |
| bson_uuid | Converts a UUID string to a BSON UUID. |
| bson_object_id_string | Converts a BSON ObjectID to a string. |
| bson_uuid_string | Converts a BSON UUID to a string. |
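The ObjectID and UUID conversions amount to moving between a text form and a fixed-size byte form. Below is a stdlib-only Python sketch of those byte-level shapes. The helper names mirror the mapping names but are illustrative. Producing real BSON values would need a BSON library (for example, the bson package that ships with pymongo). Decimal128 is omitted because its binary encoding is more involved.

```python
import uuid


def bson_object_id(s: str) -> bytes:
    """Hex string -> the 12 raw bytes a BSON ObjectID stores."""
    raw = bytes.fromhex(s)
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes (24 hex characters)")
    return raw


def bson_object_id_string(raw: bytes) -> str:
    """12 raw ObjectID bytes -> 24-character lowercase hex string."""
    return raw.hex()


def bson_uuid(s: str) -> bytes:
    """UUID string -> 16 bytes in RFC 4122 order (BSON binary subtype 4)."""
    return uuid.UUID(s).bytes


def bson_uuid_string(raw: bytes) -> str:
    """16 UUID bytes -> canonical hyphenated string."""
    return str(uuid.UUID(bytes=raw))


oid = bson_object_id("507f1f77bcf86cd799439011")
print(len(oid), bson_object_id_string(oid))
# 12 507f1f77bcf86cd799439011
u = bson_uuid("123e4567-e89b-12d3-a456-426614174000")
print(len(u), bson_uuid_string(u))
# 16 123e4567-e89b-12d3-a456-426614174000
```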

Hash Functions

| Mapping | Description |
| --- | --- |
| md5 | Applies the MD5 hash to a string or bytes, returning bytes. |
| sha1 | Applies the SHA-1 hash to a string or bytes, returning bytes. |
| sha256 | Applies the SHA-256 hash to a string or bytes, returning bytes. |
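Each hash mapping returns raw digest bytes rather than a hex string. The Python sketch below uses hashlib to show the resulting digest sizes. The assumption that strings are hashed as their UTF-8 encoding is labeled in the code. The helper names are illustrative.

```python
import hashlib
from typing import Union


def _as_bytes(data: Union[str, bytes]) -> bytes:
    # Assumption: strings are hashed as their UTF-8 encoding.
    return data.encode("utf-8") if isinstance(data, str) else data


def md5(data: Union[str, bytes]) -> bytes:
    return hashlib.md5(_as_bytes(data)).digest()      # 16 bytes


def sha1(data: Union[str, bytes]) -> bytes:
    return hashlib.sha1(_as_bytes(data)).digest()     # 20 bytes


def sha256(data: Union[str, bytes]) -> bytes:
    return hashlib.sha256(_as_bytes(data)).digest()   # 32 bytes


print(md5("abc").hex())                        # 900150983cd24fb0d6963f7d28e17f72
print(len(md5("abc")), len(sha1("abc")), len(sha256("abc")))  # 16 20 32
```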

Byte Mappings

| Mapping | Description |
| --- | --- |
| be_to_int32 | Converts big-endian bytes to an integer. Use in map to get an int32. |
| be_to_int64 | Converts big-endian bytes to an int64. |
| to_be_int32 | Converts data into bytes representing an int32 in big-endian format. |
| to_be_int64 | Converts data into bytes representing an int64 in big-endian format. |
| reverse_bytes | Reverses a byte array. |
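The byte mappings correspond closely to Python's struct module with a big-endian (">") format. This is a sketch under two assumptions: the integers are signed two's complement, and inputs have exactly 4 or 8 bytes (struct rejects other lengths, while the transformer's handling of them is not documented here).

```python
import struct


def be_to_int32(b: bytes) -> int:
    """4 big-endian bytes -> signed 32-bit integer (signedness assumed)."""
    return struct.unpack(">i", b)[0]


def be_to_int64(b: bytes) -> int:
    """8 big-endian bytes -> signed 64-bit integer (signedness assumed)."""
    return struct.unpack(">q", b)[0]


def to_be_int32(n: int) -> bytes:
    """Integer -> 4 bytes, most significant byte first."""
    return struct.pack(">i", n)


def to_be_int64(n: int) -> bytes:
    """Integer -> 8 bytes, most significant byte first."""
    return struct.pack(">q", n)


def reverse_bytes(b: bytes) -> bytes:
    """Reverse byte order, e.g. to read little-endian data as big-endian."""
    return b[::-1]


print(to_be_int32(1).hex())                         # 00000001
print(be_to_int32(reverse_bytes(to_be_int32(1))))   # 16777216
```

Chaining reverse_bytes with be_to_int32 is one way to decode a little-endian integer using only big-endian primitives.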

Available Functions

All the available mappings above are usable as unary functions in CEL expressions. The following additional functions are also available:

| Function | Description |
| --- | --- |
| now_millis() | Current time in milliseconds. |
| now_nanos() | Current time in nanoseconds (resolution may be limited by your machine). |
| uuid_v4_bytes() | Generates a random UUID as bytes. |
| uuid_v4_string() | Generates a random UUID as a string. |
| uuid_v3_bytes(uuid, name) | Generates a deterministic UUID from a namespace UUID and a name, as bytes (MD5). |
| uuid_v3_string(uuid, name) | Generates a deterministic UUID from a namespace UUID and a name, as a string (MD5). |
| uuid_v5_bytes(uuid, name) | Generates a deterministic UUID from a namespace UUID and a name, as bytes (SHA-1). |
| uuid_v5_string(uuid, name) | Generates a deterministic UUID from a namespace UUID and a name, as a string (SHA-1). |
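Python's uuid and time modules provide close analogues of these functions. The sketch assumes the namespace argument is a UUID string and that now_millis counts from the Unix epoch. Neither is stated above. The function names mirror the CEL functions only for readability.

```python
import time
import uuid


def now_millis() -> int:
    # Assumption: milliseconds since the Unix epoch.
    return time.time_ns() // 1_000_000


def uuid_v4_string() -> str:
    return str(uuid.uuid4())  # random, different on every call


def uuid_v3_string(namespace: str, name: str) -> str:
    return str(uuid.uuid3(uuid.UUID(namespace), name))  # MD5-based


def uuid_v5_string(namespace: str, name: str) -> str:
    return str(uuid.uuid5(uuid.UUID(namespace), name))  # SHA-1-based


DNS = str(uuid.NAMESPACE_DNS)  # 6ba7b810-9dad-11d1-80b4-00c04fd430c8
print(uuid_v3_string(DNS, "python.org"))  # 6fa459ea-ee8a-3ca4-894e-db77e160355e
print(uuid_v5_string(DNS, "python.org"))  # 886313e1-3b8a-5372-9b90-0c9aee199e5d
```

The v3 and v5 variants always produce the same UUID for the same inputs. This suits ID mappings, where a document and its later deletes must resolve to the same destination ID. The random v4 variants do not.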

For the latest details, consult the README in our public repository.

Transform Studio

To make it easier to test transformations, you can run Dsync in Transform Studio mode.

Then open your browser to the specified address. The interface lets you test a transform config against JSON or BSON documents. For BSON documents, use Extended JSON encoding. The update keys should also be specified in Extended JSON encoding.

Example Extended JSON:
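The following is a minimal sketch of a BSON document in canonical Extended JSON (v2), built with Python's standard library. The field names, values, and helper functions are hypothetical. The exact shape Transform Studio expects for update keys is not specified here, so treat the _id value only as an illustration of how an ObjectID is encoded.

```python
import base64
import json
import uuid


# Canonical Extended JSON v2 wrappers for a few BSON types.
def ext_oid(hex_str: str) -> dict:
    return {"$oid": hex_str}


def ext_int32(n: int) -> dict:
    return {"$numberInt": str(n)}


def ext_int64(n: int) -> dict:
    return {"$numberLong": str(n)}


def ext_decimal128(s: str) -> dict:
    return {"$numberDecimal": s}


def ext_uuid(s: str) -> dict:
    raw = uuid.UUID(s).bytes  # subtype 04 = RFC 4122 UUID
    return {"$binary": {"base64": base64.b64encode(raw).decode("ascii"),
                        "subType": "04"}}


doc = {
    "_id": ext_oid("507f1f77bcf86cd799439011"),
    "qty": ext_int32(5),
    "views": ext_int64(1234567890123),
    "price": ext_decimal128("19.99"),
    "ref": ext_uuid("123e4567-e89b-12d3-a456-426614174000"),
}
print(json.dumps(doc, indent=2))
```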
