Data Transformations
Overview
Enterprise Dsync features a prebuilt YAML-based transformer that lets you add, remove, and modify data elements using mappings and Common Expression Language (CEL).
Transformations are applied on the fly during both initial sync and CDC. The transformer can also run as a standalone process that Dsynct connects to over gRPC; detailed instructions are available in our public repository.
For convenience, Dsynct workers can run the transformer as an embedded process by providing the --transform option along with the path to the config file as the third argument:
dsynct worker <OTHER_OPTIONS> --transform $SOURCE $DESTINATION dsync-transform://transform.yaml
When using Docker to run Dsynct, the transformer config file needs to be mounted to the container. For example:
docker run \
-v "./transform.yaml:/transform.yaml" \
markadiom/dsynct worker <OTHER_OPTIONS> \
--transform \
$SOURCE $DESTINATION dsync-transform://transform.yaml
Writing Config Files
Dsync Transform runs off a YAML configuration file where the mappings are specified. Each source document is converted into an internal format, subjected to the mappings, and then converted back to the output document type.
Duplicate mappings are allowed and will fan out. Use the filter feature to avoid fan-out if necessary. Note that mapping IDs also requires mapping the ID keys. ID mappings only have access to the original ID, so that they can also be applied to deletes, which do not have access to the full data. If you need to convert ID types between systems (e.g., string to BSON ObjectID), see the Transform Data Types page for detailed guidance and examples.
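As a rough illustration of fan-out and filtering, the fragment below defines two mappings for the same source namespace and gives each a filter so that a document is retained by only one of them. The namespace names and the "a-" prefix are hypothetical, and the sketch assumes the id is a single string value; adjust the expressions to your actual ID type.

```yaml
# Illustrative sketch only: namespaces and ID values are hypothetical.
mappings:
  # Both mappings apply to srcnamespace. Without the filters, every
  # document would fan out to both destinations.
  - namespace: srcnamespace
    mapnamespace: dst_a
    # Filters only see the original id and env (assumes a string id).
    filter: 'id.startsWith("a-")'
  - namespace: srcnamespace
    mapnamespace: dst_b
    filter: '!id.startsWith("a-")'
```

Because the filters depend only on the original id, the same routing decision can be made for deletes, which do not carry the full document.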
Example Config File
mappings:
  - namespace: srcnamespace
    mapnamespace: dstnamespace
    map:
      should_be_int32: int32
    cel:
      name: self + "!"
      newfield: '"abcd"'
      should_be_int32: self + 5
    add: ["newfield"]
    delete: ["existingfield"]
Each mapping must specify the source namespace. If the destination namespace is different, use mapnamespace. The key fields work as follows:
cel -- Specify a CEL expression for mapping each field. The variable self refers to the current value of that field. Note that CEL only supports a limited set of types (e.g., 64-bit integers but not 32-bit integers).
map -- Apply a special type mapping after the cel expression. Use this when you need a type that CEL cannot represent directly, such as int32 for a 32-bit integer.
add -- List fields that should be created in the output even if they were not present in the source document.
delete -- List fields that should be removed from the output.
Configuration Reference
Top-Level Options
wild (string, default: *) -- When specifying a path, this matches anything.
delimiter (string, default: .) -- When specifying a path, this is the delimiter.
env (map[string, any]) -- Variables available under the env variable in CEL expression mappings.
unwrapbson (boolean, default: false) -- If true, will automatically convert various BSON types to a more native type (e.g., ObjectIDs to strings).
filtererrors (boolean, default: false) -- If true, will not fail on errors during conversion and instead skip and log a warning. Errors encountered when retrieving the original ID will still error.
defaultmapping (string) -- Name (namespace) of the mapping from mappings to use as a fallback.
namespacemapper (CEL string) -- Default expression to automatically map all namespaces. Has env and self (namespace) available.
idlist (boolean, default: false) -- If true, the id variable will always be a list. When false, the id variable contains the first id value if it is the only id value.
mappings (list[mapping]) -- List of mapping definitions (see below).
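To show how the top-level options fit together, here is a minimal sketch of a config file that sets several of them. The env key (suffix), the "archive_" prefix, and the namespace names are hypothetical; the defaults shown for wild, delimiter, and idlist are the ones listed above.

```yaml
# Illustrative sketch only: values and names are hypothetical.
wild: "*"              # default; quoted because * is special in YAML
delimiter: "."         # default path delimiter
env:
  suffix: "-migrated"  # exposed to CEL expressions as env.suffix
unwrapbson: true       # e.g., ObjectIDs become strings before mapping
filtererrors: false    # fail on conversion errors instead of skipping
idlist: false
# self is the source namespace here; env is also available.
namespacemapper: '"archive_" + self'
defaultmapping: srcnamespace
mappings:
  - namespace: srcnamespace
    cel:
      name: self + env.suffix
```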
Mapping Options
Each entry in mappings supports the following fields:
namespace (string) -- Namespace this mapping applies to.
mapnamespace (string) -- New namespace name for the output.
mapid (CEL string) -- Expression to map the id for updates. Only has the original id field and env available.
filter (CEL string) -- Expression that returns a boolean; if true, the document will be retained. Only has the original id field and env available.
idkeys (list[string]) -- Describes the original names of each part of the id.
finalidkeys (list[string]) -- Describes the names of each part of the id after the mapping.
add (list[string]) -- Paths that will be added in the mapping if the parent exists.
delete (list[string]) -- Paths that will be deleted if they exist.
cel (map[string, CEL string]) -- For each defined path, specify a CEL expression to perform a mapping. Has env, id, doc, parent, and self available as variables.
map (map[string, string]) -- For each defined path, specify a special mapping function to apply. Applies after cel.
self (CEL string) -- An expression that serves as a mapping for the whole document.
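The following sketch combines several mapping options in one entry. All field names (first_name, last_name, address.zip, legacy_flag) and the id key name are hypothetical, and it assumes that doc exposes the top-level fields of the source document and that nested paths use the default "." delimiter. mapid and self are left out because their exact behavior depends on your ID layout and document shape.

```yaml
# Illustrative sketch only: field names and id keys are hypothetical.
mappings:
  - namespace: srcnamespace
    mapnamespace: dstnamespace
    idkeys: ["_id"]        # assumed name of the original id part
    finalidkeys: ["_id"]   # assumed unchanged after the mapping
    add: ["full_name"]     # created even if absent in the source
    delete: ["legacy_flag"]
    cel:
      # Built from other fields via doc (assumes both fields are strings).
      full_name: doc.first_name + " " + doc.last_name
      # Nested path; self is the current value at that path.
      address.zip: string(self)
    map:
      # Applied after cel, for a type CEL cannot represent directly.
      login_count: int32
```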
Available Mappings
For use with the map configuration or inside a cel configuration expression. In certain cases, it is advisable to use map to force a type that CEL cannot represent directly.
Type Conversions
int32 -- Converts to an int32. Use in map only.
float -- Converts to a float32. Use in map only.
json_number -- Converts to a JSON Number.
json_decode -- Decodes a JSON string or bytes into an object.
json_encode -- Encodes an object as a JSON string.
BSON Conversions
bson_decimal128 -- Converts a string to a BSON Decimal128.
bson_decimal128_string -- Converts a BSON Decimal128 to a string.
bson_object_id -- Converts a string to a BSON ObjectID. Use in map only.
bson_uuid -- Converts a UUID string to a BSON UUID.
bson_object_id_string -- Converts a BSON ObjectID to a string.
bson_uuid_string -- Converts a BSON UUID to a string.
Hash Functions
md5 -- Applies the MD5 hash to a string or bytes, returning bytes.
sha1 -- Applies the SHA-1 hash to a string or bytes, returning bytes.
sha256 -- Applies the SHA-256 hash to a string or bytes, returning bytes.
Byte Mappings
be_to_int32 -- Converts bytes to an int assuming big endian format. Use in map to get int32.
be_to_int64 -- Converts bytes to an int64 assuming big endian format.
to_be_int32 -- Converts data into bytes representing an int32 in big endian format.
to_be_int64 -- Converts data into bytes representing an int64 in big endian format.
reverse_bytes -- Reverses a byte array.
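As a hedged example of using these names in the map configuration, the fragment below forces types that CEL cannot represent directly. The field names are hypothetical, and the sketch assumes each source field already holds the input type named in the tables above (for example, a string for bson_object_id and bson_decimal128).

```yaml
# Illustrative sketch only: field names are hypothetical.
mappings:
  - namespace: srcnamespace
    map:
      quantity: int32           # 32-bit integer instead of CEL's 64-bit
      owner_ref: bson_object_id # string -> BSON ObjectID (map only)
      price: bson_decimal128    # string -> BSON Decimal128
      payload: json_decode      # JSON string or bytes -> object
      counter_bytes: be_to_int32  # big endian bytes -> int32 via map
```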
Available Functions
All the available mappings above are usable as unary functions in CEL expressions. The following additional functions are also available:
now_millis() -- Current time in milliseconds.
now_nanos() -- Current time in nanoseconds (resolution may be limited by your machine).
uuid_v4_bytes() -- Generate a random UUID as bytes.
uuid_v4_string() -- Generate a random UUID as a string.
uuid_v3_bytes(uuid, name) -- Generate a deterministic UUID based on a UUID namespace and name as bytes (MD5).
uuid_v3_string(uuid, name) -- Generate a deterministic UUID based on a UUID namespace and name as a string (MD5).
uuid_v5_bytes(uuid, name) -- Generate a deterministic UUID based on a UUID namespace and name as bytes (SHA-1).
uuid_v5_string(uuid, name) -- Generate a deterministic UUID based on a UUID namespace and name as a string (SHA-1).
For the latest details, consult the README in our public repository.
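A short sketch of calling these functions from cel expressions follows. The field names are hypothetical, and it assumes doc.email is a string and order_ref holds a BSON ObjectID in the source. The two-argument UUID functions are omitted because the expected type of the namespace argument is not stated here.

```yaml
# Illustrative sketch only: field names are hypothetical.
mappings:
  - namespace: srcnamespace
    add: ["synced_at_ms", "trace_id", "email_sha256"]
    cel:
      synced_at_ms: now_millis()     # current time in milliseconds
      trace_id: uuid_v4_string()     # random UUID as a string
      email_sha256: sha256(doc.email)  # unary mapping used as a function; returns bytes
      order_ref: bson_object_id_string(self)  # BSON ObjectID -> string
```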
Transform Studio
To make it easier to test transformations, you can run dsync in Transform Studio mode.
Then open up your browser to the specified address. The interface will allow you to test out a transform config and JSON/BSON documents. For BSON documents, use extended JSON encoding. The update keys should be specified in extended JSON encoding as well.
Example extended json: