# Transform Data Types

## Data Format

During a sync, data can appear in three different formats depending on the operation:

1. **Initial Sync** -- A single binary blob representing the full document.
2. **Updates** -- A list representing the ID fields, plus a binary blob of the updated data.
3. **Deletes** -- A list representing only the ID fields.

A correct mapping must handle all three cases. This means you may need to provide a mapping for both the data (to cover case 1) and the ID (to cover cases 2 and 3).

## Data Types

Connectors may support different data types. Currently we support **JSON** and **BSON**. When transferring data between connectors that use different types, you need to provide configuration to map between them.

### ID Mapping

The ID usually needs its own transformation because its data type, its field names, or both may differ between source and destination.

**Defaults:**

* **JSON**: single field called `id`
* **BSON**: single field called `_id`

IDs can also be composed of multiple fields. Use the `idkeys` (source) and `finalidkeys` (destination) properties in the config if the ID format does not match the defaults on either side.

**General approach:**

* Specify new fields under the `add` property.
* Specify old fields to remove under the `delete` property.
* Define a `mapid` expression so that the IDs carried by updates and deletes, which arrive without the full document, can be set correctly.
* Add a `cel` expression for each new ID field showing how it is populated from the data.

#### The `id` Variable in CEL

In CEL expressions, `id` is a built-in variable representing the document's ID.

* If the ID has **one field**, `id` is the value of that field directly.
* If the ID has **multiple fields**, `id` is a list of values.

```cel
// Given id: {a: 1, b: 2}
id[1]   // returns 2

// Given id: {b: 2}
id      // returns 2
```

You can set `idlist: true` at the top level of the config to force `id` to always be a list, even when the ID contains only one field.
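
As a minimal sketch of the `idlist` setting, the config below renames the default JSON `id` to the default BSON `_id` using list indexing, even though the ID has a single field. The placement of `idlist` follows the description above. Whether `mapid` should return the bare value or a one-element list when `idlist` is enabled is not stated here, so the bare value used below is an assumption.

```yaml
# Sketch only: assumes idlist sits beside defaultmapping at the top level.
idlist: true
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    # id is a one-element list here, so index it to get the value.
    # Assumption: mapid returns the bare value, not a list.
    mapid: id[0]
    cel:
      _id: id[0]
```

Forcing a list can keep expressions uniform when some mappings have single-field IDs and others have multi-part IDs.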

#### JSON to BSON Examples

Rename a string `id` (`"123"`) to a string `_id` (`"123"`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
```

Map a string `id` (`"123"`) to an ObjectID `_id` (`{"$oid": "202cb962ac59075b964b0715"}`). Here the `map` entry applies the `bson_object_id` conversion to the new `_id` field:

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
    map:
      _id: bson_object_id
```
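
The same pattern extends to a JSON source whose single ID field does not use the default name, which is the case `idkeys` is meant for. The following is a sketch under assumptions: `uuid` is a hypothetical field name, and `finalidkeys` is left out on the assumption that the BSON default `_id` applies when it is absent.

```yaml
defaultmapping: default
mappings:
  - namespace: default
    # Hypothetical source ID field, declared because it differs from the JSON default `id`.
    idkeys: ["uuid"]
    delete: ["uuid"]
    add: ["_id"]
    # With a single ID field, id is the field's value directly.
    mapid: id
    cel:
      _id: id
```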

#### BSON to JSON Examples

Rename a string `_id` (`"123"`) to a string `id` (`"123"`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["_id"]
    add: ["id"]
    mapid: id
    cel:
      id: id
```

Convert an ObjectID `_id` (`{"$oid": "202cb962ac59075b964b0715"}`) to a string `id` (`"202cb962ac59075b964b0715"`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["_id"]
    add: ["id"]
    mapid: string(id)
    cel:
      id: string(id)
```

#### Multi-Part ID Examples

When the source ID is composed of multiple fields, `id` becomes a list. Use `idkeys` to declare the source ID fields and `finalidkeys` for the destination. Individual parts are accessed with `id[0]`, `id[1]`, etc.

Map a two-part JSON ID (`region` and `user_id`) to BSON, renaming them to `_region` and `_user_id`:

```yaml
defaultmapping: default
mappings:
  - namespace: default
    idkeys: ["region", "user_id"]
    finalidkeys: ["_region", "_user_id"]
    delete: ["region", "user_id"]
    add: ["_region", "_user_id"]
    mapid: id
    cel:
      _region: id[0]
      _user_id: id[1]
```

Collapse a two-part JSON ID (`tenant` and `record_id`) into a single BSON `_id` string by concatenating them:

```yaml
defaultmapping: default
mappings:
  - namespace: default
    idkeys: ["tenant", "record_id"]
    delete: ["tenant", "record_id"]
    add: ["_id"]
    mapid: string(id[0]) + ":" + string(id[1])
    cel:
      _id: string(id[0]) + ":" + string(id[1])
```

Expand a single BSON `_id` back into a two-part JSON ID by splitting on a delimiter. This only works if neither part contains the delimiter itself (here `:`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    finalidkeys: ["tenant", "record_id"]
    delete: ["_id"]
    add: ["tenant", "record_id"]
    mapid: '[string(id).split(":")[0], string(id).split(":")[1]]'
    cel:
      tenant: string(id).split(":")[0]
      record_id: string(id).split(":")[1]
```

