# Transform Data Types

## Data Format

During a sync, data can appear in three different formats depending on the operation:

1. **Initial Sync** -- A single binary blob representing the full document.
2. **Updates** -- A list representing the ID fields, plus a binary blob of the updated data.
3. **Deletes** -- A list representing only the ID fields.

A correct mapping must handle all three cases. This means you may need to provide a mapping for both the data (to cover case 1) and the ID (to cover cases 2 and 3).
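The three shapes above can be pictured with a small Python sketch. `SyncEvent` and `parts_present` are hypothetical names used only for illustration; they are not part of the tool, and the placeholder byte strings stand in for real binary payloads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncEvent:
    """Hypothetical stand-in for one item seen during a sync."""
    id_values: Optional[list] = None  # ID fields: present on updates and deletes
    data: Optional[bytes] = None      # binary blob: present on initial sync and updates


def parts_present(event: SyncEvent) -> list:
    """Return which parts of the event a mapping has to be able to handle."""
    parts = []
    if event.id_values is not None:
        parts.append("id")
    if event.data is not None:
        parts.append("data")
    return parts


initial = SyncEvent(data=b"<full document>")
update = SyncEvent(id_values=["123"], data=b"<updated data>")
delete = SyncEvent(id_values=["123"])

for name, event in [("initial", initial), ("update", update), ("delete", delete)]:
    print(name, parts_present(event))
```

A mapping that only rewrites the document body never sees a delete, and one that only rewrites the ID never sees an initial sync, which is why both are usually needed.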

## Data Types

Connectors may support different data types. Currently we support **JSON** and **BSON**. When transferring data between connectors that use different types, you need to provide configuration to map between them.

### ID Mapping

The ID needs particular care because its data type, its field names, or both may differ between source and destination.

**Defaults:**

* **JSON**: single field called `id`
* **BSON**: single field called `_id`

IDs can also be composed of multiple fields. Set `idkeys` (source) or `finalidkeys` (destination) in the config whenever the ID fields on that side do not match the defaults.

**General approach:**

* Specify new fields under the `add` property.
* Specify old fields to remove under the `delete` property.
* Define a `mapid` expression so that update IDs can be set correctly.
* Add a `cel` expression for each new ID field showing how it is populated from the data.

#### The `id` Variable in CEL

In CEL expressions, `id` is a built-in variable representing the document's ID.

* If the ID has **one field**, `id` is the value of that field directly.
* If the ID has **multiple fields**, `id` is a list of values.

```cel
// Given id: {a: 1, b: 2}
id[1]   // returns 2

// Given id: {b: 2}
id      // returns 2
```

You can set `idlist: true` at the top level of the config to force `id` to always be a list, even when the ID contains only one field.
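The rules for the `id` variable can be modeled in a few lines of Python. This is a sketch of the behavior described here, not the tool's implementation; `cel_id` is a hypothetical helper, and it assumes the ID fields arrive in their declared order.

```python
def cel_id(id_fields: dict, idlist: bool = False):
    """Model the value bound to `id`, given the ID fields in declared order."""
    values = list(id_fields.values())
    if len(values) == 1 and not idlist:
        return values[0]  # single-field ID: the bare value
    return values         # multi-field ID, or idlist: true


print(cel_id({"a": 1, "b": 2})[1])    # 2
print(cel_id({"b": 2}))               # 2
print(cel_id({"b": 2}, idlist=True))  # [2]
```

With `idlist: true`, expressions can always index into `id` (`id[0]`), so the same expression works whether the ID has one field or several.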

#### JSON to BSON Examples

Rename a string `id` (`"123"`) to a string `_id` (`"123"`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
```

Map a string `id` (`"123"`) to an ObjectID `_id` (`{"$oid": "202cb962ac59075b964b0715"}`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["id"]
    add: ["_id"]
    mapid: id
    cel:
      _id: id
    map:
      _id: bson_object_id
```

#### BSON to JSON Examples

Rename a string `_id` (`"123"`) to a string `id` (`"123"`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["_id"]
    add: ["id"]
    mapid: id
    cel:
      id: id
```

Convert an ObjectID `_id` (`{"$oid": "202cb962ac59075b964b0715"}`) to a string `id` (`"202cb962ac59075b964b0715"`):

```yaml
defaultmapping: default
mappings:
  - namespace: default
    delete: ["_id"]
    add: ["id"]
    mapid: string(id)
    cel:
      id: string(id)
```
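An ObjectID is 12 bytes, so its string form is always 24 hexadecimal characters. When checking the output of a conversion like the one above, a quick sanity test can help. This is a minimal sketch; `looks_like_object_id` is a hypothetical helper, not something the tool provides.

```python
import string


def looks_like_object_id(value: str) -> bool:
    """Return True if value has the shape of an ObjectID hex string."""
    return len(value) == 24 and all(c in string.hexdigits for c in value)


print(looks_like_object_id("202cb962ac59075b964b0715"))  # True
print(looks_like_object_id("123"))                       # False
```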

#### Multi-Part ID Examples

When the source ID is composed of multiple fields, `id` becomes a list. Use `idkeys` to declare the source ID fields and `finalidkeys` for the destination. Individual parts are accessed with `id[0]`, `id[1]`, etc.

Map a two-part JSON ID (`region` and `user_id`) to BSON, renaming them to `_region` and `_user_id`:

```yaml
defaultmapping: default
mappings:
  - namespace: default
    idkeys: ["region", "user_id"]
    finalidkeys: ["_region", "_user_id"]
    delete: ["region", "user_id"]
    add: ["_region", "_user_id"]
    mapid: id
    cel:
      _region: id[0]
      _user_id: id[1]
```
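The effect of the rename above on a single document can be sketched in Python. `rename_id_fields` is a hypothetical helper written for illustration, and it assumes the `id` list follows the order of `idkeys`, as the `id[0]` and `id[1]` expressions in the example imply.

```python
def rename_id_fields(doc: dict, idkeys: list, finalidkeys: list) -> dict:
    """Drop the source ID fields and re-add their values under the destination names."""
    id_values = [doc[key] for key in idkeys]                    # what `id` would hold
    out = {k: v for k, v in doc.items() if k not in idkeys}     # the `delete` step
    out.update(zip(finalidkeys, id_values))                     # the `add` + `cel` steps
    return out


source = {"region": "eu", "user_id": 7, "name": "Ada"}
print(rename_id_fields(source, ["region", "user_id"], ["_region", "_user_id"]))
# {'name': 'Ada', '_region': 'eu', '_user_id': 7}
```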

Collapse a two-part JSON ID (`tenant` and `record_id`) into a single BSON `_id` string by concatenating them:

```yaml
defaultmapping: default
mappings:
  - namespace: default
    idkeys: ["tenant", "record_id"]
    delete: ["tenant", "record_id"]
    add: ["_id"]
    mapid: 'string(id[0]) + ":" + string(id[1])'
    cel:
      _id: 'string(id[0]) + ":" + string(id[1])'
```

Expand a single BSON `_id` back into a two-part JSON ID by splitting on a delimiter:

```yaml
defaultmapping: default
mappings:
  - namespace: default
    finalidkeys: ["tenant", "record_id"]
    delete: ["_id"]
    add: ["tenant", "record_id"]
    mapid: '[string(id).split(":")[0], string(id).split(":")[1]]'
    cel:
      tenant: string(id).split(":")[0]
      record_id: string(id).split(":")[1]
```
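The collapse and expand expressions above form a round trip, which can be checked with a short Python sketch. `collapse` and `expand` are hypothetical helpers, and Python's `str.split` stands in for the CEL `split` call.

```python
def collapse(id_values: list, sep: str = ":") -> str:
    """Join the ID parts into one string, like the concatenating `mapid`."""
    return sep.join(str(value) for value in id_values)


def expand(single_id, sep: str = ":") -> list:
    """Split a collapsed ID back into its parts, like the splitting `mapid`."""
    return str(single_id).split(sep)


joined = collapse(["acme", 42])
print(joined)                        # acme:42
print(expand(joined))                # ['acme', '42']
print(expand(collapse(["a:b", 1])))  # ['a', 'b', '1']
```

Two things to watch for: every part comes back as a string (`42` becomes `'42'`), and a part that itself contains the delimiter yields extra pieces, so indexing with `[0]` and `[1]` would pick the wrong values. Choose a delimiter that cannot occur in the ID parts.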
