DocumentDB

Amazon DocumentDB is a fully managed NoSQL database service that is built for JSON data management and integrated with AWS.

Supported configurations

Fivetran supports the following DocumentDB configurations:

Supportability Category	Supported Values
Database versions	4.0+
Connection limit per database	3
Transport Layer Security (TLS)	TLS 1.1 - 1.3

Known limitations

Fivetran can only connect to DocumentDB primary instances because DocumentDB does not support reading change streams from replica instances. We need change streams to perform incremental updates.
Fivetran can only connect to DocumentDB using either SSH tunneling or AWS PrivateLink because of DocumentDB's security feature limitations.
Fivetran only syncs DocumentDB documents that are smaller than 16 MB. If you try to sync a document that is 16 MB or larger, we skip syncing that document and notify you with a warning message in your Fivetran dashboard.
Fivetran does not support syncing non-materialized views.

Features

Feature Name	Supported	Notes
Capture deletes		All collections
History mode
Custom data		All collections and fields.
Data blocking		Column level and collection level for connections created after July 17, 2023.
Column hashing		Connections created after July 17, 2023.
Re-sync		Collection level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking		AWS PrivateLink
Authorization via API

Setup guide

In your primary database, you need to do the following:

Allow access to your DocumentDB database using Fivetran's IP.
Create a Fivetran-specific DocumentDB user with read-level permissions.
Enable change streams on each collection that you want Fivetran to sync.
Set the change stream log retention duration so that it retains at least 48 hours' worth of changes. We recommend increasing the retention duration to seven days.

For specific instructions on how to set up your database, follow our step-by-step DocumentDB setup guide to connect DocumentDB with your destination using Fivetran connectors.
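The last two steps in the list above can be sketched in Python. This is a minimal sketch, not the setup guide's procedure: it assumes the PyMongo driver and DocumentDB's `modifyChangeStreams` admin command, and the database and collection names are placeholders. It also assumes that retention is applied separately through the cluster parameter `change_stream_log_retention_duration`, expressed in seconds, so the code only computes the values.

```python
from datetime import timedelta
from typing import Iterable


def change_stream_command(database: str, collection: str) -> dict:
    """Build the admin command that enables change streams on one collection."""
    return {
        "modifyChangeStreams": 1,  # command name must be the first key
        "database": database,
        "collection": collection,
        "enable": True,
    }


def enable_change_streams(client, database: str, collections: Iterable[str]) -> None:
    """Run the command once per collection.

    `client` is assumed to be a pymongo.MongoClient connected to the
    primary instance with sufficient privileges.
    """
    for name in collections:
        client.admin.command(change_stream_command(database, name))


def retention_seconds(days: int) -> int:
    """Convert a retention window in days to seconds."""
    return int(timedelta(days=days).total_seconds())


MINIMUM_RETENTION = retention_seconds(2)      # 48 hours
RECOMMENDED_RETENTION = retention_seconds(7)  # seven days
```

With a connected client, `enable_change_streams(client, "sales", ["orders", "customers"])` would issue one command per collection; `sales`, `orders`, and `customers` are hypothetical names.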

Sync overview

Once Fivetran is connected to your DocumentDB primary database, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we use your change streams to pull all your new and changed data at regular intervals.

Pack mode options

Pack modes determine the form in which Fivetran delivers your data. You must select a pack mode for each table you want Fivetran to sync. There are two pack modes, packed and unpacked.

In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "foo (INTEGER)" means the column name is foo and it stores INTEGER data.

Packed mode (recommended)

We recommend using this mode for optimal performance with NoSQL databases like DocumentDB.

In packed mode, the following source table

{
  "_id": 1, <== key
  "foo": 2,
  "nested": {
    "baz": 3
  }
}

is delivered to your destination as

_fivetran_id (STRING)	data (JSON)
`356a192b7913b04c54574d18c28d46e6395428ab`	`{"_id":1, "foo":2, "nested":{"baz":3}}`
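As a rough illustration of packed mode, the sketch below builds the two destination columns from one source document. The function name `pack_document` is hypothetical. The sample `_fivetran_id` above equals the SHA-1 hex digest of the string `"1"`, so the sketch models the ID that way; whether Fivetran derives it like this in general is an assumption.

```python
import hashlib
import json


def pack_document(doc: dict) -> dict:
    """Return a packed-mode style row for one source document.

    Assumption: _fivetran_id is modeled as the SHA-1 hex digest of str(_id).
    This reproduces the sample value above but is not a documented rule.
    """
    fivetran_id = hashlib.sha1(str(doc["_id"]).encode("utf-8")).hexdigest()
    # The whole document, including _id, is kept as a single JSON value.
    data = json.dumps(doc, separators=(",", ":"))
    return {"_fivetran_id": fivetran_id, "data": data}


row = pack_document({"_id": 1, "foo": 2, "nested": {"baz": 3}})
```

Because the document stays in one JSON column, new or missing fields in the source never change the destination table's column list.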

Unpacked mode

Fivetran unpacks one layer of nested fields and infers types. We do not recommend this mode for NoSQL databases such as DocumentDB because their dynamic schemas may result in an excessive number of columns in the destination, leading to performance issues during loading.

In unpacked mode, the following source table

{
  "_id": 1, <== key
  "foo": 2,
  "nested": {
    "baz": 3
  }
}

is delivered to your destination as

_fivetran_id (STRING)	_id (INTEGER)	foo (INTEGER)	nested (JSON)
`356a192b7913b04c54574d18c28d46e6395428ab`	1	2	`{"baz":3}`
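The one-layer unpacking can be sketched as follows. This is an illustration under stated assumptions rather than Fivetran's implementation: `unpack_document` is a hypothetical name, the input is a plain Python dict, and the `_fivetran_id` column is left out for brevity.

```python
import json


def unpack_document(doc: dict) -> dict:
    """Unpack one layer of a document.

    Scalar top-level fields become their own columns; nested objects and
    arrays are kept as JSON text and are not unpacked any further.
    """
    row = {}
    for field, value in doc.items():
        if isinstance(value, (dict, list)):
            row[field] = json.dumps(value, separators=(",", ":"))
        else:
            row[field] = value
    return row


row = unpack_document({"_id": 1, "foo": 2, "nested": {"baz": 3}})
```

Every distinct top-level field name across the collection becomes a column, which is why collections with highly variable documents can produce very wide tables in this mode.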

Switching pack modes

You can switch pack modes for a table at any time in your Fivetran dashboard.

We automatically perform a full connection re-sync during the next scheduled sync when you change pack modes.

To change the pack mode for a connection, do the following:

In the connection dashboard, go to the Setup tab.
Click Edit connection details.
In the connection setup form, change the Pack mode.
Click Save & Test.

Schema information

Fivetran tries to replicate the exact schema and collections from your DocumentDB source database to your destination according to our standard database update strategies. For every schema in the DocumentDB database that you connect, we create a schema in your destination that maps directly to its native schema. This ensures that the data in your destination is in a familiar format to work with.

Fivetran-generated columns

Fivetran adds the following columns to every table in your destination:

_fivetran_deleted (BOOLEAN) marks rows that were deleted in the source collection.
_fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.

We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.

Type transformations and mapping

As we extract your data, we match DocumentDB data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.

The following table illustrates how we transform your DocumentDB data types into Fivetran-supported types:

DocumentDB Type	Fivetran Type	Fivetran Supported
Double	DOUBLE	True
String	STRING	True
Object		False
Array		False
Binary Data	BINARY	True
ObjectId	STRING	True
Boolean	BOOLEAN	True
Date	INSTANT	True
Null	NULL	True
32-bit Integer (int)	INT	True
Timestamp	INSTANT	True
64-bit Integer (long)	LONG	True
MinKey		False
MaxKey		False
Decimal128	BIGDECIMAL	True
Regular Expression	STRING	True
JavaScript		False
JavaScript (with scope)		False
Undefined		False
Symbol		False
DBPointer		False

If the first-level field is a simple data type, we map it to its own type. If it's a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.

For example, the following JSON...

{"street"   : "Main St.",
 "city"     : "New York",
 "country"  : "US",
 "phone"    : "(555) 123-5555",
 "zip code" : 12345,
 "people"   : ["John", "Jane", "Adam"],
 "car"      : {"make" : "Honda",
               "year" : 2014,
               "type" : "AWD"}
}

...is converted to the following table when we load it into your destination:

_id	street	city	country	phone	zip code	people	car
1	Main St.	New York	US	(555) 123-5555	12345	["John", "Jane", "Adam"]	{"make" : "Honda", "year" : 2014, "type" : "AWD"}
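To make the first-level mapping rule concrete, here is a minimal sketch that assigns a Fivetran type name to each top-level field of the example. It assumes the document has already been decoded into plain Python values; driver-specific types such as ObjectId, Decimal128, and dates are out of scope. `fivetran_type` is a hypothetical helper, and choosing INT or LONG by value range is a simplification, since BSON stores the integer width explicitly.

```python
def fivetran_type(value) -> str:
    """Map a decoded first-level value to a Fivetran type name (sketch)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        # Simplification: pick the width from the value's range.
        return "INT" if -2**31 <= value < 2**31 else "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, (dict, list)):
        return "JSON"  # complex values are kept as JSON, not unpacked
    raise TypeError(f"no mapping sketched for {type(value).__name__}")


address = {
    "street": "Main St.",
    "city": "New York",
    "country": "US",
    "phone": "(555) 123-5555",
    "zip code": 12345,
    "people": ["John", "Jane", "Adam"],
    "car": {"make": "Honda", "year": 2014, "type": "AWD"},
}
column_types = {field: fivetran_type(value) for field, value in address.items()}
```

Here `people` and `car` both map to JSON, which matches the destination row above where the array and the nested object are stored intact.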

Excluding source data

If you don't want to sync all your data, you can exclude databases and collections from your syncs on your Fivetran dashboard. To do so, go to your connection details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.

You cannot exclude fields from your syncs.

Connections created after July 17, 2023 support the Data Blocking and Column Hashing features.

To enable these features for connections created on or before July 17, 2023, contact our support team.

Connections created before or on July 17, 2023 have the following schema change handling options:

Allow all: New schemas, tables, and new columns for existing tables are synced into the destination.
Allow columns: New schemas and configurable tables are blocked from being synced to the destination. Data for existing schemas and tables is synced normally.

The Block all option is not supported.

Initial sync

When Fivetran connects to a new DocumentDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a db.collection.find() operation on each collection. For large collections, we copy a limited amount of data at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
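One way to copy a large collection in limited batches, so that an interrupted copy can resume rather than restart, is to page through it in `_id` order. The sketch below illustrates that idea only; it is an assumption, not a description of how Fivetran batches its reads. `copy_in_batches` is a hypothetical name, `collection` is assumed to behave like a PyMongo `Collection`, and all `_id` values are assumed to be mutually comparable.

```python
def copy_in_batches(collection, batch_size=1000, resume_after=None):
    """Yield lists of documents in ascending _id order.

    `resume_after` is the last _id that was already copied, or None to
    start from the beginning of the collection.
    """
    last_id = resume_after
    while True:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        batch = list(collection.find(query).sort("_id", 1).limit(batch_size))
        if not batch:
            return
        yield batch
        # Checkpoint: persisting this value lets a later run resume here.
        last_id = batch[-1]["_id"]
```

If the connection drops after a batch has been stored, calling the function again with `resume_after` set to the saved `_id` continues from the next document.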

Updating data

Fivetran performs incremental updates of any new or modified data from your source database. We use DocumentDB's change streams to detect changes to the selected collections.

Fivetran uses DocumentDB's built-in _id field as the primary key in the source collections. Using the _id field to identify rows, we merge changes to your documents into the corresponding tables in your destination:

Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
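The three merge rules above can be sketched with an in-memory table keyed by `_id`. The event fields `operationType`, `documentKey`, and `fullDocument` follow the change event format used by change streams; the `apply_change` function and the dict-based table are hypothetical stand-ins for the destination, and the sketch assumes update events carry the full document.

```python
from datetime import datetime, timezone


def apply_change(table: dict, event: dict) -> None:
    """Merge one change event into `table`, a dict of rows keyed by _id."""
    key = event["documentKey"]["_id"]
    if event["operationType"] == "delete":
        # Soft delete: keep the row and flag it instead of removing it.
        row = table.setdefault(key, {"_id": key})
        row["_fivetran_deleted"] = True
    else:
        # Insert, update, or replace. Assumes the event includes the full
        # document (for updates this requires an update lookup).
        row = dict(event["fullDocument"])
        row["_fivetran_deleted"] = False
        table[key] = row
    table[key]["_fivetran_synced"] = datetime.now(timezone.utc)


table = {}
apply_change(table, {"operationType": "insert",
                     "documentKey": {"_id": 1},
                     "fullDocument": {"_id": 1, "foo": 2}})
apply_change(table, {"operationType": "update",
                     "documentKey": {"_id": 1},
                     "fullDocument": {"_id": 1, "foo": 5}})
apply_change(table, {"operationType": "delete",
                     "documentKey": {"_id": 1}})
```

After the three events, the row for `_id` 1 is still present with its last known data and `_fivetran_deleted` set to TRUE.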

Deleted data

We don't remove deleted rows from the destination. Instead, we mark rows as deleted by setting the value of their Fivetran-created system column _fivetran_deleted to TRUE.

Migrating service providers

If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original DocumentDB database.