DocumentDB
Amazon DocumentDB is a fully managed NoSQL database service that is built for JSON data management and integrated with AWS.
Supported configurations
Fivetran supports the following DocumentDB configurations:
Supportability Category | Supported Values |
---|---|
Database versions | 4.0+ |
Connector limit per database | 3 |
Transport Layer Security (TLS) | TLS 1.1 - 1.3 |
Known limitations
- Fivetran can only connect to DocumentDB primary instances because DocumentDB does not support reading change streams from replica instances. We need change streams to perform incremental updates.
- Fivetran can only connect to DocumentDB using either SSH tunneling or AWS PrivateLink because of DocumentDB's security feature limitations.
- Fivetran only syncs DocumentDB documents that are smaller than 16 MB. If you try to sync a document that is 16 MB or larger, we skip syncing that document and notify you with a warning message in your Fivetran dashboard.
- Fivetran does not support syncing non-materialized views.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | ||
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | |
Authorization via API | check | |
Setup guide
In your primary database, you need to do the following:
- Allow access to your DocumentDB database using Fivetran's IP.
- Create a Fivetran-specific DocumentDB user with read-level permissions.
- Enable change streams on each collection that you want Fivetran to sync.
- Set the change stream log retention duration so that it can retain at least 48 hours' worth of changes. We recommend increasing the retention duration to accommodate seven days' worth of changes.
For specific instructions on how to set up your database, follow our step-by-step DocumentDB setup guide to connect DocumentDB with your destination using Fivetran connectors.
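As a rough illustration of the change stream steps above, the following sketch builds the modifyChangeStreams admin command that Amazon DocumentDB uses to enable change streams on a collection, and converts the two retention targets into seconds. It assumes a pymongo-style client object that exposes client.admin.command(...); the function names are hypothetical, and the retention value would still have to be applied to the change_stream_log_retention_duration cluster parameter through your cluster parameter group. Treat it as a sketch under those assumptions and follow the setup guide for the authoritative steps.

```python
HOURS = 3600  # seconds per hour

# Retention targets from the list above, expressed in seconds.
MIN_RETENTION_SECONDS = 48 * HOURS              # 48 hours
RECOMMENDED_RETENTION_SECONDS = 7 * 24 * HOURS  # seven days


def change_stream_command(database, collection, enable=True):
    """Build the admin command that toggles change streams on one collection."""
    # The command name must be the first key in the command document.
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": enable,
    }


def enable_change_streams(client, database, collections):
    """Enable change streams on each collection and return the server replies.

    `client` is assumed to behave like pymongo.MongoClient, connected to the
    primary instance as a user that is allowed to run admin commands.
    """
    return [
        client.admin.command(change_stream_command(database, name))
        for name in collections
    ]
```

With a real connection, a call such as enable_change_streams(client, "sales", ["orders", "customers"]) would send one command per collection; the database and collection names here are placeholders.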
Sync overview
Once Fivetran is connected to your DocumentDB primary database, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we use your change streams to pull all your new and changed data at regular intervals.
Pack mode options
Pack modes determine the form in which Fivetran delivers your data. You must select a pack mode for each table you want Fivetran to sync. There are two pack modes, packed and unpacked.
NOTE: In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "foo (INTEGER)" means the column name is foo and it stores INTEGER data.
Packed mode (recommended)
We recommend using this mode for optimal performance with NoSQL databases such as DocumentDB.
In packed mode, the following source table
{
"_id": 1, <== key
"foo": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_fivetran_id (STRING) | data (JSON) |
---|---|
356a192b7913b04c54574d18c28d46e6395428ab | {"_id":1, "foo":2, "nested":{"baz":3}} |
Unpacked mode
Fivetran unpacks one layer of nested fields and infers types. We do not recommend this mode for NoSQL databases such as DocumentDB because their dynamic schemas may result in an excessive number of columns in the destination, leading to performance issues during the loading process.
In unpacked mode, the following source table
{
"_id": 1, <== key
"foo": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_fivetran_id (STRING) | _id (INTEGER) | foo (INTEGER) | nested (JSON) |
---|---|---|---|
356a192b7913b04c54574d18c28d46e6395428ab | 1 | 2 | {"baz":3} |
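To make the difference between the two modes concrete, here is a minimal sketch that reproduces both example rows from the same source document. The function names are hypothetical. The _fivetran_id value in the examples happens to equal the SHA-1 digest of the string "1", so the sketch derives it that way, but this document does not state how Fivetran computes the column, so treat that derivation as an assumption made for illustration.

```python
import hashlib
import json


def fivetran_id(key):
    """Assumed derivation: SHA-1 hex digest of the key's string form."""
    return hashlib.sha1(str(key).encode("utf-8")).hexdigest()


def to_json(value):
    """Serialize compactly, similar to the JSON shown in the examples."""
    return json.dumps(value, separators=(",", ":"))


def deliver_packed(doc):
    """Packed mode: a hashed key column plus the whole document as JSON."""
    return {"_fivetran_id": fivetran_id(doc["_id"]), "data": to_json(doc)}


def deliver_unpacked(doc):
    """Unpacked mode: one column per first-level field.

    Nested objects and arrays are not unpacked further; they stay as JSON.
    """
    row = {"_fivetran_id": fivetran_id(doc["_id"])}
    for name, value in doc.items():
        row[name] = to_json(value) if isinstance(value, (dict, list)) else value
    return row


source = {"_id": 1, "foo": 2, "nested": {"baz": 3}}
print(deliver_packed(source))
print(deliver_unpacked(source))
```

Note how a document with many distinct first-level fields would add a column per field in unpacked mode, while packed mode always yields the same two data columns.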
Switching pack modes
You can switch pack modes for a table at any time in your Fivetran dashboard.
IMPORTANT: We automatically perform a full table re-sync when you change pack modes.
To change the pack mode for a connector, do the following:
- In the connector dashboard, go to the Setup tab.
- Click Edit connection details.
- In the connector setup form, change the Pack mode.
- Click Save & Test.
Schema information
Fivetran tries to replicate the exact schema and collections from your DocumentDB source database to your destination according to our standard database update strategies. For every schema in the DocumentDB database that you connect, we create a schema in your destination that maps directly to its native schema. This ensures that the data in your destination is in a familiar format to work with.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks rows that were deleted in the source collection.
- _fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
Type transformations and mapping
As we extract your data, we match DocumentDB data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.
The following table illustrates how we transform your DocumentDB data types into Fivetran-supported types:
DocumentDB Type | Fivetran Type | Fivetran Supported |
---|---|---|
Double | DOUBLE | True |
String | STRING | True |
Object | | False |
Array | | False |
Binary Data | BINARY | True |
ObjectId | STRING | True |
Boolean | BOOLEAN | True |
Date | INSTANT | True |
Null | NULL | True |
32-bit Integer (int) | INT | True |
Timestamp | INSTANT | True |
64-bit Integer (long) | LONG | True |
MinKey | | False |
MaxKey | | False |
Decimal128 | BIGDECIMAL | True |
Regular Expression | STRING | True |
JavaScript | | False |
JavaScript (with scope) | | False |
Undefined | | False |
Symbol | | False |
DBPointer | | False |
If the first-level field is a simple data type, we map it to its own type. If it's a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.
For example, the following JSON...
{"street" : "Main St.",
"city" : "New York",
"country" : "US",
"phone" : "(555) 123-5555",
"zip code" : 12345,
"people" : ["John", "Jane", "Adam"],
"car" : {"make" : "Honda",
"year" : 2014,
"type" : "AWD"}
}
...is converted to the following table when we load it into your destination:
_id | street | city | country | phone | zip code | people | car |
---|---|---|---|---|---|---|---|
1 | Main St. | New York | US | (555) 123-5555 | 12345 | ["John", "Jane", "Adam"] | {"make" : "Honda", "year" : 2014, "type" : "AWD"} |
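The first-level rule described above can be sketched as a small function that assigns a destination type to each top-level field of the address example. This is an illustration only: the function names are hypothetical, the input is plain JSON parsed by Python rather than BSON, and the sketch guesses INT versus LONG from the value's range, whereas DocumentDB stores the integer width explicitly.

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def column_type(value):
    """Guess the destination type for one first-level field value."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT" if INT32_MIN <= value <= INT32_MAX else "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, (dict, list)):
        return "JSON"  # complex values are kept as JSON, not unpacked
    raise TypeError(f"unhandled value type: {type(value).__name__}")


def describe_columns(document):
    """Map every first-level field name to its guessed destination type."""
    return {name: column_type(value) for name, value in document.items()}


address = json.loads("""
{"street": "Main St.", "city": "New York", "country": "US",
 "phone": "(555) 123-5555", "zip code": 12345,
 "people": ["John", "Jane", "Adam"],
 "car": {"make": "Honda", "year": 2014, "type": "AWD"}}
""")
print(describe_columns(address))
```

The people and car fields come out as JSON columns, which matches the table above: the array and the nested object are preserved whole rather than split into separate columns or tables.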
Excluding source data
If you don't want to sync all your data, you can exclude databases and collections from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
You cannot exclude fields from your syncs.
Connectors created after July 17, 2023 support the Data Blocking and Column Hashing features.
TIP: To enable support for these features for connectors created on or before July 17, 2023, contact our support team.
Connectors created on or before July 17, 2023 have the following schema change handling options:
- Allow all: New schemas, tables, and new columns for existing tables are synced into the destination.
- Allow columns: New schemas and configurable tables are blocked from being synced to the destination. Data for existing schemas and tables is synced normally.
The Block all option is not supported.
Initial sync
When Fivetran connects to a new DocumentDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a db.collection.find() operation on each collection. For large collections, we copy a limited amount of data at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
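The idea of copying a limited amount of data at a time can be sketched as a paged copy that records a checkpoint after each page. This document does not describe how Fivetran actually divides a collection, so the sketch below assumes simple paging by ascending _id; all names in it are hypothetical. With pymongo, fetch_page could wrap a call such as collection.find({"_id": {"$gt": after_id}}).sort("_id", 1).limit(limit).

```python
def copy_collection(fetch_page, write_rows, checkpoint, page_size=1000):
    """Copy a collection page by page, saving progress after each page.

    fetch_page(after_id, limit) returns up to `limit` documents whose _id is
        greater than after_id, sorted by _id (after_id is None at the start).
    write_rows(documents) loads one page into the destination.
    checkpoint is a dict that keeps "last_id" between runs.
    Returns the number of documents copied during this run.
    """
    copied = 0
    while True:
        page = fetch_page(checkpoint.get("last_id"), page_size)
        if not page:
            return copied
        write_rows(page)
        # Record progress only after the page has been written.
        checkpoint["last_id"] = page[-1]["_id"]
        copied += len(page)


# In-memory stand-in for a source collection, for demonstration only.
docs = [{"_id": i} for i in range(1, 8)]


def fetch_page(after_id, limit):
    matching = [d for d in docs if after_id is None or d["_id"] > after_id]
    return matching[:limit]


destination = []
state = {}
copy_collection(fetch_page, destination.extend, state, page_size=3)
print(len(destination), state)
```

If the connection drops partway through, calling copy_collection again with the same checkpoint continues after the last page that was written instead of starting over.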
Updating data
Fivetran performs incremental updates of any new or modified data from your source database. We use DocumentDB's change streams to detect changes to the selected collections.
Fivetran uses DocumentDB's built-in _id field as the primary key in the source collections. Using the _id field to identify rows, we merge changes to your documents into the corresponding tables in your destination:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
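The merge rules above can be sketched as a function that applies one change stream event to a destination table held in memory as a dict keyed by _id. The event fields operationType, documentKey, and fullDocument follow the standard change event shape; the sketch assumes update events carry the full document (for example, a stream opened with the updateLookup option), and it leaves out the _fivetran_synced column for brevity. The function name is hypothetical.

```python
def apply_change(destination, event):
    """Merge one change stream event into `destination`, keyed by _id."""
    key = event["documentKey"]["_id"]
    operation = event["operationType"]
    if operation in ("insert", "update", "replace"):
        # Inserts create the row; updates overwrite it with the new data.
        row = dict(event["fullDocument"])
        row["_fivetran_deleted"] = False
        destination[key] = row
    elif operation == "delete":
        # Soft delete: keep the last known row and flag it as deleted.
        row = destination.setdefault(key, {"_id": key})
        row["_fivetran_deleted"] = True
    return destination


table = {}
apply_change(table, {"operationType": "insert", "documentKey": {"_id": 1},
                     "fullDocument": {"_id": 1, "foo": 2}})
apply_change(table, {"operationType": "delete", "documentKey": {"_id": 1}})
print(table)
```

After the delete event, the row is still present in the table with _fivetran_deleted set to True, which is the soft-delete behavior described in the list above.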
Deleted data
We don't remove deleted rows from the destination. Instead, we mark rows as deleted by setting the value of their Fivetran-created system column _fivetran_deleted to TRUE.
Migrating service providers
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original DocumentDB database.