MongoDB
MongoDB is a document-oriented NoSQL database that stores JSON-like documents. Instead of fixed tables and columns, it has collections (which are similar to tables) with dynamic schemas.
Supported services
Fivetran supports the following MongoDB services:
IMPORTANT: We do not support MongoDB Serverless Database.
Supported configurations
Fivetran supports the following MongoDB configurations:
Supportability Category | Supported Values |
---|---|
Connector limit per database | 3 |
Transport Layer Security (TLS) | TLS 1.1 - 1.3 |
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | check | |
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | |
Authorization via API | check | |
Setup guide
For specific instructions on how to set up your database, see the guide for your MongoDB configuration:
Sync overview
Once Fivetran is connected to your MongoDB primary database or read replica, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we use your oplogs or change streams to pull all your new and changed data at regular intervals.
Pack modes
Pack modes determine the form in which Fivetran delivers your data. You can choose between two pack modes for each connection - unpacked and packed.
NOTE: In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "foo (INTEGER)" means the column name is foo and it stores INTEGER data.
Packed mode
Packed mode delivers data to your destination without unpacking it. We recommend using packed mode because it makes your syncs faster since we don't spend time unpacking data. You can still unpack your data in your destination using a transformation, which also keeps your original data intact.
In packed mode, the following source document
{
"_id": 1, <== key
"foo": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as:
_id (INTEGER) | data (JSON) |
---|---|
1 | {"_id":1, "foo":2, "nested":{"baz":3}} |
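You can unpack the packed data column in your destination with a transformation. As an illustrative sketch (plain Python, not Fivetran code), parsing the packed data column back into individual fields might look like this:

```python
import json

# A row as delivered in packed mode: the whole source document is stored
# as a JSON string in a single "data" column (illustrative values).
row = {"_id": 1, "data": '{"_id": 1, "foo": 2, "nested": {"baz": 3}}'}

# Parse the packed JSON back into individual fields.
unpacked = json.loads(row["data"])

print(unpacked["foo"])            # 2
print(unpacked["nested"]["baz"])  # 3
```

Because the packed column preserves the original document verbatim, a transformation like this can always recover the source structure.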
Unpacked mode
Unpacked mode delivers data to your destination in unpacked format. We only unpack one layer of nested fields and infer types. If you need to unpack further layers, you must do so in your destination using a transformation. Syncs in unpacked mode tend to be slower than those in packed mode.
In unpacked mode, the following source document
{
"_id": 1, <== key
"foo": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as:
_id (INTEGER) | foo (INTEGER) | nested (JSON) |
---|---|---|
1 | 2 | {"baz":3} |
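The one-layer unpacking described above can be sketched in plain Python (an illustration of the behavior, not Fivetran's implementation): simple first-level values become their own columns, while nested objects and arrays stay serialized as JSON.

```python
import json

def unpack_one_layer(doc):
    """Map first-level fields to columns; keep nested values as JSON."""
    row = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            row[key] = json.dumps(value)  # complex types stay packed as JSON
        else:
            row[key] = value              # simple types map to their own columns
    return row

source = {"_id": 1, "foo": 2, "nested": {"baz": 3}}
print(unpack_one_layer(source))
# {'_id': 1, 'foo': 2, 'nested': '{"baz": 3}'}
```

Deeper layers (such as "baz" here) remain inside the JSON value and must be unpacked in the destination if you need them as columns.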
Switching pack modes
You can switch pack modes for your connector at any time in your Fivetran dashboard. When you change the pack mode for your connector, we automatically perform a full connector re-sync.
NOTE: For connectors created before October 18, 2023, pack modes are defined at the table level. You can switch pack modes for a table at any time in your Fivetran dashboard. When you change the pack mode for a table, we automatically perform a full table re-sync.
To change the pack mode for your connector, do the following:
- In the connector dashboard, go to the Setup tab.
- Click Edit connection details.
- In the connector setup form, change the Pack Mode.
- Click Save & Test.
Schema information
Fivetran tries to replicate the exact schema from your MongoDB source database to your destination.
When you connect to Fivetran and specify a source database, you also select a schema prefix. We map the schemas we discover in your source database to your destination and prepend the destination schema names with the prefix you selected.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks rows that were deleted in the source collection.
- _fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
Type transformations and mapping
As we extract your data, we match MongoDB data types to types that Fivetran supports. If we don't support a certain data type, we automatically match that data type to a regular Java type.
The following table illustrates how we transform your MongoDB data types into Fivetran-supported types:
MongoDB Data Type | Fivetran Data Type | Fivetran Supported | Notes |
---|---|---|---|
ARRAYLIST | JSON | True | Each element of the array is recursively transformed based on its type |
BINARY | STRING | True | We support both Base64-encoded and UUID string representations. You can switch between them in the setup form. |
BSON_ARRAY | JSON | True | Each element of the array is recursively transformed based on its type |
BSON_BINARY | STRING | True | We support both Base64-encoded and UUID string representations. You can switch between them in the setup form. |
BSON_BOOLEAN | BOOLEAN | True | |
BSON_DATETIME | INSTANT | True | INSTANT created using date-time milliseconds from epoch |
BSON_DB_POINTER | STRING | True | BsonDBPointer is transformed to its ID (String) |
BSON_DECIMAL_128 | BIGDECIMAL | True | |
BSON_DOCUMENT | JSON | True | |
BSON_DOUBLE | DOUBLE | True | |
BSON_INT_32 | INT | True | |
BSON_INT_64 | LONG | True | |
BSON_NULL | NULL | True | |
BSON_OBJECT_ID | STRING | True | ObjectId value in String format |
BSON_REGULAR_EXPRESSION | STRING | True | The RegEx pattern (String) of the regular expression is used for transformation |
BSON_STRING | | True | We infer the data type based on the value present in the field |
BSON_SYMBOL | STRING | True | |
BSON_TIMESTAMP | INSTANT | True | |
BSON_UNDEFINED | NULL | True | |
CODE | STRING | True | |
DATE | INSTANT | True | |
DECIMAL128 | BIGDECIMAL | True | If the decimal is NaN or Infinite, it is transformed to DECIMAL format |
MAX_KEY | | False | |
MIN_KEY | | False | |
OBJECT_ID | STRING | True | |
SYMBOL | STRING | True | |
UUID | STRING | True | We support both Base64-encoded and UUID string representations. You can switch between them in the setup form. |
If we are missing an important data type that you need, reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.
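As a rough illustration of a few rows from the mapping table above (plain Python with a hypothetical helper name, not Fivetran's code), a transformation step might dispatch on the source value's type:

```python
import base64
import datetime
import decimal
import json

def to_fivetran_type(value):
    """Illustrative dispatch mirroring a few rows of the mapping table."""
    if isinstance(value, bool):          # bool before int: bool subclasses int
        return ("BOOLEAN", value)
    if isinstance(value, int):
        return ("INT", value)
    if isinstance(value, float):
        return ("DOUBLE", value)
    if isinstance(value, decimal.Decimal):
        return ("BIGDECIMAL", value)
    if isinstance(value, datetime.datetime):
        return ("INSTANT", value.isoformat())
    if isinstance(value, bytes):
        # Base64-encoded string representation of binary data
        return ("STRING", base64.b64encode(value).decode())
    if isinstance(value, (dict, list)):
        return ("JSON", json.dumps(value))
    return ("STRING", str(value))

print(to_fivetran_type(b"\x01\x02"))  # ('STRING', 'AQI=')
```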
Mapping
We map all first-level fields of your documents to columns in your destination. If the first-level field is a simple data type, we map it to its own type. If it's a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.
For example, the following JSON...
{"street" : "Main St.",
 "city" : "New York",
 "country" : "US",
 "phone" : "(555) 123-5555",
 "zip code" : 12345,
 "people" : ["John", "Jane", "Adam"],
 "car" : {"make" : "Honda",
          "year" : 2014,
          "type" : "AWD"}
}
...is converted to the following table when we load it into your destination:
_id | street | city | country | phone | zip code | people | car |
---|---|---|---|---|---|---|---|
1 | Main St. | New York | US | (555) 123-5555 | 12345 | ["John", "Jane", "Adam"] | {"make" : "Honda", "year" : 2014, "type" : "AWD"} |
Excluding source data
If you don’t want to sync all the data from your source database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can restrict the Fivetran user's access to a subset of the databases by granting the read@<database> role on each of the databases you want to sync, instead of the readAnyDatabase role. For more information, see our setup instructions.
You cannot exclude fields from your syncs.
Initial sync
When Fivetran connects to a new MongoDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a db.collection.find() operation on each collection. For large collections, we copy a limited amount of data at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
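The batched copy strategy can be sketched as follows (plain Python over an in-memory list; a real implementation would issue db.collection.find() queries filtered and sorted on _id):

```python
def copy_in_batches(collection, batch_size):
    """Copy documents a batch at a time, checkpointing the last _id seen,
    so a lost connection does not force a restart from the beginning."""
    last_id = None
    while True:
        # Equivalent of: find({"_id": {"$gt": last_id}}).sort("_id").limit(batch_size)
        pending = [d for d in collection if last_id is None or d["_id"] > last_id]
        batch = sorted(pending, key=lambda d: d["_id"])[:batch_size]
        if not batch:
            break
        yield batch
        last_id = batch[-1]["_id"]  # checkpoint: resume here after a failure

docs = [{"_id": i} for i in range(5)]
print([len(b) for b in copy_in_batches(docs, 2)])  # [2, 2, 1]
```

Because the checkpoint is just the last _id copied, an interrupted sync can resume from that point rather than re-reading the whole collection.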
Updating data
Fivetran performs incremental updates of any new or modified data from your source database.
We use one of the following incremental sync methods, depending on the read permissions of the Fivetran user:
- readAnyDatabase@admin: Deployment-level change streams
- read@<db>: Database-level change streams on a particular <db>
- read@local and readAnyDatabase@admin: Oplogs (only for MongoDB Replica Set versions below 4.0)
- clusterMonitor@admin and readAnyDatabase@admin: Oplogs (only for MongoDB Sharded Cluster versions below 4.0)
Fivetran uses MongoDB's built-in _id field as the primary key in the source tables. Using the _id field to identify rows, we merge changes to your documents into the corresponding tables in your destination:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
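These merge rules can be sketched in plain Python (an illustration of the behavior, not Fivetran's implementation), with the destination table keyed by _id:

```python
def apply_change(destination, op, doc_id, data=None):
    """Merge one change event into a destination table keyed by _id."""
    if op in ("insert", "update"):
        # Inserts and updates land with _fivetran_deleted = False.
        destination[doc_id] = {**(data or {}), "_fivetran_deleted": False}
    elif op == "delete":
        # Deletes are soft: the row stays, flagged as deleted.
        if doc_id in destination:
            destination[doc_id]["_fivetran_deleted"] = True

table = {}
apply_change(table, "insert", 1, {"foo": 2})
apply_change(table, "delete", 1)
print(table)  # {1: {'foo': 2, '_fivetran_deleted': True}}
```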
Change Streams
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Change streams support syncing multi-document transactions. We open a change stream against each selected collection.
To use change streams, your replica sets or sharded clusters must meet the following requirements:
- MongoDB version 4.0+
- WiredTiger storage engine
- Replica set protocol version 1 (pv1)
By default, we use change streams for incremental updates because they are faster than oplog-based syncs and support granular access down to the database level.
Oplogs
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your MongoDB databases. We use oplogs to detect changes to the selected collections.
Incremental sync cursor expiry
Oplog and change stream cursors expire when the time between two successive syncs leads to a loss of change data in the source database. Cursors may expire because of the connector's sync frequency or because the oplog is too small to retain the change data. When cursors expire, we reschedule the connector's sync and trigger automatic re-syncs:
- A full source re-sync when oplog cursors expire.
- A full table re-sync when change stream cursors for the table expire.
Deleted data
We don't remove deleted rows from the destination. Instead, we mark the deleted rows by setting the value of the _fivetran_deleted system column to TRUE.
Excluded tables
Fivetran does not sync the following tables:
- Views
- System collections (collections matching the <database>.system.* pattern)
Migrating service providers
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original MongoDB database.