MongoDB

MongoDB is a NoSQL database characterized by a lack of fixed columns and fixed tables. Instead, it has collections (which are similar to tables) and dynamic schemas. MongoDB is a document-oriented database that uses JSON documents.

Supported services

Fivetran supports the following MongoDB configurations:

We do not support MongoDB Serverless Database.

Supported configurations

Fivetran supports the following MongoDB configurations:

Supportability Category	Supported Values
Supported Versions	4.0 and above
Connection limit per database	3
Transport Layer Security (TLS)	TLS 1.1 - 1.3

If your MongoDB server version is earlier than 4.0, you must upgrade to a supported version by March 1, 2025. Otherwise, the MongoDB drivers will no longer be compatible with your server version. Learn more in MongoDB's software lifecycle schedules documentation.

Features

Feature Name	Supported	Notes
Capture deletes
History mode		Selectable for all tables
Custom data		All tables and fields
Data blocking		Column level and table level
Column hashing
Re-sync		Table level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking		AWS PrivateLink: MongoDB Atlas and MongoDB on EC2; Azure Private Link: MongoDB Atlas and MongoDB on Azure VM; Google Cloud Private Service Connect: MongoDB on GCP VM
Authorization via API

Setup guide

For specific instructions on how to set up your database, see the guide for your MongoDB configuration:

Sync overview

Once Fivetran is connected to your MongoDB primary database or read replica, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we use your oplogs or change streams to pull all your new and changed data at regular intervals.

Pack modes

Pack modes determine the form in which Fivetran delivers your data. You can choose between two pack modes for each connection: packed and unpacked.

In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "foo (INTEGER)" means the column name is foo and it stores INTEGER data.

Packed mode

Packed mode delivers data to your destination without unpacking it. We recommend using packed mode because it makes your syncs faster since we don't spend time unpacking data. You can still unpack your data in your destination using a transformation, which also keeps your original data intact.

In packed mode, the following source table

{
  "_id": 1, <== key
  "foo": 2,
  "nested": {
    "baz": 3
  }
}

is delivered to your destination as:

_id (INTEGER)	data (JSON)
1	`{"_id":1, "foo":2, "nested":{"baz":3}}`
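
The packed layout above can be approximated in a few lines of Python. This is only an illustrative sketch, not Fivetran's implementation: the function name `to_packed_row` is hypothetical, and it assumes the document is already available as a Python dictionary.

```python
import json


def to_packed_row(document):
    """Build a packed-mode style row: the key column plus the whole document as JSON."""
    return {
        "_id": document["_id"],
        # The full document, including _id, is kept intact in a single JSON column.
        "data": json.dumps(document, separators=(",", ":")),
    }


source_document = {"_id": 1, "foo": 2, "nested": {"baz": 3}}
packed_row = to_packed_row(source_document)
print(packed_row)
```

Because the `data` column still holds the complete document, any later transformation in the destination can start from the original data.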

Unpacked mode

Unpacked mode delivers data to your destination in unpacked format. We only unpack one layer of nested fields and infer types. If you need to unpack further layers, you must do so in your destination using a transformation. Syncs in unpacked mode tend to be slower than those in packed mode.

In unpacked mode, the following source table

{
  "_id": 1, <== key
  "foo": 2,
  "nested": {
    "baz": 3
  }
}

is delivered to your destination as:

_id (INTEGER)	foo (INTEGER)	nested (JSON)
1	2	`{"baz":3}`
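
As a rough sketch of the one-layer unpacking described above (again, not Fivetran's actual code), the following Python function promotes first-level fields to columns and leaves nested objects and arrays as JSON text. The name `to_unpacked_row` is hypothetical, and type inference is left to Python's own types for simplicity.

```python
import json


def to_unpacked_row(document):
    """Unpack only the first layer: scalars become columns, nested values stay JSON."""
    row = {}
    for field, value in document.items():
        if isinstance(value, (dict, list)):
            # Nested documents and arrays are not unpacked further.
            row[field] = json.dumps(value, separators=(",", ":"))
        else:
            row[field] = value
    return row


unpacked_row = to_unpacked_row({"_id": 1, "foo": 2, "nested": {"baz": 3}})
print(unpacked_row)
```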

Switching pack modes

You can switch pack modes for your connection at any time in your Fivetran dashboard.

We automatically perform a full connection re-sync during the next scheduled sync when you change pack modes.

For connections created before October 18, 2023, pack modes are defined at the table level. You can switch pack modes for a table at any time in your Fivetran dashboard. When you change the pack mode for a table, we automatically perform a full table re-sync.

To change the pack mode for your connection, do the following:

In the connection dashboard, go to the Setup tab.
Click Edit connection details.
In the connection setup form, change the Pack Mode.
Click Save & Test.

Schema information

Fivetran tries to replicate the exact schema from your MongoDB source database to your destination.

When you connect to Fivetran and specify a source database, you also select a schema prefix. We map the schemas we discover in your source database to your destination and prepend the destination schema names with the prefix you selected.

Fivetran-generated columns

Fivetran adds the following columns to every table in your destination:

_fivetran_deleted (BOOLEAN) marks rows that were deleted in the source collection.
_fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.

We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.

Type transformations and mapping

As we extract your data, we match MongoDB data types to types that Fivetran supports. If we don't support a certain data type, we automatically match that data type to a regular Java type.

The following table illustrates how we transform your MongoDB data types into Fivetran-supported types:

MongoDB Data Type	Fivetran Data Type	Fivetran Supported	Notes
ARRAYLIST	JSON	True	Each element of the array is recursively transformed based on its type
BINARY	STRING	True	We support both Base64-encoded and UUID string representations. You can switch between them in the setup form.
BSON_ARRAY	JSON	True	Each element of the array is recursively transformed based on its type
BSON_BINARY	STRING	True	We support both Base64-encoded and UUID string representations. You can switch between them in the setup form.
BSON_BOOLEAN	BOOLEAN	True
BSON_DATETIME	INSTANT	True	INSTANT created from the date-time value in milliseconds since the epoch
BSON_DB_POINTER	STRING	True	BsonDBPointer is transformed to its ID(String)
BSON_DECIMAL_128	BIGDECIMAL	True
BSON_DOCUMENT	JSON	True
BSON_DOUBLE	DOUBLE	True
BSON_INT_32	INT	True
BSON_INT_64	LONG	True
BSON_NULL	NULL	True
BSON_OBJECT_ID	STRING	True	ObjectId value in String format
BSON_REGULAR_EXPRESSION	STRING	True	RegEx pattern(String) of regular expression is used for transformation
BSON_STRING		True	We infer the data type based on the value present in the field
BSON_SYMBOL	STRING	True
BSON_TIMESTAMP	INSTANT	True
BSON_UNDEFINED	NULL	True
CODE	STRING	True
DATE	INSTANT	True
DECIMAL128	BIGDECIMAL	True	If the decimal is NaN or Infinite, it is transformed to DECIMAL format
MAX_KEY		False
MIN_KEY		False
OBJECT_ID	STRING	True
SYMBOL	STRING	True	When performing a historical sync on a collection, we do not sync documents where the `_id` field is of type SYMBOL.
UUID	STRING	True	We support both Base64-encoded and UUID string representations. You can switch between them in the setup form.

If we are missing an important data type that you need, reach out to support.

In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.
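
To make one row of the table concrete, the BSON_DATETIME conversion can be pictured as follows. BSON stores date-times as milliseconds since the Unix epoch. The function name `bson_datetime_to_instant` is hypothetical, and Python's `datetime` stands in for the INSTANT type, so treat this as a sketch of the idea rather than the actual conversion code.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, the reference point for BSON date-time values.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bson_datetime_to_instant(millis_since_epoch):
    """Convert a BSON date-time (milliseconds since the epoch) to a UTC instant."""
    return EPOCH + timedelta(milliseconds=millis_since_epoch)


instant = bson_datetime_to_instant(1_700_000_000_000)
print(instant.isoformat())
```

Adding a `timedelta` to the epoch, rather than calling `datetime.fromtimestamp`, also handles dates before 1970 consistently across platforms.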

Mapping

We map all first-level fields of your documents to columns in your destination. If the first-level field is a simple data type, we map it to its own type. If it's a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.

For example, the following JSON...

{"_id"      : 1,
"street"   : "Main St.",
"city"     : "New York",
"country"  : "US",
"phone"    : "(555) 123-5555",
"zip code" : 12345,
"people"   : ["John", "Jane", "Adam"],
"car"      : {"make" : "Honda",
              "year" : 2014,
              "type" : "AWD"}
}

...is converted to the following table when we load it into your destination:

_id	street	city	country	phone	zip code	people	car
1	Main St.	New York	US	(555) 123-5555	12345	["John", "Jane", "Adam"]	{"make" : "Honda", "year" : 2014, "type" : "AWD"}
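
Because nested objects arrive as JSON, unpacking them further is left to a transformation in the destination. The sketch below shows the idea in Python for the `car` column of the example row. The helper `expand_json_column` and the `column_key` naming scheme are assumptions made for illustration; in practice you would more likely use your destination's own JSON functions.

```python
import json

# A subset of the example row, with JSON columns stored as text.
destination_row = {
    "_id": 1,
    "street": "Main St.",
    "people": '["John", "Jane", "Adam"]',
    "car": '{"make" : "Honda", "year" : 2014, "type" : "AWD"}',
}


def expand_json_column(row, column):
    """Parse one JSON object column and promote its keys to `<column>_<key>` fields."""
    expanded = {name: value for name, value in row.items() if name != column}
    for key, value in json.loads(row[column]).items():
        expanded[f"{column}_{key}"] = value
    return expanded


flat_row = expand_json_column(destination_row, "car")
print(flat_row)
```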

Excluding source data

If you don’t want to sync all the data from your source database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connection details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.

Alternatively, you can restrict the Fivetran user's access to a subset of your databases by granting it the read@<database> role on each database you want to sync, instead of the readAnyDatabase role. For more information, see our setup instructions.

You cannot exclude fields from your syncs.

Initial sync

When Fivetran connects to a new MongoDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a db.collection.find() operation on each collection. For large collections, we copy a limited amount of data at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
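
The benefit of copying a limited amount of data at a time can be illustrated with a small, self-contained sketch. It works on an in-memory list instead of a real collection, and the ordering by `_id` and the checkpoint mechanism are assumptions for illustration only; the text above does not say how Fivetran tracks its progress.

```python
def copy_in_batches(documents, batch_size, checkpoint=None):
    """Yield (batch, new_checkpoint) pairs in _id order, resuming after `checkpoint`."""
    ordered = sorted(documents, key=lambda doc: doc["_id"])
    remaining = [
        doc for doc in ordered
        if checkpoint is None or doc["_id"] > checkpoint
    ]
    for start in range(0, len(remaining), batch_size):
        batch = remaining[start:start + batch_size]
        # The last _id of the batch is enough to resume after an interruption.
        yield batch, batch[-1]["_id"]


collection = [{"_id": i} for i in (3, 1, 5, 2, 4)]

full_copy = [[doc["_id"] for doc in batch] for batch, _ in copy_in_batches(collection, 2)]
resumed_copy = [
    [doc["_id"] for doc in batch]
    for batch, _ in copy_in_batches(collection, 2, checkpoint=2)
]
print(full_copy, resumed_copy)
```

If the connection is lost after the first batch, restarting with `checkpoint=2` picks up at the third document instead of copying the collection again from the start.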

Updating data

Fivetran performs incremental updates of any new or modified data from your source database.

Depending on the read permissions of the Fivetran user, we use one of the following incremental sync methods:

readAnyDatabase@admin: Deployment-level change streams
read@<db>: Database-level change streams on a particular <db>
read@local and readAnyDatabase@admin: Oplogs (Only for MongoDB Replica Set version below 4.0)
clusterMonitor@admin and readAnyDatabase@admin: Oplogs (Only for MongoDB Sharded Cluster version below 4.0)
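
One way to read the list above is as a decision function. The sketch below encodes that reading in Python. The function name, the `topology` labels, and the order in which the rules are checked are assumptions, so treat it as an aid to understanding rather than the actual selection logic.

```python
def choose_sync_method(roles, server_version, topology):
    """Pick an incremental sync method from the user's roles (illustrative only).

    roles: set of strings such as "readAnyDatabase@admin" or "read@sales"
    server_version: tuple such as (4, 4)
    topology: "replica_set" or "sharded_cluster" (hypothetical labels)
    """
    has_read_any = "readAnyDatabase@admin" in roles
    if server_version < (4, 0):
        # Oplogs are listed only for server versions below 4.0.
        if topology == "replica_set" and has_read_any and "read@local" in roles:
            return "oplog"
        if topology == "sharded_cluster" and has_read_any and "clusterMonitor@admin" in roles:
            return "oplog"
        return None
    if has_read_any:
        return "deployment-level change streams"
    if any(role.startswith("read@") and role != "read@local" for role in roles):
        return "database-level change streams"
    return None


print(choose_sync_method({"read@sales"}, (6, 0), "replica_set"))
```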

Fivetran uses MongoDB's built-in _id field as the primary key in the source tables. Using the _id field to identify rows, we merge changes to your documents into the corresponding tables in your destination:

Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
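
The three merge rules above can be sketched as a small Python function that applies change events to an in-memory "destination" keyed by `_id`. The event shape (`operation`, `_id`, `document`) is hypothetical and chosen only for this example, as is the choice to ignore deletes for rows that were never synced.

```python
from datetime import datetime, timezone


def apply_change(destination, event, synced_at=None):
    """Merge one change event into `destination`, a dict of rows keyed by _id."""
    synced_at = synced_at or datetime.now(timezone.utc)
    doc_id = event["_id"]
    if event["operation"] == "delete":
        row = destination.get(doc_id)
        if row is not None:
            # Deleted rows are kept and only flagged (soft delete).
            row["_fivetran_deleted"] = True
            row["_fivetran_synced"] = synced_at
    else:
        # Inserts and updates both write the latest document state.
        row = dict(event["document"])
        row["_fivetran_deleted"] = False
        row["_fivetran_synced"] = synced_at
        destination[doc_id] = row


destination = {}
apply_change(destination, {"operation": "insert", "_id": 1, "document": {"_id": 1, "foo": 2}})
apply_change(destination, {"operation": "update", "_id": 1, "document": {"_id": 1, "foo": 5}})
apply_change(destination, {"operation": "delete", "_id": 1})
print(destination[1])
```

After these three events, the row with `_id` 1 is still present in the destination, holds its last known values, and has `_fivetran_deleted` set to `True`.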

Change Streams

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Change streams support syncing multi-document transactions. We open a change stream against each selected collection.

To use change streams, you must use the following on your replica sets or sharded clusters:

By default, we use change streams for incremental updates because they are faster than oplogs and work with granular access down to the database level.

Oplogs

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your MongoDB databases. We use oplogs to detect changes to the selected collections.

Incremental sync cursor expiry

Oplog and change stream cursors expire when the time between two successive syncs leads to a loss of change data in the source database. A cursor may expire if the connection syncs too infrequently or if the size of the oplog doesn't accommodate the change data. When the cursors expire for the first time, we reschedule the connection's sync and trigger an automatic re-sync. If the cursors expire again, we create an error. For more information on resolving this error, see our Oplog Retention Period Error documentation.
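
The expiry behavior described above amounts to a simple comparison plus a two-step escalation. In the sketch below, positions are plain integers standing in for oplog or change stream positions, and the function and return values are hypothetical names chosen for illustration.

```python
def next_action(oldest_retained_position, last_synced_position, expired_before):
    """Decide what to do at the start of a sync (illustrative only).

    A cursor counts as expired when the source no longer retains the change
    that follows the last synced position.
    """
    cursor_expired = last_synced_position < oldest_retained_position
    if not cursor_expired:
        return "incremental_sync"
    # First expiry: re-sync automatically. A repeated expiry becomes an error.
    return "error" if expired_before else "automatic_resync"


print(next_action(oldest_retained_position=500, last_synced_position=420, expired_before=False))
```

In this model, syncing more frequently raises `last_synced_position` sooner, and a larger oplog lowers `oldest_retained_position`; either change makes expiry less likely.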

Deleted data

We don't remove deleted rows from the destination. Instead, we mark the deleted rows by setting the value of the _fivetran_deleted system column to TRUE.

Excluded tables

Fivetran does not sync the following tables:

Views
System collections (<database>.system.* patterned collections)

Migrating service providers

If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original MongoDB database.