MongoDB
MongoDB is a NoSQL database characterized by a lack of fixed columns and fixed tables. Instead, it has collections (which are similar to tables) and dynamic schemas. MongoDB is a document-oriented database that uses JSON documents.
Supported services
Fivetran supports two different MongoDB configurations: replica sets and sharded clusters.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture Deletes | check | |
Custom Data | check | All tables and fields |
Data Blocking | check | Table level |
Column Hashing | | |
Re-sync | check | Table level |
History | | |
API Configurable | check | |
Priority-first sync | | |
dbt Package | | |
Setup guide
In your master database, you need to do the following:
- Allow access to your MongoDB database via Fivetran's IP
- Create a Fivetran-specific MongoDB user with read-level permissions
- Set the oplog size so that it can retain at least 24 hours' worth of changes. We recommend increasing the size to accommodate seven days' worth of data.
- Depending on your MongoDB configuration, either:
  - Allow access to a primary or secondary node in your MongoDB replica set, or
  - Allow access to every primary shard node and the mongos query router in your MongoDB sharded cluster
- If you are using MongoDB version 4.0 or later, enable change streams for all collections in all members of the replica set and of the sharded cluster.
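As a rough aid for the oplog retention requirement above, the following sketch estimates how large the oplog would need to be for a given retention window. The function name and the write-rate figure are hypothetical, and the calculation assumes a constant oplog growth rate; measure your own workload rather than relying on these numbers.

```python
def required_oplog_size_gb(oplog_gb_per_hour: float, retention_hours: float) -> float:
    """Estimate the oplog size (GB) needed to retain `retention_hours` of changes.

    Assumes a roughly constant oplog growth rate, which real workloads
    rarely have; treat the result as a lower bound and add headroom.
    """
    if oplog_gb_per_hour < 0 or retention_hours < 0:
        raise ValueError("inputs must be non-negative")
    return oplog_gb_per_hour * retention_hours


# Hypothetical workload generating 0.5 GB of oplog entries per hour.
minimum_gb = required_oplog_size_gb(0.5, 24)          # 24-hour minimum
recommended_gb = required_oplog_size_gb(0.5, 7 * 24)  # seven-day recommendation
print(minimum_gb, recommended_gb)
```

Busy periods can fill the oplog faster than the average rate suggests, which is one reason to size for the seven-day recommendation rather than the 24-hour minimum.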
For specific instructions on how to set up your database, see the guide for your MongoDB configuration:
Sync overview
Once Fivetran is connected to your MongoDB master database or read replica, we pull a full dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In the meantime, we sync incoming changes to those collections as well. Once the initial sync is complete, we either use your change streams (versions 4.0 or later) or your oplogs (versions 3.6 or earlier) to pull all your new and changed data at regular intervals.
Note: Fivetran only syncs MongoDB documents that are smaller than 16 MB. If you try to sync a document that is 16 MB or larger, we skip syncing that document and notify you with a warning message in your Fivetran dashboard.
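To find documents that might be skipped, a check along the following lines could be run against exported data. This is a minimal sketch, not Fivetran's implementation: it approximates document size with the UTF-8 JSON encoding rather than the BSON size MongoDB actually measures, it assumes that 16 MB means 16 × 1024 × 1024 bytes, and the function names are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # assumed byte value of the 16 MB limit


def approx_size_bytes(document: dict) -> int:
    """Approximate a document's size using its UTF-8 JSON encoding (not BSON)."""
    return len(json.dumps(document).encode("utf-8"))


def partition_by_size(documents, limit=MAX_DOCUMENT_BYTES):
    """Split documents into (syncable, skipped); documents at or above the limit are skipped."""
    syncable, skipped = [], []
    for doc in documents:
        (syncable if approx_size_bytes(doc) < limit else skipped).append(doc)
    return syncable, skipped


docs = [{"_id": 1, "note": "small"}, {"_id": 2, "blob": "x" * 200}]
# A tiny limit is used here only to keep the demonstration cheap.
ok, too_big = partition_by_size(docs, limit=100)
print([d["_id"] for d in ok], [d["_id"] for d in too_big])
```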
Schema information
Fivetran tries to replicate the exact schema from your MongoDB source database to your destination.
When you connect to Fivetran and specify a source database, you also select a schema prefix. We map the schemas we discover in your source database to your destination and prepend the destination schema names with the prefix you selected.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (boolean) marks rows that were deleted in the source collection.
- _fivetran_synced (UTC timestamp) keeps track of when each row was last successfully synced.
We add these columns to give you insight into the state of your data and the progress of your data syncs.
Type transformations and mapping
Fivetran supports all MongoDB data types. We map all first-level fields of your documents to columns in your destination.
If the first-level field is a simple data type, we map it to its own type. If it's a complex data type such as an array or JSON data, we map it to a JSON type without unpacking. We do not automatically unpack nested JSON objects to separate tables in the destination. Any nested JSON objects are preserved as is in the destination so that you can use JSON processing functions.
For example, the following JSON...
{"street" : "Main St.",
"city" : "New York",
"country" : "US",
"phone" : "(555) 123-5555",
"zip code" : 12345,
"people" : ["John", "Jane", "Adam"],
"car" : {"make" : "Honda",
"year" : 2014,
"type" : "AWD"}
}
...is converted to the following table when we load it into your destination:
_id | street | city | country | phone | zip code | people | car |
---|---|---|---|---|---|---|---|
1 | Main St. | New York | US | (555) 123-5555 | 12345 | ["John", "Jane", "Adam"] | {"make" : "Honda", "year" : 2014, "type" : "AWD"} |
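The mapping rule illustrated above can be sketched in a few lines. This is an illustration of the described behavior, not Fivetran's code; the function name is hypothetical, and complex values are serialized to JSON text here as a stand-in for a destination's JSON column type.

```python
import json


def to_destination_row(document: dict) -> dict:
    """Map first-level fields to columns; keep arrays and nested objects as JSON text."""
    row = {}
    for field, value in document.items():
        if isinstance(value, (dict, list)):
            row[field] = json.dumps(value)  # complex value: stored as JSON, not unpacked
        else:
            row[field] = value  # simple value: keeps its own type
    return row


doc = {
    "_id": 1,
    "street": "Main St.",
    "zip code": 12345,
    "people": ["John", "Jane", "Adam"],
    "car": {"make": "Honda", "year": 2014, "type": "AWD"},
}
row = to_destination_row(doc)
print(row)
```

Because the nested `car` object arrives as a single JSON value, reading a field such as `make` is left to the destination's JSON processing functions.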
Excluding source data
If you don't want to sync all your data, you can exclude databases and collections from your syncs on your Fivetran dashboard. To do so, go to your connector details page and un-check the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
You cannot exclude fields from your syncs.
Initial sync
When Fivetran connects to a new MongoDB database, we first copy all data from every collection in every schema (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a db.collection.find() operation on each collection. For large collections, we copy a limited amount of data at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
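One common way to make a large copy resumable is to read in batches ordered by _id and record the last _id copied. The sketch below simulates that idea with an in-memory list. The source does not say how Fivetran batches its reads, so the _id-based checkpoint, the helper names, and the batch size are all assumptions.

```python
def fetch_batch(collection, after_id, batch_size):
    """Stand-in for a query like find({_id: {$gt: after_id}}).sort(_id).limit(n)."""
    matching = [d for d in collection if after_id is None or d["_id"] > after_id]
    return sorted(matching, key=lambda d: d["_id"])[:batch_size]


def copy_collection(collection, destination, checkpoint=None, batch_size=2):
    """Copy documents in batches, returning the last _id copied as a checkpoint."""
    while True:
        batch = fetch_batch(collection, checkpoint, batch_size)
        if not batch:
            return checkpoint
        destination.extend(batch)
        checkpoint = batch[-1]["_id"]  # persist this to resume after a lost connection


source = [{"_id": i} for i in range(1, 6)]
dest = [{"_id": 1}, {"_id": 2}]                         # copied before an interruption
last_id = copy_collection(source, dest, checkpoint=2)  # resume instead of restarting
print(last_id, [d["_id"] for d in dest])
```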
Updating data
Once the initial sync is complete, Fivetran performs incremental updates of any new or modified data from your source database. The logic controlling this mechanism differs by MongoDB version.
- For versions 3.6 or earlier, we use MongoDB's oplogs to detect changes to the selected collections.
- For versions 4.0 or later, we use MongoDB's Change Streams to detect changes to the selected collections.
Fivetran uses MongoDB's built-in _id field as the primary key in the source tables. Using the _id field to identify rows, we merge changes to your documents into the corresponding tables in your destination:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
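The three merge rules above can be sketched as a small function that applies change events to a destination table keyed by _id. The event shape used here (`op`, `_id`, `fields`) is a simplified, hypothetical format chosen for illustration; it is not the oplog or change stream document format.

```python
def apply_change(destination: dict, event: dict) -> None:
    """Merge one change event into `destination`, a dict keyed by _id."""
    doc_id = event["_id"]
    if event["op"] in ("insert", "update"):
        row = destination.get(doc_id, {"_id": doc_id})
        row.update(event["fields"])
        row["_fivetran_deleted"] = False
        destination[doc_id] = row
    elif event["op"] == "delete":
        if doc_id in destination:
            destination[doc_id]["_fivetran_deleted"] = True  # soft delete: row is kept
    else:
        raise ValueError(f"unknown operation: {event['op']}")


table = {}
events = [
    {"op": "insert", "_id": 1, "fields": {"city": "New York"}},
    {"op": "update", "_id": 1, "fields": {"city": "Boston"}},
    {"op": "insert", "_id": 2, "fields": {"city": "Austin"}},
    {"op": "delete", "_id": 2},
]
for e in events:
    apply_change(table, e)
print(table)
```

After these events, row 2 is still present in `table` with its last known data, and only its `_fivetran_deleted` flag has changed.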
Deleted data
We don't remove deleted rows from the destination. Instead, we mark rows as deleted by setting the value of their Fivetran-created system column _fivetran_deleted to TRUE.
Migrating service providers
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original MongoDB database.