Azure Cosmos DB

Azure Cosmos DB is Microsoft's fully managed NoSQL database service. It is designed for high-performance applications and offers both provisioned throughput and serverless capacity modes.

Supported services

Fivetran supports the following Azure Cosmos DB database services:

Azure Cosmos DB for MongoDB
Azure Cosmos DB for NoSQL

Supported configurations

Fivetran supports the following Azure Cosmos DB configurations:

Supportability Category	Supported Values
Connection limit per database	Depends on how many RUs you have provisioned. Each connection can consume up to 2,000 RUs.
Transport Layer Security (TLS)	TLS 1.1 - 1.3

Limitations

We support Azure Cosmos DB with the native MongoDB API and NoSQL API. We do not support Cassandra API, Gremlin API, or Table API instances. Contact Fivetran Support if you would like us to integrate with these other API instances.
We use Azure Cosmos DB's change feed to sync your data. In its default (latest version) mode, the change feed captures inserts and updates made to items within a container, but it does not capture deletes. To capture deleted source data, we suggest that you add a soft-delete flag to your documents. Alternatively, for Azure Cosmos DB for NoSQL, you can track deleted data through Fivetran Teleport Sync, described in the Fivetran Teleport Sync section.
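A soft delete turns a delete into an update, which the change feed does capture. The sketch below shows the document-side transformation only. The field names `isDeleted` and `deletedAt` and the function `soft_delete` are hypothetical; writing the result back (for example, with an upsert through the Azure SDK) is left out so the sketch stays self-contained.

```python
import copy
from datetime import datetime, timezone
from typing import Optional

# Hypothetical field names: pick whatever flag and timestamp names suit your schema.
SOFT_DELETE_FIELD = "isDeleted"
DELETED_AT_FIELD = "deletedAt"


def soft_delete(item: dict, ttl_seconds: Optional[int] = None) -> dict:
    """Return a copy of a Cosmos DB item marked as deleted instead of removing it.

    Writing the returned copy back to the container is an update, so it
    appears in the change feed, unlike a hard delete.
    """
    marked = copy.deepcopy(item)
    marked[SOFT_DELETE_FIELD] = True
    marked[DELETED_AT_FIELD] = datetime.now(timezone.utc).isoformat()
    if ttl_seconds is not None:
        # A per-item "ttl" lets Cosmos DB purge the item later, but only if
        # time to live is enabled on the container.
        marked["ttl"] = ttl_seconds
    return marked
```

If you combine the flag with a time to live, choose a value comfortably longer than your sync interval so that the flagged version is synced before Cosmos DB purges the item.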

Features

Azure Cosmos DB for MongoDB

Feature Name	Supported	Notes
Capture deletes
History mode		Selectable for all tables
Custom data		All collections and fields
Data blocking		Data blocking for databases and containers is supported for all Cosmos DB connectors. Partial data blocking is supported only for connectors created after July 17, 2023
Column hashing		Connectors created after July 17, 2023
Re-sync		Collection level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking		Azure Private Link
Authorization via API

Azure Cosmos DB for NoSQL

Feature Name	Supported	Notes
Capture deletes
History mode		Supported via the all versions and deletes change feed mode
Custom data
Data blocking		Data blocking for databases and containers is supported for all Cosmos DB connectors. Partial data blocking is supported only for connectors created after July 17, 2023
Column hashing		Connectors created after July 17, 2023
Re-sync		Connector and container level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking		Azure Private Link
Authorization via API

We support the following deployment models for the Azure Cosmos DB connectors:

Azure Cosmos DB for MongoDB: SaaS
Azure Cosmos DB for NoSQL: SaaS and Hybrid

Setup guide

This overview will give you a general idea of the kind of work needed to set up an Azure Cosmos DB connection. For specific instructions on how to set up your database, see the guide for your Azure Cosmos DB database type:

Azure Cosmos DB for MongoDB setup guide
Azure Cosmos DB for NoSQL setup guide

Sync overview

Once Fivetran is connected to your Azure Cosmos DB resource, we pull a full dump of all selected data from your database. The initial sync finishes when all containers that existed when the sync started have finished importing. Once the initial sync is complete, we use each container's change feed to pull all your new and changed data at regular intervals.

Data access methods

Azure Cosmos DB for MongoDB

Fivetran uses a username and password to access your Azure Cosmos DB for MongoDB source database.

Azure Cosmos DB for NoSQL

We use one of two methods to access your Azure Cosmos DB for NoSQL source data: an account key (recommended) or a resource token.

To learn more about data access control in Azure Cosmos DB, see Microsoft's Secure access to data in Azure Cosmos DB documentation.

Account key (recommended)

Fivetran uses an account key to authenticate to the source database. Primary/secondary keys provide access to all administrative resources for the database account.

Choose this method if you want Fivetran to automatically detect all readable databases and containers.
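To show what an account key actually does, here is a sketch of the master-key signature scheme described in Microsoft's Azure Cosmos DB REST API documentation: each request is signed with an HMAC-SHA256 over the verb, resource type, resource link, and date. This is not Fivetran's implementation (the Azure SDKs compute this token for you), and the function names and dummy values are illustrative.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import quote


def rfc1123_now() -> str:
    """Current UTC time in RFC 1123 format; send the same value as x-ms-date."""
    return format_datetime(datetime.now(timezone.utc), usegmt=True)


def master_key_auth_token(verb: str, resource_type: str, resource_link: str,
                          date_rfc1123: str, master_key_b64: str) -> str:
    """Build the Authorization header value for a request signed with an account key.

    resource_link is case-sensitive, e.g. "dbs/mydb/colls/mycoll/docs/item1".
    """
    key = base64.b64decode(master_key_b64)
    # Verb, resource type, and date are lowercased; the resource link is not.
    payload = (f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n"
               f"{date_rfc1123.lower()}\n\n")
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The whole token is URL-encoded, including '=', '&', '+', and '/'.
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")
```

Because the key can sign any request against the account, this method lets Fivetran discover every readable database and container, which is also why the key should be stored securely.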

Resource token

Fivetran uses a resource token to access the source database. Resource tokens provide access to specific Azure Cosmos DB resources within a database. For this method, you must provide the source database name, the container name, and the matching resource token.

Choose this method if you want Fivetran to only access specific Azure Cosmos DB resources within your database.
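The two methods need different inputs. The sketch below captures that difference as a hypothetical settings object. The class name, the method strings, and the field names are illustrative and do not correspond to Fivetran's setup form or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NoSqlAccessConfig:
    """Hypothetical connection settings; field names are illustrative only."""
    method: str  # "account_key" or "resource_token"
    account_key: Optional[str] = None
    database: Optional[str] = None
    container: Optional[str] = None
    resource_token: Optional[str] = None

    def missing_fields(self) -> List[str]:
        if self.method == "account_key":
            # An account key is account-wide, so databases and containers can be discovered.
            required = ["account_key"]
        elif self.method == "resource_token":
            # A resource token is scoped, so the database and container must be named.
            required = ["database", "container", "resource_token"]
        else:
            raise ValueError(f"unknown access method: {self.method!r}")
        return [name for name in required if not getattr(self, name)]
```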

Pack mode options

Pack modes determine the form in which Fivetran delivers your data. To sync your data in Fivetran, you must select a pack mode. There are two pack modes: packed and unpacked.

In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "foo (INTEGER)" means the column name is foo and it stores INTEGER data.

Unpacked mode

In unpacked mode, Fivetran unpacks the first layer of fields into separate columns and infers their data types. For example, consider the following source data:

{
  "_id": 1, <== key
  "foo": 2,
  "nested": {
    "baz": 3
  }
}

For Azure Cosmos DB for MongoDB, this document is delivered to your destination as follows:

_id (INTEGER)	foo (INTEGER)	nested (JSON)
1	2	`{"baz":3}`

For Azure Cosmos DB for NoSQL, this document is delivered to your destination as follows:

_fivetran_id (STRING)	_id (INTEGER)	foo (INTEGER)	nested (JSON)
`356a192b7913b04c54574d18c28d46e6395428ab`	1	2	`{"baz":3}`
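The unpacking step can be sketched as follows, assuming a document arrives as a Python dict. The function name is hypothetical, the sketch omits `_fivetran_id` and type inference, and it reproduces the MongoDB table above: scalar top-level fields become columns, while nested objects and arrays are kept whole as JSON text.

```python
import json


def unpack_one_layer(document: dict) -> dict:
    """Turn top-level fields into columns; nested objects and arrays stay as JSON text."""
    row = {}
    for field, value in document.items():
        if isinstance(value, (dict, list)):
            # Only the first layer is unpacked: nested structures are serialized whole.
            row[field] = json.dumps(value, separators=(",", ":"))
        else:
            row[field] = value
    return row


# Produces {"_id": 1, "foo": 2, "nested": '{"baz":3}'}
row = unpack_one_layer({"_id": 1, "foo": 2, "nested": {"baz": 3}})
```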

Packed mode (default)

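In packed mode, Fivetran keeps each row's key in its own column and delivers the entire document in a single JSON column named data. For example, consider the following source data: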

{
  "_id": 1, <== key
  "foo": 2,
  "nested": {
    "baz": 3
  }
}

For Azure Cosmos DB for MongoDB, this document is delivered to your destination as follows:

_id (INTEGER)	data (JSON)
1	`{"_id":1, "foo":2, "nested":{"baz":3}}`

For Azure Cosmos DB for NoSQL, this document is delivered to your destination as follows:

_fivetran_id (STRING)	data (JSON)
`356a192b7913b04c54574d18c28d46e6395428ab`	`{"_id":1, "foo":2, "nested":{"baz":3}}`
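Packing is the simpler transformation: the key stays in its own column and the full document is serialized into `data`. This hypothetical sketch mirrors the MongoDB table above; for Azure Cosmos DB for NoSQL, the key column would be `_fivetran_id` instead of a field taken from the document.

```python
import json


def pack(document: dict, key_field: str = "_id") -> dict:
    """Keep the key as its own column and put the whole document in one JSON column."""
    return {
        key_field: document[key_field],
        # Key order in the JSON follows the source document's field order.
        "data": json.dumps(document, separators=(",", ":")),
    }
```

Because the whole document lands in one column, new or renamed source fields never change the destination schema, which is one reason packed mode is a convenient default.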

Switching pack modes

You can switch pack modes for a connection at any time in your Fivetran dashboard.

When you change pack modes, we automatically perform a full re-sync of the connection during the next scheduled sync.

To change the pack mode for a connection, do the following:

In the connection dashboard, go to the Setup tab.
Click Edit connection details.
In the connection setup form, change the Pack Mode.
Click Save & Test.

History mode (Private Preview)

History mode is a sync mode that tracks the history of the changes in your source data. We use Azure Cosmos DB's all versions and deletes change feed mode to capture all intermediate changes, so this change feed mode must be enabled for your connection to use history mode. The all versions and deletes change feed mode is currently in preview and is only compatible with Azure Cosmos DB for NoSQL accounts.

To sign up for the all versions and deletes change feed mode preview, follow Microsoft's Get Started with Change feed modes in Azure Cosmos DB instructions.

Once you've enabled this feature along with continuous backups on your Azure Cosmos DB account, reach out to our Support Team to enable history mode for your existing connectors.
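Conceptually, history mode turns a stream of create, replace, and delete events into versioned rows instead of overwriting a single row. The sketch below illustrates that idea under stated assumptions: events are simplified `(item_id, operation, timestamp, body)` tuples rather than the real change feed payload, and the `valid_from`, `valid_to`, and `active` names are hypothetical, not the columns Fivetran writes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Simplified change event: (item_id, operation, timestamp, body); operation is
# "create", "replace", or "delete". This is not the real change feed schema.
Change = Tuple[str, str, int, Optional[dict]]


@dataclass
class Version:
    item_id: str
    body: Optional[dict]
    valid_from: int                 # hypothetical column names
    valid_to: Optional[int] = None
    active: bool = True


def apply_changes(history: List[Version], changes: List[Change]) -> List[Version]:
    """Append one version per change and close the version it supersedes."""
    latest: Dict[str, Version] = {v.item_id: v for v in history if v.active}
    for item_id, operation, ts, body in changes:
        current = latest.pop(item_id, None)
        if current is not None:
            # Keep the old version, but mark when it stopped being current.
            current.valid_to = ts
            current.active = False
        if operation != "delete":
            new_version = Version(item_id, body, ts)
            history.append(new_version)
            latest[item_id] = new_version
    return history
```

A mode that only reports the latest version of each item would collapse the create and replace above into one event, which is why history mode depends on the all versions and deletes feed.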

Replication speeds

Two major factors can cause disparities between our estimated and actual replication speeds for your Fivetran-connected databases: network latency and the number of request units (RUs) you have provisioned for your Azure Cosmos DB resource. Make sure your monitored container is not being throttled (rate-limited with HTTP 429 responses); otherwise, syncing the change feed will be delayed.

Azure Cosmos DB for MongoDB

We extract data sequentially from Azure Cosmos DB for MongoDB.

Azure Cosmos DB for NoSQL

We recommend that you provision at least 10,000 RU/s for each container, though the actual amount you need may vary depending on your Cosmos DB usage. We scale the number of parallel threads for data extraction in proportion to the container's provisioned throughput, as shown in the table below. Each thread can extract up to 2.5 MB/s, so more parallel threads allow for faster syncs.

Container Throughput (RU/s)	Parallel Threads	Max extraction rate (MB/s)
400 - 9,999	1	2.5
10,000 - 19,999	2	5.0
20,000 - 29,999	3	7.5
30,000 - 39,999	4	10.0
40,000 - 49,999	5	12.5
50,000 - 59,999	6	15.0
60,000 - 69,999	7	17.5
70,000 - 79,999	8	20.0
80,000 - 89,999	9	22.5
90,000 and higher	10	25.0
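The table follows a simple rule: one thread for every started band of 10,000 RU/s, capped at 10 threads, at 2.5 MB/s per thread. The sketch below encodes that rule; the function names are hypothetical, and rejecting values below 400 RU/s is an assumption based on where the table starts.

```python
def parallel_threads(container_ru_per_sec: int) -> int:
    """Thread count from the throughput table: one per 10,000 RU/s band, capped at 10."""
    if container_ru_per_sec < 400:
        raise ValueError("the throughput table starts at 400 RU/s")
    return min(10, container_ru_per_sec // 10_000 + 1)


def max_extraction_rate_mb_s(container_ru_per_sec: int) -> float:
    """Each thread extracts up to 2.5 MB/s."""
    return parallel_threads(container_ru_per_sec) * 2.5


# A container provisioned at 25,000 RU/s gets 3 threads, or up to 7.5 MB/s.
rate = max_extraction_rate_mb_s(25_000)
```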

The ability to sync changes quickly also depends on the sync frequency you configure. The risk of the sync falling behind, or being unable to keep up with data changes, decreases as the sync frequency increases. We recommend a higher sync frequency for data sources with a high rate of data changes.

Schema information

Fivetran tries to replicate the exact databases and containers from your Azure Cosmos DB resource to your destination according to our standard database update strategies. For each Azure Cosmos DB database that you connect, we create a corresponding schema in your destination, and we replicate each of its containers as a table. This ensures that the data in your destination is in a familiar format to work with.

Fivetran-generated columns

Fivetran adds the following columns to every table in your destination:

_fivetran_id (STRING) is a one-way hashed value that uniquely identifies each row. It is generated from the id and, if present, the partition key value of each Azure Cosmos DB item.
_fivetran_deleted (BOOLEAN) marks rows that were deleted in the source container or collection.
_fivetran_synced (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.

We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.

_fivetran_id is not applicable to Azure Cosmos DB for MongoDB.
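Fivetran does not document the exact hashing recipe, but the example value in the pack mode tables, `356a192b7913b04c54574d18c28d46e6395428ab`, is the SHA-1 digest of the string "1", the example item's key. The sketch below therefore assumes SHA-1; the function name, the `|` separator, and the way the partition key is combined are hypothetical.

```python
import hashlib
from typing import Optional


def fivetran_id_sketch(item_id: str, partition_key: Optional[str] = None) -> str:
    """Hypothetical: hex SHA-1 of the item id, plus the partition key when present."""
    # The separator and concatenation order are assumptions, not Fivetran's recipe.
    material = item_id if partition_key is None else f"{item_id}|{partition_key}"
    return hashlib.sha1(material.encode("utf-8")).hexdigest()


# Matches the example value in the pack mode tables.
example = fivetran_id_sketch("1")  # "356a192b7913b04c54574d18c28d46e6395428ab"
```

Including the partition key matters because Cosmos DB only requires id to be unique within a logical partition, so the same id can appear in two partitions of one container.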

Type transformations and mapping

As we extract your data, we match Azure Cosmos DB document-based data types to types that Fivetran supports. Fivetran supports all core Azure Cosmos DB data types.

The following table illustrates how we transform your Azure Cosmos DB data types into Fivetran-supported types:

Azure Cosmos DB Type	Fivetran Type	Fivetran Supported
BOOLEAN	BOOLEAN	True
TEXT	STRING	True
INTEGER	INT	True
LONG	LONG	True
SHORT	SHORT	True
DOUBLE	DOUBLE	True
FLOAT	FLOAT	True
OBJECT	JSON	True
ARRAY	JSON	True

We do not support OBJECT as a primary key (_id) type for Azure Cosmos DB for MongoDB.

If we are missing an important data type that you need, contact Fivetran Support.

In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.
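In unpacked mode, each first-level value needs a Fivetran type. The function below is a hypothetical inference rule consistent with the mapping table, not Fivetran's actual logic. It assumes integers within the 32-bit range map to INT and wider ones to LONG; it never returns SHORT or FLOAT, because a plain JSON value does not carry that distinction.

```python
def fivetran_type_for(value) -> str:
    """Hypothetical type inference for one JSON value, following the mapping table."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "BOOLEAN"
    if isinstance(value, int):
        # Assumption: 32-bit range -> INT, wider integers -> LONG.
        return "INT" if -2**31 <= value < 2**31 else "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, (dict, list)):
        # OBJECT and ARRAY both land as JSON.
        return "JSON"
    raise TypeError(f"no mapping for {type(value).__name__}")
```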

Nested data in unpacked mode

If a first-level field has a simple data type, we map it to the corresponding Fivetran type. If it has a complex data type, such as an array or a JSON object, we map it to the JSON type without unpacking it. We do not automatically unpack nested JSON objects into separate tables in the destination. Nested JSON objects are preserved as-is in the destination so that you can query them with JSON processing functions.

For example, the following JSON...

{"_id"      : 1,
 "street"   : "Main St.",
 "city"     : "New York",
 "country"  : "US",
 "phone"    : "(555) 123-5555",
 "zip code" : 12345,
 "people"   : ["John", "Jane", "Adam"],
 "car"      : {"make" : "Honda",
               "year" : 2014,
               "type" : "AWD"}
}

...is converted to the following table when we load it into your destination:

_id	street	city	country	phone	zip code	people	car
1	Main St.	New York	US	(555) 123-5555	12345	["John", "Jane", "Adam"]	{"make" : "Honda", "year" : 2014, "type" : "AWD"}

Excluding source data

If you don’t want to sync all the data from your source database, you can exclude databases, containers, or partial data from your syncs on your Fivetran dashboard. To do so, go to your connection details page and uncheck the objects you would like to omit in subsequent syncs. For more information, see our Data Blocking and Column Hashing documentation.

Initial sync

When Fivetran connects to a new Azure Cosmos DB resource, we first copy all data from every container in every database (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy data by performing a read on the container change feed from its beginning.

We checkpoint progress at regular intervals throughout the initial sync. If a sync stops or fails, we resume from the last successful checkpoint and continue importing in the next sync.
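Checkpointed, resumable reading can be sketched as a loop that saves a continuation token only after each page is written. Everything here is hypothetical: `read_page` and `write_rows` stand in for the change feed reader and the destination writer, and `DONE` is an invented marker for a finished container.

```python
from typing import Callable, Dict, List, Optional, Tuple

DONE = "__initial_sync_done__"  # hypothetical marker for a finished container

# A page of items plus the continuation token for the next page (None when caught up).
Page = Tuple[List[dict], Optional[str]]


def initial_sync(container: str,
                 read_page: Callable[[Optional[str]], Page],
                 write_rows: Callable[[List[dict]], None],
                 checkpoints: Dict[str, str]) -> None:
    """Copy one container page by page, resuming from the last saved checkpoint."""
    token = checkpoints.get(container)  # None means start at the beginning of the feed
    if token == DONE:
        return
    while True:
        items, next_token = read_page(token)
        write_rows(items)
        # Record progress only after the page is written, so a failure causes
        # at most one page to be read again on the next attempt.
        checkpoints[container] = next_token if next_token is not None else DONE
        if next_token is None:
            return
        token = next_token
```

Saving the checkpoint after the write, rather than before, trades a possible re-read of one page for never skipping data; replaying a page is harmless when rows are merged by key.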

Updating data

Fivetran performs incremental updates of any new or modified data from your source database. We use Azure Cosmos DB's change feed to detect changes to the selected containers.

Fivetran uses Azure Cosmos DB's built-in id field, along with the partition key value that may be present in each container, to uniquely identify rows. This unique identifier is stored in the destination as a new column, _fivetran_id. Once we identify updated records, we merge the changes to your documents into the corresponding tables in your destination using the identifier:

Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.

_fivetran_id is not applicable to Azure Cosmos DB for MongoDB. The _id field uniquely identifies rows.
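The merge rules above amount to a keyed upsert. In this sketch, a dict stands in for a destination table and the function name is hypothetical. Because the change feed delivers the full latest version of an item, an insert and an update reduce to the same write.

```python
from typing import Dict


def merge_item(table: Dict[str, dict], fivetran_id: str, item: dict, synced_at: str) -> None:
    """Upsert one change-feed item into a destination table keyed by _fivetran_id."""
    row = dict(item)
    row["_fivetran_id"] = fivetran_id
    row["_fivetran_deleted"] = False
    row["_fivetran_synced"] = synced_at
    # A new key inserts a row; an existing key replaces it with the latest version.
    table[fivetran_id] = row
```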

Deleted data

Azure Cosmos DB for MongoDB

We cannot track deleted data in Azure Cosmos DB for MongoDB.

Azure Cosmos DB for NoSQL

In its default latest version mode, Azure Cosmos DB's change feed does not log deletes. To keep track of deleted data, we use Fivetran Teleport Sync to identify deleted records and apply the changes to the destination tables.

Fivetran Teleport Sync

Fivetran Teleport Sync is a proprietary sync method that incrementally replicates your database, including deletes, with no additional setup beyond a read-only connection.

Fivetran Teleport Sync performs the following operations:

Scans each synced table (container) in full, reading only the id and partition key of each item
Aggregates a compressed snapshot of the container in application memory
Compares the aggregated snapshot to the previous snapshot to deduce the differences
If the snapshots differ, deletes the items missing from the source in the corresponding destination tables

We only perform snapshot comparisons during incremental syncs, because an initial sync has no previous snapshot to compare against and cannot contain deleted data.
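The snapshot comparison can be sketched as a set difference over `(id, partition key)` pairs. This is an illustration, not Fivetran's proprietary implementation. It omits snapshot compression, uses a dict keyed by those pairs as the destination table, and assumes that deleting a destination row means setting `_fivetran_deleted` to TRUE, as the column description suggests. The function name is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (id, partition key value)


def teleport_deletes(table: Dict[Key, dict], previous: Set[Key],
                     current_scan: Iterable[Key]) -> Set[Key]:
    """Mark destination rows whose keys vanished between two snapshots.

    Returns the current snapshot so the caller can keep it for the next sync.
    """
    current = set(current_scan)      # snapshot of (id, partition key) pairs
    for key in previous - current:   # keys present last time but missing now
        row = table.get(key)
        if row is not None:
            row["_fivetran_deleted"] = True
    return current
```

Only keys are kept in memory, never full documents, which is what makes a full scan for every incremental sync affordable.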