Databases
Updated November 17, 2023
Your production database contains your most important data, and Fivetran's database connectors are our most important and battle-tested connectors. Our database connectors are designed for correctness and safety. Correctness means that your destination should always be an accurate replica of your database source. Safety means that no matter what, Fivetran should never cause problems in your source database.
Supported databases
Fivetran supports replicating data from the following databases:
- BigQuery Beta
- Convex Partner-Built Private Preview
- Cosmos DB
- Db2 for i Beta
- DocumentDB
- DynamoDB
- Elasticsearch Beta
- Firebase Beta
- MariaDB
- MongoDB
- MySQL
- OpenSearch Beta
- Oracle
- PlanetScale Partner-Built Private Preview
- PostgreSQL
- Redshift Private Preview
- SAP ERP
- Snowflake Beta
- SQL Server
If you'd like to integrate with a database that's not listed here, please reach out to Fivetran support.
Database Integration Overview
- Connect: Connect us to your production database or read-replica via a specific Fivetran user that you create in your database. For instructions on creating a Fivetran user, see the setup guide for your database type.
- Initial sync: Fivetran discovers all selected tables and automatically pulls all historical data for all objects to which it has access. The duration of initial syncs varies by customer, but most syncs finish within a day.
- Transform and map schema: Fivetran parses through all your data, typecasting and mapping every column in the source object to a column in the corresponding table in the destination. We transform any data types that are not natively supported in the destination into data types that are accepted. Fivetran does not do any aggregations at this stage.
- Load: Fivetran automatically creates schemas and tables for each mapped source object in the destination. We populate these tables with the data from the initial sync.
- Incremental updates: Fivetran incrementally updates the data in your destination in batches, using a merge operation (upsert & insert) to only update new or changed data. You can choose how often these batches run, from every 5 minutes to every 24 hours. Fivetran's unique system automatically recognizes schema changes in your data source and persists these changes to the destination.
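The merge (upsert) step can be sketched as a primary-key upsert — a minimal illustration in which the dict-based destination and field names are hypothetical, not Fivetran's actual implementation:

```python
def merge_batch(destination, batch, key="id"):
    """Upsert a batch of changed rows into a destination table,
    modeled here as a dict keyed on the primary key."""
    for row in batch:
        destination[row[key]] = row  # insert new rows, overwrite changed ones
    return destination

# Example: one changed row and one new row arrive in a batch
dest = {1: {"id": 1, "status": "old"}}
merge_batch(dest, [{"id": 1, "status": "updated"}, {"id": 2, "status": "new"}])
```

Because the merge is keyed on the primary key, rows that already exist are updated in place and unseen rows are appended, so only new or changed data is touched.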
Connecting your database
Connection Options
Because our database connectors are pull connectors, Fivetran must have a way to connect to your source database. The three connection strategies that we support are:
- Connect directly to the database port on your database host
- Connect via an SSH tunnel
- Connect via a reverse SSH tunnel
See the Connection Options guide for more details.
Safety
Fivetran is a good citizen of your database. You can connect Fivetran to a read-replica of your database if you want to isolate us from your production workloads, but you can also connect us directly to your production instance. Our queries are designed to minimize contention with your production queries in two major ways:
- Our queries generate simple execution plans that can run concurrently with many other queries. They don't take higher priority over your other workloads.
- We use parallel processing to sync data from your database to your destination. Using multiple threads makes your syncs faster than using a single thread, since we can run multiple queries simultaneously. However, multiple threads also compete for resources slightly more than a single thread does. We limit the number of threads that we use to ensure that we don't compete too much with your concurrent queries.
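The bounded-parallelism idea can be sketched with a capped worker pool — an illustration in which the table names, worker cap, and sync_table stub are hypothetical, not Fivetran's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # illustrative cap: enough parallelism without starving production queries

def sync_table(table):
    # stand-in for running extraction queries against one table
    return f"{table}: synced"

tables = ["orders", "customers", "invoices", "payments", "events"]
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # at most MAX_WORKERS tables sync concurrently; the rest queue up
    results = list(pool.map(sync_table, tables))
```

Capping the pool size is the design trade-off described above: more workers speed up the sync, but each worker competes with your production queries for database resources.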
High-Volume Agent connectors
The Fivetran High-Volume Agent (HVA) connectors facilitate high-volume data replication across multiple database systems.
Supported databases and services
We have HVA connectors for the following databases and services:
- Db2 for i Beta
- Oracle
- SAP ERP
- SQL Server Beta
High-Volume Agent overview
High-Volume Agent (HVA) connectors are designed to support high-volume data replication and use an on-premises software (agent) that typically resides on the same machine as a source database or file system to perform the task of capturing data. The on-premises agent sends the captured data back to Fivetran, so it can be processed and loaded to the destination.
We offer multiple network configuration options to connect your databases and services using HVA. For more information, see High-Volume Agent network configuration options.
On Unix and Linux, HVA runs as a system process (daemon). On Windows, HVA runs as a Windows service.
Fivetran connects to HVA using a specified TCP/IP port number (usually 4343).
Benefits
HVA connectors are most beneficial for databases that generate large amounts of data or have an extensive historical data repository. Common use cases include software product databases that track every event, or databases used as ERP backends where there may be thousands of data units to track and monitor.
The distributed architecture with agents provides security benefits and multiple performance and scalability advantages.
- Speed: The agent on a source machine filters data before sending it down the data pipeline. Databases always write more information to the transaction log than should be replicated, and in many cases, only a subset of the application’s tables is replicated to another system. Filtering data close to the source improves efficiency.
- Bandwidth: The agent compresses data before sending it. Sending compressed data across the wire requires less bandwidth and/or fewer data packets. The agent reduces network cost, and allows capture of changes directly from a DBMS logging system.
- Security: Communication between Fivetran and High-Volume Agent SQL Server connector is encrypted during transit. Authentication is locked down with an SSL certificate.
Resource consumption
CPU: HVA uses up to 1 CPU core in the system for batch runs. During the initial sync, up to four tables at a time are loading, each running its own process. Typically, the HVA process uses relatively few resources, but the associated database job uses a lot of resources to retrieve data (if parallel select operations run against a database, then the initial sync/load or full historical refresh may consume more CPU on the source database). Log parsing is generally the most resource intensive operation that can use up to 100% of a single CPU core when capture is running behind. HVA uses one log parser per database thread, and each database node in an SQL Server cluster constitutes one thread.
Memory: Memory consumption is up to 64 MB per transaction. Typically, memory usage does not reach 64 MB per transaction. However, this depends on the size of transactions and what portion of them are against tables that are part of the configuration. Note that you can adjust (upwards and downwards) the 64 MB threshold.
Storage space: An HVA installation is about 135 MB in size. While running, it does not use any additional disk space until it exceeds the 64 MB memory threshold and starts accumulating transaction files on disk. These files are compressed. If a database runs large batch jobs modifying multiple tables and only committing at the end, HVA may write more data to disk. We recommend starting with at least 5 GB for the configuration folder ($HVR_CONFIG).
Initial sync
During the initial data sync, Fivetran connects to your source database and copies the entire contents of every table that you’ve selected to sync. The way we perform that initial sync depends on whether your source database has a B-tree (for example, MySQL or SQL Server) or a heap (for example, PostgreSQL or Oracle) data structure.
NOTE: The SAP HANA database connector has a different initial sync strategy from our other database connectors. For more information, see our SAP ERP on HANA initial sync documentation.
B-tree databases
For B-tree tables, we perform the initial sync using a series of queries that look like this:
select *
from {table}
where {primary key} > ?
order by {primary key}
limit {page size}
This is an efficient strategy because each table in a B-tree database is intrinsically ordered on the {primary key}. The database does not need to perform a sort for this query, because each table is already stored in {primary key} order.
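The paging loop implied by those queries can be sketched as keyset pagination — an illustration in which fetch_page stands in for running the parameterized query above:

```python
def sync_btree_table(fetch_page, page_size=2):
    """Pull an entire table in primary-key order using keyset pagination.
    fetch_page(last_key, limit) stands in for the parameterized query:
    select * from {table} where {pk} > ? order by {pk} limit {limit}."""
    rows, last_key = [], float("-inf")
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            break  # past the last primary key: the table is fully copied
        rows.extend(page)
        last_key = page[-1]["id"]  # resume after the last key seen
    return rows

# Hypothetical in-memory "table" and query function for illustration
table = [{"id": i} for i in range(1, 6)]
fetch = lambda k, n: [r for r in table if r["id"] > k][:n]
```

Each iteration restarts the scan just past the last key it saw, so no page is read twice and no offset scan is needed.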
Heap databases
For heap databases, the initial sync is more complicated. For small-to-medium tables, we sync the entire table with a single select * query. For large tables, we break the contents into more manageable blocks based on the order in which the data was added to the database. We then sync the blocks in chronological order.
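The block strategy can be sketched as simple chunking in insertion order (illustrative only; Fivetran's actual block boundaries depend on the database's internal storage):

```python
def heap_sync_blocks(rows, block_size):
    """Split a large heap table into blocks in the order the rows were
    added, so the blocks can be synced chronologically."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
```

Syncing blocks in chronological order keeps each query small and lets an interrupted sync resume at the next unfinished block.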
Excluding source data
If you don’t want to sync all the data from your source database into your destination, you can easily exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. Alternatively, you can change the permissions of the Fivetran user in your source database and restrict its access to certain tables or schemas.
Check initial sync progress
You can monitor the progress of your initial sync using the Logs tab on your Fivetran dashboard. In the Logs tab, you can:
- Check the most recent log record to track the progress of your table imports. Tables marked NOT STARTED have not been imported yet; tables marked COMPLETED have been imported.
- Check older log records to track your sync progress.
To monitor your initial sync in the Logs tab, follow these steps:
1. Log in to your Fivetran dashboard.
2. In the top menu, click Logs.
3. Type in "import_progress" and click Apply.
4. Click the log record you'd like to view.
Transformation and mapping overview
Similar to Cloud API connections, Fivetran's database replication product pursues a "sync-all" strategy. This means we sync as many source schemas and tables as possible with as little setup and configuration as possible.
However, unlike Cloud API connections, our database connectors do one-to-one mapping of source schemas and tables to destination schemas and tables. Columns are also mapped one-for-one wherever possible, except when no compatible data type exists in the destination. In that case, we will transform data types that are not supported in the destination into data types that are supported.
Loading schemas and tables
Schemas and tables
A single database connector can replicate multiple schemas, as well as all of their underlying tables. These schemas are stored in the destination in the same logical schema/table hierarchy as exists in your source database. In the destination, each schema name is prepended with the prefix of your choice.
Mapping names from source to destination
For database connectors, we do minimal name-changing between the source and destination. This allows SQL queries written for the source database to be easily ported to the destination, without having to change the field names in the queries. There are only a few exceptions to this rule:
- Schema names, which have the connector name prepended
- Schema, table, or column names that violate Fivetran's naming conventions. In these cases, we apply the following rules in the order in which they are listed:
- Non-ASCII characters (such as Chinese characters) are replaced with a transliteration.
- We only allow letters, numbers, and underscores. Any other character is replaced with an underscore.
- Upper-case letters are converted to lower-case.
- A name that starts with a number gets an underscore prepended to it.
- The character limit for names is 127 characters. Any name with more than 127 characters is truncated to the limit.
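These rules can be approximated in a short normalization function. This is a sketch: real transliteration of non-ASCII characters requires a transliteration table, which is simplified here to the underscore-replacement rule.

```python
import re

MAX_NAME_LENGTH = 127

def normalize_name(name):
    """Approximate the naming rules above: allowed characters,
    case, leading digits, and the length limit."""
    # Replace anything that is not a letter, number, or underscore.
    # (A faithful implementation would transliterate non-ASCII first.)
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    name = name.lower()                 # upper-case letters become lower-case
    if name and name[0].isdigit():      # names may not start with a number
        name = "_" + name
    return name[:MAX_NAME_LENGTH]       # truncate to the 127-character limit
```

For example, a source column named "2023 Sales-Data" would land in the destination as "_2023_sales_data".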
Changing the destination schema name
We do not support changing the connector name. It is therefore not possible to change the name of the destination schemas.
If you want to change the name of your destination schema, you can set up a new connector and remove your old connector once the new one has finished syncing. You can also create a SQL view on top of the existing schema with a new name that works better for your BI or SQL layer.
Incremental updates
After the initial sync completes, Fivetran performs incremental updates of any new or modified data from your source database. During incremental syncs, we use your database's native change capture mechanism to request only the data that has changed since our last sync, including deletes. Each database uses a different change capture mechanism:
- Change Streams (DocumentDB)
- DynamoDB Streams (DynamoDB)
- Binary logs (MariaDB)
- Oplogs (MongoDB)
- Binary logs (MySQL)
- LogMiner (Oracle)
- Logical replication or XMIN system column (PostgreSQL)
- Change tracking or change data capture (SQL Server)
- Shadow tables and triggers (SAP ERP on HANA)
- Fivetran Teleport Sync, available for the following databases:
  - Firebase Beta
  - MySQL
  - Oracle Private Preview
  - PostgreSQL
  - SQL Server
During incremental syncs, Fivetran maintains an internal set of progress cursors which allow us to track the exact point where our last successful sync left off. We record the last sync for each row in a column called _fivetran_synced (UTC TIMESTAMP). This provides a seamless hand-off between syncs, ensuring that we do not miss any data.
Because of our progress cursors, Fivetran's system is extremely tolerant to service interruptions. If there is an interruption in your service (such as your destination going down), Fivetran will automatically resume syncing exactly where it left off once the issue is resolved, even days or weeks after, as long as log data is still present.
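The progress-cursor hand-off can be sketched like this (illustrative; the change-log and checkpoint structures are hypothetical stand-ins for the database's change capture mechanism):

```python
def incremental_sync(change_log, checkpoint):
    """Apply only changes past the saved cursor, then advance the cursor.
    change_log is a list of (position, change) pairs; checkpoint holds
    the last successfully synced position."""
    applied = []
    for position, change in change_log:
        if position <= checkpoint["cursor"]:
            continue  # already synced in an earlier run
        applied.append(change)
        checkpoint["cursor"] = position  # advance only after the change is applied
    return applied

# Hypothetical change log; the cursor says position 1 is already synced
log = [(1, "insert a"), (2, "update a"), (3, "delete b")]
state = {"cursor": 1}
```

Because the cursor only advances after a change is applied, an interrupted sync resumes exactly where it left off, as long as the log entries past the cursor are still available.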
Deleted data
All database connectors support syncing deleted data. For tables with a primary key, we mark deleted rows as deleted in your destination, but do not actually remove them. We create an extra column in the destination, _fivetran_deleted (boolean), to mark deleted rows. Deleted rows have a true value in this column.
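Soft-delete marking can be sketched like this (illustrative; the dict-based destination is a hypothetical stand-in):

```python
def mark_deleted(destination_rows, source_keys):
    """Soft-delete: flag destination rows whose primary key no longer
    exists in the source instead of removing them."""
    for key, row in destination_rows.items():
        row["_fivetran_deleted"] = key not in source_keys
    return destination_rows

# Row 2 was deleted in the source, so it gets flagged, not removed
dest = {1: {"name": "a"}, 2: {"name": "b"}}
mark_deleted(dest, source_keys={1})
```

Keeping flagged rows means historical queries against the destination still see data that has since been deleted at the source.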
The following databases support tables without a primary key:
The way we treat deleted rows in these tables varies by connector type. See the relevant database connector's documentation page for more information.
Frequency of updates
By default, all database connectors sync new and modified data every 15 minutes. Depending on the size of each update or the number of tables, syncs may take longer than that. In that case, the next data sync will start at the next 15-minute interval. For example:
- Update X start: 9:00am
- Update X finish: 9:18am
- Update Y start: 9:30am
You can change the update interval in your Fivetran dashboard. When database syncs encounter repeated errors, our system automatically retries the sync after the update interval or one hour, whichever is shorter.
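The retry rule can be expressed directly (an illustrative helper, not part of Fivetran's API):

```python
def retry_delay_minutes(update_interval_minutes):
    """On repeated sync errors, retry after the update interval or
    one hour, whichever is shorter."""
    return min(update_interval_minutes, 60)
```

So a connector on a 15-minute schedule retries after 15 minutes, while one on a 24-hour schedule retries after an hour rather than waiting a full day.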
Migrating service providers
If you migrate from one database service provider to another, we will need to do a full re-sync. That's because when you switch to another provider, Fivetran can't keep track of where our last sync left off, as the new service provider won't retain the same progress cursor and changelog information as the previous provider. Examples of switching service providers include:
- MySQL (Self-Hosted) <--> MySQL RDS
- PostgreSQL (Self-Hosted) <--> PostgreSQL RDS
- SQL Server (Self-Hosted) <--> SQL Server RDS
See the individual database guides for specific instructions on how to migrate.
Fivetran Teleport Sync
Fivetran Teleport Sync is a proprietary database replication method that offers the completeness of snapshots while approaching the speed of log-based systems. With this sync mechanism, Fivetran can incrementally replicate your database with no additional setup other than a read-only SQL connection.
Fivetran Teleport Sync is an alternative to change log replication for incremental syncs. We recommend using it in the following cases:
- You cannot or do not want to provide Fivetran access to your database's change logs
- You need to limit the tables or columns that Fivetran can read from
- Change log syncing is slow or resource intensive due to the number of updates
- You load data into the source database using a TRUNCATE/LOAD or BACKUP/RESTORE process that invalidates change logs
- You currently use snapshot syncing and want a faster method
Fivetran Teleport Sync works by running SQL queries in your database that return a compressed version of your data in a form that can be used to find the changed data.
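The snapshot-comparison idea can be sketched with per-row hashes. This is illustrative only; Fivetran's actual compressed snapshot format is proprietary.

```python
import hashlib
import json

def snapshot(rows, key="id"):
    """Compact per-row fingerprints: primary key -> hash of the row."""
    return {r[key]: hashlib.sha256(
                json.dumps(r, sort_keys=True).encode()).hexdigest()
            for r in rows}

def diff(old, new):
    """Compare two snapshots to find inserted/updated and deleted keys."""
    changed = [k for k, h in new.items() if old.get(k) != h]
    deleted = [k for k in old if k not in new]
    return changed, deleted

# Row 1 changed, row 2 was deleted, row 3 is new
before = snapshot([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
after = snapshot([{"id": 1, "v": "a2"}, {"id": 3, "v": "c"}])
```

Because only fingerprints need to be compared, the changed-row set can be found without shipping the full table on every sync.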
Teleport sync general limitations
- Review the limitations specific to each connector to verify whether tables without a primary key are supported.
- Tables without a primary key (if the database supports them) may experience slower performance than tables with primary keys.
- The speed of initial imports and incremental syncs can vary considerably across connectors. See the Teleport section of each database's documentation for further suggestions.
- Performance is also influenced by the type of hardware used and where the database is deployed. See each database's Teleport limitations documentation for further guidance.
- Certain Data Definition Language (DDL) changes require a table re-sync. These are listed in Automatic table re-sync for Teleport sync method.
Automatic re-sync
Under certain circumstances, Fivetran automatically performs a full table re-sync or a full source re-sync of your source database. The specific circumstances vary by database type.
IMPORTANT: The full table or full source re-sync automatically occurs at your next scheduled sync time. For example, let's say that your connector syncs every day at 2pm and you make database updates that necessitate a re-sync at 4pm. We schedule your automatic re-sync for 2pm the next day.
Automatic full table re-sync
Schema changes in the source database are the most common trigger for a full table re-sync. When the schema changes in the source, a full table re-sync lets us propagate these changes to your destination. You don’t have to take any action to initiate the re-sync, and the only difference you might notice is a longer than usual sync. See the chart below for specific causes for each database.
Note that we only perform automatic table re-syncs for SQL databases. Schema changes to NoSQL databases don’t require re-syncs, because non-relational databases do not have fixed schemas.
Automatic table re-sync for SQL databases
We schedule automatic full table re-syncs for SQL databases in the following circumstances:
Database | What schedules a full table re-sync
---|---
BigQuery | Enabling syncs for a table that you had previously excluded from syncs
MariaDB | A DDL (Data Definition Language) statement on a table without a primary key; a DDL statement that re-orders columns; a DDL statement that changes the column definition for SET or ENUM type columns; a primary key change; enabling syncs for a previously excluded table; for Teleport syncs, an error retrieving or processing a snapshot; for Teleport syncs, a DDL change containing a default value with a type other than Instant, LocalDate, certain LocalDateTime types, Boolean, String, Short, Int, Long, Float, Double, BigDecimal, or Json; any TRUNCATE DDL statement on a table included in syncs
MySQL | A DDL (Data Definition Language) statement on a table without a primary key; a DDL statement that re-orders columns; a DDL statement that changes the column definition for SET or ENUM type columns; a primary key change; enabling syncs for a previously excluded table; for Teleport syncs, an error retrieving or processing a snapshot; for Teleport syncs, a DDL change containing a default value with a type other than Instant, LocalDate, certain LocalDateTime types, Boolean, String, Short, Int, Long, Float, Double, BigDecimal, or Json; any TRUNCATE DDL statement on a table included in syncs
Oracle | A statement that changes a table's schema; enabling syncs for a previously excluded table; a DDL statement that creates, deletes, or changes the position of a column; the table's SCN no longer being available to us; a non-zero status for a LogMiner event. If we've rescheduled a re-sync more than a few times, we stop the sync and generate a Task in your dashboard
PostgreSQL | Enabling syncs for a previously excluded table; XMIN values getting frozen for a table. Schema changes in PostgreSQL don't trigger a re-sync
SQL Server | Enabling syncs for a previously excluded table; changing incremental update mechanisms; the table's LSN no longer being available to us; any TRUNCATE DDL statement on a table included in syncs
SAP HANA | Enabling syncs for a previously excluded table
Snowflake | Enabling syncs for a previously excluded table; any DDL change to a table included in syncs, such as removing columns or adding columns with non-static default values
Automatic table re-sync for NoSQL databases
We schedule automatic full table re-syncs for NoSQL databases in the following circumstances:
Database | What schedules a full table re-sync
---|---
DynamoDB | A change in pack mode; an expired DynamoDB streams cursor
MongoDB | A change in pack mode; an expired change streams cursor
DocumentDB | An expired change streams cursor
Automatic full source re-sync
Expired change logs require a full source re-sync, because we cannot perform our usual incremental syncs without them. Change logs can expire either in the sense of time or storage space. For example, logs might expire if you set your log retention period to 48 hours, and Fivetran isn’t able to sync for more than 48 hours. Logs can also expire if the storage space you allocated to them fills up. A full source re-sync ensures that the data in your destination matches your source data, and once it is complete, we resume our usual incremental sync schedule.
Automatic full source re-sync for SQL databases
We schedule automatic full source re-syncs for SQL databases in the following circumstances:
Database | What schedules a full source re-sync |
---|---|
MariaDB | Expired logs (Fivetran generates a task in your dashboard when we detect expired logs and need to perform a full source re-sync) |
MySQL | Expired logs |
Oracle | Expired logs |
PostgreSQL | Expired logs |
SQL Server | Expired logs |
Automatic full source re-sync for NoSQL databases
We schedule automatic full source re-syncs for NoSQL databases in the following circumstances:
Database | What schedules a full source re-sync |
---|---|
DynamoDB | Expired DynamoDB streams cursor |
MongoDB | Expired oplogs |
Automatic table re-sync for Teleport sync method
We schedule automatic full table re-syncs for Teleport sync method in the following circumstances:
- Renaming a table/schema
- Dropping columns
- Unsupported column type changes
- Adding a column with a non-static default value
- Any DDL change to a table without a primary key
Unsupported operations
Fivetran’s database connectors deliberately do not support some operations.
Non-primary replicas in SQL Server
In SQL Server, you cannot connect Fivetran to a non-primary replica if you use Change Tracking (CT) as your incremental update mechanism. We can only sync with the primary replica in an availability group, because Microsoft does not allow CT to be enabled on any replica that is not designated as primary.
Repeated table truncations in SQL Server
If you use CT as your incremental update mechanism, repeated table truncations prompt table re-syncs because each truncation invalidates the table's CT system version. If truncation is part of your source workflow, we recommend avoiding it or using a different sync mechanism, such as Fivetran Teleport Sync.
Retention periods under 24 hours
A log retention period must be at least as large as your sync period, or else the logs will expire in between Fivetran connector syncs. We recommend setting your retention period to at least 48 hours to ensure there are sufficient logs for proper replication.
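The constraint can be expressed as a quick check (an illustrative helper, not part of Fivetran's tooling):

```python
def check_log_retention(retention_hours, sync_interval_hours):
    """Logs must outlive the gap between syncs, and 48 hours is the
    recommended floor so a stalled connector can still catch up."""
    meets_minimum = retention_hours >= sync_interval_hours
    meets_recommendation = retention_hours >= 48
    return meets_minimum, meets_recommendation
```

A 24-hour retention period with a 12-hour sync interval technically works, but falls short of the 48-hour recommendation and leaves little headroom for outages.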
Data warehouses as sources
Fivetran supports the following data warehouses as sources:
You cannot connect Fivetran to other data warehouses like Teradata, Netezza, or Vertica as sources, because they have no change capture mechanism. Instead, connect us to whatever source databases are feeding these warehouses.
If you use a regular OLTP database like PostgreSQL or Oracle as a data warehouse, you can connect it to Fivetran as a source. However, your Fivetran syncs may be unacceptably slow if you are populating this data warehouse by periodically truncating and re-creating tables. Be sure to test this issue during your evaluation of Fivetran to make sure you get acceptable performance.
Must allow initial syncs to run without interruption
You cannot run initial syncs within a limited time window (for example, only allow Fivetran to sync from midnight to 8 a.m.). Time limits increase the likelihood that our connection will be interrupted and the sync will need to restart. There is no need to restrict our syncs to off-times; our initial sync queries are designed to be easy for your query planner to schedule around, so the impact on your production workload should be minimal.
Guaranteed initial sync completion dates
Fivetran cannot guarantee a completion date for your initial sync. While Fivetran works hard to make your initial sync as fast as possible, the duration of this process is different for every customer.
Skipping historical syncs
You cannot skip your historical sync.
Starting historical syncs from a specific date
You cannot start your historical sync from a specific date (for example, only sync data from August 2014 forward).
One-time production data migrations
Fivetran does not support one-time migrations of production data.