SQL Server
SQL Server is Microsoft's SQL database. Fivetran's integration service replicates data from your SQL Server source database and loads it into your destination at regular intervals.
NOTE: Fivetran supports SQL Server as both a database connector and a destination.
Row-based relational databases, like SQL Server, are optimized for high-volume, high-frequency transactional applications. While very performant as production databases, they are not optimized for analytical querying. Your analytical queries will be very slow if you build your BI stack directly on top of your transactional SQL Server database, and you run the risk of slowing down your application layer.
Column-based databases are optimized for performing analytical queries on large volumes of data at speeds far exceeding those of SQL Server. While these databases are not good for high-frequency transactional applications, they are highly efficient in data storage. They permit more data compression (10x-100x) than row-based databases, which makes them a cost-effective way to store and access data for analytical purposes.
Fivetran offers the following solutions for replicating data from your SQL Server source database:
- Fivetran native connector
- High-Volume Agent (HVA) connector
- HVR - SQL Server
Supported services
Fivetran supports the following SQL Server database services:
- Generic SQL Server
- Amazon SQL Server RDS
- Azure SQL Database
- Azure SQL Managed Instance
- Google Cloud SQL for SQL Server
Supported configurations
Fivetran supports the following SQL Server configurations:
Supportability Category | Supported Values |
---|---|
Database versions | SQL Server 2012 - 2022 |
Maximum table count per connector | 15,000 |
Maximum column count per table | 1,019 |
Connector limit per database | No limit |
Transport Layer Security (TLS) | TLS 1.1 - 1.3 |
IMPORTANT: We do not support single-user mode.
Which SQL Server instance types we can connect to depend on whether you use change tracking or change data capture as your incremental sync method. Read our Updating data documentation for more information.
Instance Types | Change Tracking | Change Data Capture |
---|---|---|
Generic SQL Server | ||
Primary instance | check | check |
Availability group replica | check | |
Amazon SQL Server RDS | ||
Primary instance | check | check |
Availability group replica | check | |
Azure SQL Database | ||
Primary instance | check | check |
Availability group replica | check | |
Azure SQL Managed Instance | ||
Primary instance | check | check |
Availability group replica | check | |
Google Cloud SQL for SQL Server | ||
Primary instance | check | check |
Availability group replica | check |
Known limitations
For connectors using change data capture (CDC) as the incremental sync method, we don't sync computed columns because CDC doesn't support these columns. Learn more in Microsoft's Change data capture - Known limitations and issues documentation.
SQL Server's default collation is case insensitive, so values like 'ABC' and 'abc' are treated identically by the server. If you want to differentiate between cases in your data, we recommend that you change the collation to a case-sensitive collation.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | All tables and fields |
History mode | check | Selectable for all tables |
Custom data | check | All tables and fields |
Data blocking | check | Column level, table level, and schema level |
Column hashing | check | |
Re-sync | check | Table level |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | AWS PrivateLink: Generic SQL Server on EC2, SQL Server RDS Azure Private Link: Azure SQL Database, Azure SQL Managed Instance |
Setup guide
Follow our step-by-step setup guides for specific instructions on how to set up your SQL Server database type:
- Generic SQL Server
- Amazon SQL Server RDS
- Azure SQL Database
- Azure SQL Managed Instance
- Google Cloud SQL for SQL Server Private Preview
Sync overview
Fivetran connects to your database using SQL Server's default transaction isolation level, Read Committed. Once Fivetran is connected to your database, we pull a full dump of all selected data from your database. We then use one of SQL Server's two built-in tracking mechanisms, change tracking and change data capture, to pull all your new and changed data at regular intervals. If data in your database changes (for example, you add new tables or change a data type), Fivetran automatically detects and persists these changes into your destination.
Syncing empty tables and columns
Fivetran can sync empty tables for your SQL Server connector.
We can also sync empty columns in most cases. However, if you don't add rows after you create a new column, we cannot sync that new column. We need at least one row to see a new column because we learn of changes to a table's column cardinality when we see a row with a new or removed column during an update.
For more information, see our Features documentation.
Read Committed transaction isolation level data safety
The Read Committed transaction isolation level permits both non-repeatable reads and phantom reads. Both problems occur when a transaction issues two selects while a concurrent transaction modifies data. Fivetran never issues two selects in a single transaction during its syncs, so these read phenomena cannot occur in Fivetran's SQL Server connector and cause data integrity problems.
Connection modes
Fivetran first attempts to connect to your SQL Server database first using ApplicationIntent=ReadWrite
. If that fails, we try to connect once more using ApplicationIntent=ReadOnly
. We try to connect to your database with both modes because some database configurations are incompatible with one mode:
ApplicationIntent=ReadWrite
: Always On Availability Groups (AOAGs) using change data capture are incompatible because direct access to the primary database is forbidden and secondary databases only allow connections inReadOnly
mode.ApplicationIntent=ReadOnly
. Databases using change tracking (CT) in an AOAG are incompatible because the CT change tables can only be accessed on the primary node of the AOAG, which is only available inReadWrite
mode.
Schema information
Fivetran tries to replicate the exact schema and tables from your database to your destination.
We name the source schema with the database schema name. We replicate the source's tables and views to the destination schema. We name the destination schema with the connector name appended by the database schema name converted as per our schema naming rules. For example, if the connector name is sql_server_db
and the database schema name is fivetran_schema_#1
, the destination schema name is sql_server_db_fivetran_schema__1
. For more information, see our naming conventions documentation.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
_fivetran_deleted
(BOOLEAN) marks rows that were deleted in the source table_fivetran_synced
(UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row_fivetran_id
(STRING) is the hash of the non-Fivetran values of each row. It's a unique ID that Fivetran uses to identify rows in tables that do not have a primary key.
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
Type transformation and mapping
As we extract your data, we match SQL Server data types to data types that Fivetran supports. If we don't support a certain data type, we automatically change that type to the closest supported type or, for some types, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.
The following table illustrates how we transform your SQL Server data types into Fivetran supported types:
SQL Server Type | Fivetran Type | Fivetran Supported | Notes |
---|---|---|---|
BIGINT | LONG | True | |
BINARY | BINARY | True | |
BIT | BOOLEAN | True | Not supported as primary keys for Teleport |
CHAR | STRING | True | |
DATE | LOCALDATE | True | |
DATETIME | LOCALDATETIME | True | |
DATETIME2 | LOCALDATETIME | True | |
DATETIMEOFFSET | TIMESTAMP | True | Not supported as primary keys for Teleport |
DECIMAL | BIGDECIMAL | True | |
FLOAT | DOUBLE | True | |
GEOMETRY | JSON | True | Not supported as primary keys for Teleport |
GEOGRAPHY | JSON | True | Not supported as primary keys for Teleport |
HIERARCHYID | STRING | True | |
IMAGE | BINARY | True | |
INTEGER | INTEGER | True | |
JSON | JSON | True | Not supported for CDC |
MONEY | BIGDECIMAL | True | |
NCHAR | STRING | True | |
NTEXT | STRING | True | |
NUMERIC | BIGDECIMAL | True | |
NVARCHAR | STRING | True | |
REAL | FLOAT | True | |
ROWVERSION | BINARY | True | |
SMALLDATETIME | LOCALDATETIME | True | |
SMALLMONEY | BIGDECIMAL | True | |
SMALLINT | SHORT | True | |
SQL_VARIANT | STRING | True | Not supported as primary keys for Teleport |
TEXT | STRING | True | |
TIME | STRING | True | |
TIMESTAMP | BINARY | True | |
TINYINT | SHORT | True | |
UNIQUEIDENTIFIER | STRING | True | |
VARCHAR | STRING | True | |
VARBINARY | BINARY | True | |
XML | STRING | True |
We also support syncing user-defined data types. When you create a user-defined type in SQL Server, you are required to choose a base type. If Fivetran supports that base type, we automatically transform your user-defined type to its corresponding Fivetran type. For more information, see Microsoft's user-defined types documentation.
If we are missing an important type that you need, reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. This can change either the data type itself or the parameterization on a data type. For more information, see the individual data destination pages under the Destinations documentation section.
Excluding source data
If you don’t want to sync all the data from your database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can change the permissions of the Fivetran user you created and restrict its access to certain tables or columns.
How to allow only a subset of tables
In your primary database, you can grant SELECT permissions to the Fivetran user on all tables in a given schema:
GRANT SELECT on SCHEMA::<schema> to fivetran;
or only grant SELECT permissions for a specific table:
GRANT SELECT ON [<schema>].[<table>] TO fivetran;
How to allow only a subset of columns
You can restrict the column access of your database's Fivetran user in two ways:
Grant SELECT permissions only on certain columns:
GRANT SELECT ON [<schema>].[<table>] ([<column 1>], [<column 2>], ...) TO fivetran;
Deny SELECT permissions only on certain columns:
GRANT SELECT ON [<schema>].[<table>] TO fivetran; DENY SELECT ON [<schema>].[<table>] ([<column X>], [<column Y>], ...) TO fivetran;
Initial sync
Once Fivetran is connected to your database or read replica, we first copy all rows from every table in every schema for which we have SELECT permission (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We copy rows by performing a SELECT statement on each table. For large tables, we copy a limited number of rows at a time so that we don't have to start the sync over from the beginning if our connection is lost midway. How many rows we copy at a time depends on whether your tables have clustered indices or not. For tables with clustered indices, we copy 500,000 rows at a time. For tables without clustered indices, we copy 5,000,000 rows at a time.
Connecting to a read replica
You can only connect Fivetran to a read replica if Change-Data Capture is enabled on the primary database. We need this in order to perform our incremental updates.
Updating data
Fivetran performs incremental updates of any new or modified data from your source database. During incremental updates, we request only the data that has changed since our last sync. We use one of the following incremental sync methods for incremental updates: change tracking (CT), change data capture (CDC), and Fivetran Teleport Sync.
NOTE: Change tracking and change data capture are built-in tracking mechanisms of SQL Server.
CT and CDC create change records that Fivetran accesses on a per-table basis during incremental updates. Tables that do not have CT or CDC enabled still appear on your Fivetran dashboard, but they are disabled. They also have a message indicating that you need to enable either CT or CDC.
Fivetran Teleport Sync works by directly querying the tables and does not require any specific settings enabled on the database.
To guarantee data integrity, we check for changes on every table with CT or CDC enabled during each update, which can add to the sync time. If you sync data using CT, CDC, or Fivetran Teleport Sync across many thousands of tables, you can expect longer sync times.
If both CT and CDC are enabled on a table, we use CDC as the incremental sync method. If enabling the tracking mechanisms (CT or CDC) is not feasible, then consider using Fivetran Teleport Sync as an alternative solution for capturing changes from SQL Server.
CT vs CDC vs Fivetran Teleport Sync
There are several key differences between change tracking (CT), change data capture (CDC), and Fivetran Teleport Sync:
Feature | Change tracking | Change data capture | Fivetran Teleport Sync |
---|---|---|---|
Syncs tables with primary keys | Yes | Yes | Yes* |
Syncs tables without primary keys | No | Yes | Yes** |
Can read from a replica | No | Yes | Yes |
Tracks DML changes | Yes | Yes | Yes |
Provides full record of changed data | No | Yes | Yes |
Tracks previous changes | No | Yes | No |
* IMPORTANT: Fivetran Teleport Sync supports a limited number of primary keys. See which primary keys are supported in our Teleport Sync limitations documentation.
** NOTE: Fivetran Teleport Sync for tables without primary keys is in Beta.
NOTE: CDC has heavier processing and storage overhead than CT.
To learn more about CDC and CT, read on below or see Microsoft's Track Data Changes documentation.
Change tracking
Change tracking (CT) records when a row in a table has changed, but does not capture the data that was changed. CT also does not capture how many times the row changed or record any previous changes. SQL Server records changes from all tables that have CT enabled in a single internal change table.
You cannot sync tables without a primary key if you choose CT as your incremental sync method. CT needs primary keys to identify rows that have changed.
You must enable CT on the primary database, as well as on each individual table that you want to sync. When you enable CT on your primary database, you can select a window size. Window size determines how long your change records are kept in the change table before they are deleted. If not explicitly specified, the default value is 3 days. We recommend changing the window size to 7 days. A longer window gives us more time to resolve any potential sync issues before change records are deleted. Our system detects when we were unable to process changes to a table before they were deleted from the change table. When we detect this situation, we trigger a re-sync for that table.
Change tracking is a lightweight background process that should not impact your production workload. CT takes up minimal storage space on your hard drive because its change table only records the primary keys of changed rows.
Change data capture
NOTE: Fivetran can only sync tables that have a single change data capture (CDC) instance. Our syncs only include tables and columns that are present in a CDC instance. If you add new tables or columns, you must create a new CDC instance that includes them and delete the old instance.
NOTE: To use a Clustered Columnstore Index (CCI) for CDC, you must enable CDC before adding the CCI.
Change data capture (CDC) tracks every change that is applied to a table and records those changes in a shadow history table. Unlike CT, CDC captures what data was changed and when, so you can see how many times a row has changed and view past changes.
CDC can track changes on any kind of table, with or without a primary key. Due to SQL Server limitations, a SQL Server connector with CDC enabled automatically propagates column type changes to your destination but does not propagate any other schema changes, such as renamed columns, new tables, and new columns.
You must enable CDC on the primary database, as well as on each individual table that you want to sync. When you enable CDC on your primary database, you can select a window size (also known as a retention period). Window size determines how long your change records are kept in the shadow history table before they are deleted. Unless specified, the default window size is 3 days. We recommend increasing the window size to 7 days. A longer window gives us more time to resolve any potential sync issues before change records are deleted.
CDC is a heavier process than CT. CDC takes up more storage space in your database because it captures entire changed records, not just the primary keys of changed rows. CDC also uses more compute resources than CT because it writes each table's changes to its own shadow history table. If you want to reduce some of the load on your production database, you can configure CDC to read from a replica, though you will need to enable it on your primary database first.
Fivetran Teleport Sync
Fivetran Teleport Sync is a proprietary incremental sync method that can incrementally replicate your database with no additional setup other than a read-only SQL connection.
Use Fivetran Teleport Sync if enabling the tracking mechanisms is not an option or if there are other reasons that would prohibit the Fivetran user from using CT or CDC.
NOTE: Fivetran Teleport Sync supports syncing tables with composite primary keys.
Fivetran Teleport Sync's queries perform the following operations on your database:
- Do a full table scan of each synced table
- Perform calculations on all values in each synced table's rows
- Aggregate a compressed table snapshot in the database's memory
For optimum Fivetran Teleport Sync performance, we recommend that you make the following resources available in your database:
- 1 GB Free RAM
- 1 Free CPU Core
- IOPS (Teleport Sync times decrease linearly with an increase of available IOPS).
Beta Fivetran Teleport Sync supports syncing tables without primary keys.
IMPORTANT: Tables without a primary key can have many exact duplicates of a row, but duplicate rows create data discrepancy issues in Fivetran Teleport Sync. Fivetran generates a
_fivetran_id
column for tables without a primary key, which means that many identical source rows could potentially share a_fivetran_id
. If only one of the identical set of rows is deleted, that operation may be applied to all of them, resulting in a data discrepancy between the source and destination.
Limitations
Fivetran Teleport Sync does not support all column types as primary keys. See which columns we support as primary keys in the Type transformation and mapping table.
DROP COLUMN
DDL operations prompt a table re-sync. ADD COLUMN
operations do not prompt a table re-sync because we are able to process the schema change without it.
Teleport supports syncing tables with primary keys up to 100GB in size. Tables exceeding this limit cannot be synced.
We do not support syncing more than 800 tables or tables with more than 75 million rows with Fivetran Teleport Sync for SQL Server.
For SQL Server 2014 (12.x) and earlier, Teleport can capture a maximum row size of 8000 bytes. Rows larger than 8000 bytes cause a connector failure. Later versions do not have this limit.
Tables with a primary key
We merge changes to tables with primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
- A DELETE in the source table updates the corresponding row in the destination with
_fivetran_deleted = TRUE
- An UPDATE in the source table updates the corresponding row in the destination
Tables without a primary key
NOTE: Fivetran cannot sync tables without a primary key using CT. You must have CDC or Teleport (in Beta) enabled to sync tables without a primary key.
We handle changes to tables without a primary key differently:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
. - The
_fivetran_id
column helps us handle DELETE operations:- If there is a row in the destination that has a corresponding
_fivetran_id
value, that row will be updated with_fivetran_deleted = TRUE
. - If there is not a row in the destination that has a corresponding
_fivetran_id
value, a new row will be added with_fivetran_deleted = TRUE
.
- If there is a row in the destination that has a corresponding
- An UPDATE in the source table is treated as a DELETE followed by an INSERT, so it results in two rows in the destination:
- A row containing the old values with
_fivetran_deleted = TRUE
- A row containing the new values with
_fivetran_deleted = FALSE
- A row containing the old values with
As a result, one record in your source database may have several corresponding rows in your destination. For example, suppose you have a products
table in your source database with no primary key:
description | quantity |
---|---|
Shrink-ray gun | 1 |
Boogie robot | 2 |
Cookie robot | 3 |
You load this table into your destination during your initial sync, creating this destination table:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:00' | 1 | FALSE | dfdf |
Boogie robot | 3 | '2000-01-01 00:00:00' | 2 | FALSE | ewra |
You then update a row:
UPDATE products SET quantity = 4 WHERE description = 'Cookie robot';
After your UPDATE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:00' | 3 | TRUE | dfdf |
Boogie robot | 3 | '2000-01-01 00:00:00' | 2 | FALSE | ewra |
Cookie robot | 4 | '2000-01-01 00:00:00' | 4 | FALSE | zxfd |
You then delete a row:
DELETE FROM products WHERE description = 'Boogie robot';
After your DELETE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:02' | 3 | TRUE | dfdf |
Cookie robot | 4 | '2000-01-01 00:00:02' | 4 | FALSE | zxfd |
Boogie robot | 3 | '2000-01-01 00:00:02' | 5 | TRUE | ewra |
So, while there may be just one record in your source database where description = Cookie robot
, there are two in your destination - an old version where _fivetran_deleted = TRUE
, and a new version where _fivetran_deleted = FALSE
.
We also de-duplicate* rows before we load them into your destination. We use the _fivetran_id
field, which is the hash of the non-Fivetran values in every row, to avoid creating multiple rows with identical contents. If, for example, you have the following table in your source:
*NOTE: Fivetran Teleport Sync does not support de-duplication for tables without primary keys. Learn more in our Fivetran Teleport Sync documentation.
description | quantity |
---|---|
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Then your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Deleted rows
We don't delete rows from the destination, though the way for how we process deletes differs for tables with primary keys and tables without a primary key.
Change tracking
You cannot sync tables without a primary key because CT requires primary keys to record changes. Tables without a primary key are excluded from your syncs. They are still listed in your Fivetran dashboard, but appear disabled.
If you want to add a primary key to a table, you can run the following query in your primary database:
ALTER TABLE <schema>.<pkless_table> ADD _id int NOT NULL IDENTITY (1,1) PRIMARY KEY
The new _id
primary key column...
- Auto-generates a sequential
_id
value when a new record is added - Cannot be changed or overwritten with new values
- Automatically populates on all records when added to an existing table
Change data capture
We merge changes to tables without a primary key into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
- A DELETE in the source table updates the corresponding row in the destination with
_fivetran_deleted = TRUE
- An UPDATE in the source table soft deletes the existing row in the destination by setting
_fivetran_deleted = TRUE
, then inserts a new row with the updated values
NOTE: CDC's PRE_UPDATE operation (
__$operation = 3
) captures all the values in a row before an update occurs. However, for the TEXT, NTEXT, IMAGE, VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types, the row values are always recorded as null. Therefore, we treat updates to tables without primary keys as append-only, because we cannot recreate the hash of the pre-update row. For more information, see Microsoft's CDC documentation.
Migrating service providers
If you want to migrate service providers, we will need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original SQL Server database.