High-Volume Agent SQL Server
SQL Server is Microsoft's relational database management system. Fivetran replicates data from your SQL Server source database and loads it into your destination using the High-Volume Agent connector.
NOTE: You must have an Enterprise or Business Critical plan to use the High-Volume Agent SQL Server connector.
Row-based relational databases, like SQL Server, are optimized for high-volume, high-frequency transactional applications. While very performant as production databases, they are not optimized for analytical querying. Your analytical queries will be very slow if you build your BI stack directly on top of your transactional SQL Server database, and you run the risk of slowing down your application layer.
Column-based databases are optimized for performing analytical queries on large volumes of data at speeds far exceeding those of SQL Server. While these databases are not good for high-frequency transactional applications, they are highly efficient in data storage. They permit more data compression (10x-100x) than row-based databases, which makes them a cost-effective way to store and access data for analytical purposes.
Supported services
Fivetran supports the Generic SQL Server database service.
Supported configurations
Fivetran supports the following SQL Server configurations:
Operating Systems | Supported Versions |
---|---|
Windows | SQL Server 2012 - 2022 |
Linux | SQL Server 2017 - 2019 |
Recovery Models | Supported |
---|---|
Full | check |
Bulk-logged | |
Simple | |
Supportability Category | Supported Values |
---|---|
Transport Layer Security (TLS) | TLS 1.1 - 1.3 |
Instance Types | Supported |
---|---|
Generic SQL Server | |
Primary instance | check |
Availability group replica | check |
Limitations
We do not support the following with the High-Volume Agent SQL Server connector:
- Single-user mode
- Amazon RDS for SQL Server
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | All tables and fields |
History mode | check | |
Custom data | check | All tables and fields |
Data blocking | check | Column level, table level, and schema level |
Column hashing | check | |
Re-sync | check | Table level |
API configurable | check | |
Priority-first sync | | |
Fivetran data models | | |
Private networking | check | AWS PrivateLink: Generic SQL Server on EC2 |
Setup guide
Follow our step-by-step High-Volume Agent SQL Server setup guide for specific instructions on how to set up your SQL Server database.
Sync overview
Once connected to your database, the Fivetran connector runs an initial sync, pulling a full dump of the selected data from your database and sending it to your destination. After a successful initial sync, the connector runs in incremental sync mode. In this mode, Fivetran automatically detects new or updated data, such as new tables or data type changes, and persists these changes to your destination. We use log-based capture to extract your database's change data, then process and load these changes at regular intervals, keeping your destination consistently in sync with your database.
NOTE: Choosing a 1-minute sync frequency does not guarantee that your sync finishes within one minute.
Syncing empty tables and columns
Fivetran can sync empty tables for your SQL Server connector.
We can also sync empty columns in most cases. However, if you don't add rows after you create a new column, we cannot sync that new column. We need at least one row to see a new column because we learn of changes to a table's column cardinality when we see a row with a new or removed column during an update.
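For example, after adding a new column you can make it visible to the connector by touching at least one row. The sketch below assumes a hypothetical `dbo.products` table and `note` column; the no-op update simply produces a logged row change that exposes the new column:

```sql
-- Hypothetical example: add a column, then touch one row so the change
-- appears in the transaction log and the connector can detect the column.
ALTER TABLE dbo.products ADD note NVARCHAR(100) NULL;

-- A single-row no-op update is enough to surface the new column.
UPDATE TOP (1) dbo.products SET note = note;
```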
For more information, see our Features documentation.
Schema information
Fivetran tries to replicate the exact schema and tables from your database to your destination.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- `_fivetran_deleted` (BOOLEAN): marks rows that were deleted in the source table
- `_fivetran_id` (STRING): a unique ID that Fivetran uses to avoid duplicate rows in tables that do not have a primary key
- `_fivetran_synced` (UTC TIMESTAMP): indicates the time when Fivetran last successfully synced the row
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
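Because deletes are soft (rows are flagged rather than removed), analytical queries in the destination typically filter on `_fivetran_deleted`. A minimal sketch, assuming a hypothetical `products` table and a destination that stores the column as a boolean (the exact literal syntax varies by warehouse):

```sql
-- Return only rows that still exist in the source
SELECT *
FROM products
WHERE _fivetran_deleted = FALSE;
```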
Type transformation and mapping
As we extract your data, we match SQL Server data types to data types that Fivetran supports. If we don't support a certain data type, we automatically change that type to the closest supported type or, for some types, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.
The following table illustrates how we transform your SQL Server data types into Fivetran supported types:
SQL Server Type | Fivetran Type | Fivetran Supported |
---|---|---|
BIGINT | LONG | True |
BINARY | BINARY | True |
BIT | BOOLEAN | True |
CHAR | STRING | True |
DATE | LOCALDATE | True |
DATETIME | LOCALDATETIME | True |
DATETIME2 | LOCALDATETIME | True |
DATETIMEOFFSET | TIMESTAMP | True |
DECIMAL | BIGDECIMAL | True |
FLOAT | DOUBLE | True |
GEOMETRY | JSON | True |
GEOGRAPHY | JSON | True |
IMAGE | BINARY | True |
INTEGER | INTEGER | True |
MONEY | BIGDECIMAL | True |
NCHAR | STRING | True |
NTEXT | STRING | True |
NUMERIC | BIGDECIMAL | True |
NVARCHAR | STRING | True |
REAL | FLOAT | True |
ROWVERSION | BINARY | True |
SMALLDATETIME | LOCALDATETIME | True |
SMALLMONEY | BIGDECIMAL | True |
SMALLINT | SHORT | True |
TEXT | STRING | True |
TIME | STRING | True |
TIMESTAMP | BINARY | True |
TINYINT | SHORT | True |
UNIQUEIDENTIFIER | STRING | True |
VARCHAR | STRING | True |
VARBINARY | BINARY | True |
XML | STRING | True |
HIERARCHYID | STRING | False |
If we are missing an important type that you need, reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.
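If you need the contents of an unsupported column such as HIERARCHYID, one possible workaround (our suggestion, not a documented Fivetran feature) is to expose the value through a persisted computed column of a supported type in the source table. Note that log-based capture may not pick up computed columns, so verify the behavior with support before relying on this:

```sql
-- Hypothetical workaround: materialize a HIERARCHYID column ("node")
-- as a string so it maps to a supported type (STRING).
ALTER TABLE dbo.org_chart
    ADD node_path AS node.ToString() PERSISTED;
```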
Supported encodings
We support the following character encodings for HVA SQL Server.
- IBM437
- IBM850
- UTF-16LE
- WINDOWS-874
- WINDOWS-932
- WINDOWS-936
- WINDOWS-949
- WINDOWS-950
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1254
- WINDOWS-1255
- WINDOWS-1256
- WINDOWS-1257
- WINDOWS-1258
Excluding source data
If you don’t want to sync all the data from your database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can change the permissions of the Fivetran user you created and restrict its access to certain tables or columns.
How to allow only a subset of tables
In your primary database, you can grant SELECT permissions to the Fivetran user on all tables in a given schema:
```sql
GRANT SELECT ON SCHEMA::<schema> TO fivetran;
```
or only grant SELECT permissions for a specific table:
```sql
GRANT SELECT ON [<schema>].[<table>] TO fivetran;
```
How to allow only a subset of columns
You can restrict the column access of your database's Fivetran user in two ways:
Grant SELECT permissions only on certain columns:
```sql
GRANT SELECT ON [<schema>].[<table>] ([<column 1>], [<column 2>], ...) TO fivetran;
```
Deny SELECT permissions only on certain columns:
```sql
GRANT SELECT ON [<schema>].[<table>] TO fivetran;
DENY SELECT ON [<schema>].[<table>] ([<column X>], [<column Y>], ...) TO fivetran;
```
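To confirm what the restricted user can actually see, you can impersonate it and list its effective permissions. This sketch uses the built-in `fn_my_permissions` function and a hypothetical `dbo.products` table:

```sql
EXECUTE AS USER = 'fivetran';

-- Lists effective object-level permissions, including per-column SELECT grants
SELECT permission_name, subentity_name
FROM fn_my_permissions('dbo.products', 'OBJECT');

REVERT;
```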
Initial sync
Once connected to your database, the Fivetran connector copies all rows from every table in every schema for which the Fivetran user has SELECT permissions (except for those you have excluded in your Fivetran dashboard) and sends them to your destination. Additionally, we add Fivetran-generated columns to every table in your destination, offering visibility into the state of your data during syncs.
Updating data
Fivetran performs incremental updates by extracting new or modified data from your source database's transaction log files using one of the following proprietary capture methods:
- Direct Capture: This method captures changes directly from SQL Server's online transaction logs.
- Archive Log Only: This method captures changes from SQL Server's transaction log backups. Because we do not read anything directly from the online transaction logs, the High-Volume Agent can reside on a separate machine from the SQL Server DBMS.
NOTE:
- The Archive Log Only capture method generally exhibits higher latency than the Direct Capture method because changes can only be captured when the transaction log backup file is created. While this capture method enables high-performance log-based Change Data Capture (CDC) with minimal operating system and database privileges, it comes at the cost of higher capture latency.
- We automatically enable CDC tables for SQL Server to log the primary key during updates. However, we disable the process that populates the CDC tables, so they will not contain any actual data. This approach does not add any additional load to the database server and differs from running SQL Server's native CDC replication.
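You can inspect the capture instances present on your database (and distinguish HVA's from your own) with the standard CDC catalog views. This sketch assumes CDC is already enabled on the database:

```sql
-- Is CDC enabled on the current database?
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = DB_NAME();

-- List capture instances; those created by HVA are named hvr_<object_id>
SELECT capture_instance, source_object_id
FROM cdc.change_tables;
```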
Tables with a primary key
We merge changes to tables with primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with `_fivetran_deleted = FALSE`
- A DELETE in the source table updates the corresponding row in the destination with `_fivetran_deleted = TRUE`
- An UPDATE in the source table updates the corresponding row in the destination
Tables without a primary key
If there are one or more non-nullable unique indexes, we use the first available index as the primary key. Otherwise, we use the `_fivetran_id` column as the primary key.

When `_fivetran_id` is the primary key, the data is handled as follows:

- An INSERT in the source table generates a new row in the destination with `_fivetran_deleted = FALSE`.
- The `_fivetran_id` column helps us handle DELETE operations:
  - If there is a row in the destination that has a corresponding `_fivetran_id` value, that row is updated with `_fivetran_deleted = TRUE`.
  - If there is not a row in the destination that has a corresponding `_fivetran_id` value, a new row is added with `_fivetran_deleted = TRUE`.
- An UPDATE in the source table is treated as a DELETE followed by an INSERT, so it results in two rows in the destination:
  - A row containing the old values with `_fivetran_deleted = TRUE`
  - A row containing the new values with `_fivetran_deleted = FALSE`
As a result, one record in your source database may have several corresponding rows in your destination. For example, suppose you have a products
table in your source database with no primary key:
description | quantity |
---|---|
Shrink-ray gun | 1 |
Boogie robot | 3 |
Cookie robot | 2 |
You load this table into your destination during your initial sync, creating this destination table:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:00' | 1 | FALSE | dfdf |
Boogie robot | 3 | '2000-01-01 00:00:00' | 2 | FALSE | ewra |
You then update a row:
```sql
UPDATE products SET quantity = 4 WHERE description = 'Cookie robot';
```
After your UPDATE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:01' | 3 | TRUE | dfdf |
Boogie robot | 3 | '2000-01-01 00:00:00' | 2 | FALSE | ewra |
Cookie robot | 4 | '2000-01-01 00:00:01' | 4 | FALSE | zxfd |
You then delete a row:
```sql
DELETE FROM products WHERE description = 'Boogie robot';
```
After your DELETE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:01' | 3 | TRUE | dfdf |
Cookie robot | 4 | '2000-01-01 00:00:01' | 4 | FALSE | zxfd |
Boogie robot | 3 | '2000-01-01 00:00:02' | 5 | TRUE | ewra |
So, while there may be just one record in your source database where `description = 'Cookie robot'`, there are two in your destination: an old version where `_fivetran_deleted = TRUE`, and a new version where `_fivetran_deleted = FALSE`.
We also de-duplicate rows before we load them into your destination. We use the _fivetran_id
field, which is the hash of the non-Fivetran values in every row, to avoid creating multiple rows with identical contents. If, for example, you have the following table in your source:
description | quantity |
---|---|
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Then your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Deleted rows
We don't delete rows from the destination, though the way we process deletes differs for tables with primary keys and tables without a primary key.
Deleted columns
We do not delete columns from your destination. When a column is deleted from the source table, we replace the existing values in the corresponding destination column with `NULL` values.
Table truncation
We don't support table truncation. The SQL Server source database forbids truncation on any table tracked by Change Data Capture (CDC).
To truncate a table, you must disable CDC, which also disables the logging that records the truncate event. As there is no history of the truncation in any logs that we can use, we can’t replicate the table truncation operation.
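You can check which tables are CDC-tracked before attempting a truncate; on a tracked table, `TRUNCATE TABLE` fails with SQL Server error 4711. A sketch assuming a hypothetical `dbo.products` table:

```sql
-- List tables currently tracked by Change Data Capture
SELECT name, is_tracked_by_cdc
FROM sys.tables
WHERE is_tracked_by_cdc = 1;

-- On a tracked table, this statement fails:
-- TRUNCATE TABLE dbo.products;
-- Msg 4711: Cannot truncate table because it is published for replication
-- or enabled for Change Data Capture.
```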
Log truncation Private Preview
In SQL Server, when using a transaction log for replication, the log's truncation point must be advanced to prevent excessive growth, which could impact performance or consume available disk space.
The transaction log operates as a single logical file implemented by one or more physical files. The transaction log is designed to be overwritten, provided nothing is blocking the overwrite from happening. Without the ability to be overwritten, the transaction log may expand, potentially exhausting all available disk space. The process of moving the transaction log "forward" to enable a portion to be overwritten is referred to as log truncation.
The HVA SQL Server connector automatically sets the log truncation method based on the database settings listed in the following table. For example, if your database operates in Full Recovery mode, log truncation is `Controlled by Fivetran`, meaning HVA handles the log truncation process.
The basic rule is that if your database has any settings that require log truncation to be `Controlled by Customer`, then you must handle the log truncation process yourself. For example, you could have a database in Simple Recovery mode that is also running SQL Server's native replication. In this case, log truncation is `Controlled by Customer`, and you must manage it.
DATABASE SETTINGS | LOG TRUNCATION | DROP CDC JOBS? | NOTES |
---|---|---|---|
Full Recovery mode | Controlled by Fivetran | Yes | If no sync is running with the CDC tables and/or Articles in place, the transaction log will grow because the truncation point for replication is not released. |
Simple Recovery mode | Controlled by Fivetran | Yes | If no sync is running with the CDC tables and/or Articles in place, the transaction log will grow because the truncation point for replication is not released. |
Read-only secondary replica in Always On Availability Group | Controlled by Customer | Yes | You need to have a separate job or task for managing the replication truncation point. For example, you might set up a separate SQL Server Agent job to unconditionally call sp_repldone at regular intervals. The agent will drop/disable SQL Server's Agent jobs. However, as long as the scheduled log release task runs, the truncation point for replication is managed properly, even if the sync is not running. This method works alongside another replication or CDC solution, provided the scheduled log release task meets the requirements of the other solution. |
SQL Server's native replication or non-HVA CDC usage (i.e., any CDC instances not created by the HVA connector). CDC instances created by HVA are named hvr_<object_id> . | Controlled by Customer | No | The agent does not drop or disable the native SQL Server Agent jobs created alongside CDC tables and/or Articles. Also, the agent does not interfere with the release of the truncation point for replication. However, using CDC tables to enable supplemental logging can lead to I/O overhead, as SQL Server jobs copy each change to a CDC table that may never be used. |
NOTE: If log truncation is `Controlled by Customer`, HVA will not handle the log truncation for you. It is your responsibility to manage log growth, either manually or through SQL Server Agent.
One way to manually move the truncation point forward is to create a job that runs the `sp_repldone` procedure.
```sql
EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1;
```
NOTE: The `sp_repldone` procedure does not truncate anything; it only moves the truncation point so that a subsequent backup can handle the truncation.
The frequency of log truncation depends on how long you intend to retain transactions and on your available disk space. Log truncation should not occur more frequently than your connector sync frequency, to ensure all transactions are processed before they are truncated. We recommend setting a log retention period of a few days.
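As a sketch of the "separate SQL Server Agent job" approach described above (the job name and the four-hour schedule are our assumptions, not Fivetran requirements):

```sql
USE msdb;

-- Hypothetical Agent job that unconditionally releases the replication
-- truncation point at regular intervals.
EXEC sp_add_job @job_name = N'log_release';
EXEC sp_add_jobstep
    @job_name = N'log_release',
    @step_name = N'run sp_repldone',
    @subsystem = N'TSQL',
    @database_name = N'<your database>',
    @command = N'EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1;';
EXEC sp_add_schedule
    @schedule_name = N'every_4_hours',
    @freq_type = 4,              -- daily recurrence
    @freq_interval = 1,
    @freq_subday_type = 4,       -- minute units
    @freq_subday_interval = 240; -- every 240 minutes
EXEC sp_attach_schedule @job_name = N'log_release', @schedule_name = N'every_4_hours';
EXEC sp_add_jobserver @job_name = N'log_release';
```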
Changing recovery model
When you switch the database recovery model from full to either bulk-logged or simple, the format of the transaction logs changes, which can cause your connector to fail. To resolve this issue, revert the recovery model back to full and then initiate a connector re-sync. This is required to ensure full integrity of the delivered data going forward.
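To check and revert the recovery model, a sketch with a hypothetical database name and backup path follows; after switching back to full, take a full backup so the log backup chain restarts:

```sql
-- Check the current recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'<your database>';

-- Revert to the full recovery model
ALTER DATABASE [<your database>] SET RECOVERY FULL;

-- A full backup is required before log backups (and log-based capture)
-- are meaningful again under the full model. Hypothetical path shown.
BACKUP DATABASE [<your database>] TO DISK = N'C:\backups\your_database.bak';
```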
Migrating service providers
If you want to migrate service providers, you need to do a full re-sync of your data because the new service provider won't retain the same change tracking data as your original SQL Server database.