Snowflake
Snowflake is a cloud-based data warehousing platform that allows organizations to store and analyze large volumes of data in a scalable and cost-effective manner. The Fivetran Snowflake connector allows you to sync tables and views from one Snowflake database to your destination.
NOTE: Fivetran supports Snowflake as both a database connector and a destination.
We can sync any number of schemas, tables, and views from within your Snowflake database. Fivetran supports regular and materialized views.
TIP: If you want to sync multiple Snowflake databases, we recommend that you create a connector for each database.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | check | |
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | |
Authorization via API |
Setup guide
Follow our step-by-step Snowflake setup guide to connect your Snowflake database with Fivetran.
Sync overview
Once Fivetran is connected to your database, we pull a full dump of all selected data from your database. We then use either Fivetran Teleport or Time Travel-based sync to pull all your new and changed data at regular intervals. If data in your source database changes (for example, you add new tables or change a data type), Fivetran automatically detects and writes these changes into your destination.
NOTE: Snowflake includes built-in system databases that store important metadata and account configuration details. These databases are part of the system by default and do not contain your actual business data. To optimize performance and storage, we do not sync this data.
Schema information
Fivetran tries to replicate the exact schema, tables, and views from your Snowflake source database to your destination.
We name the schema using the Destination schema prefix name you provide in the connector setup form. We replicate the source's tables and views to the destination schema. We name the destination schema with the connector name appended by the dataset name. For example, if the connector name is snowflake_db
and the dataset name is fivetran_schema
, the destination schema name is snowflake_db_fivetran_schema
.
Type transformations and mapping
As we extract your data, we match the Snowflake data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.
In Time Travel Sync, we support all primary keys whose data types are supported in regular columns. However, for Fivetran Teleport Sync, if we don't support the data type of the primary key, we treat the table as a primary key-less table.
The following table illustrates how we transform your Snowflake data types into Fivetran-supported types for regular and primary key columns:
Data Type | Fivetran Type | Regular Column Support | Primary Key Support (Teleport) |
---|---|---|---|
NUMBER* (s = 0) | One of the following types: - SHORT (p <= 4) - INTEGER (p <= 9) - LONG (p <= 18) | Yes | Yes |
NUMBER* (s > 0 OR p > 18) | BIGDECIMAL | Yes | No |
FLOAT | BIGDECIMAL | Yes | No |
TEXT (with ASCII only) | STRING | Yes | Yes (Up to a length of 256 bytes) |
TEXT (with Unicode) | STRING | Yes | Yes (Up to a length of 84 characters) |
TEXT | STRING | Yes | Yes (Up to a length of 256 bytes) |
BOOLEAN | BOOLEAN | Yes | No |
TIME | STRING | Yes | No |
TIMESTAMP_NTZ | LOCALDATETIME | Yes | No |
TIMESTAMP_TZ | INSTANT | Yes | No |
TIMESTAMP_LTZ | INSTANT | Yes | No |
DATE | LOCALDATE | Yes | No |
BINARY | BINARY | Yes | Yes (Length must be a multiple of 4 up to a max of 128 bytes and values must not contain leading zeros) |
VARIANT | STRING | Yes | No |
ARRAY | STRING | Yes | No |
OBJECT | JSON | Yes | No |
GEOGRAPHY | JSON | Yes | No |
GEOMETRY | JSON | Yes | No |
NOTE: The NUMBER data type has a precision
p
and a scales
. The maximum supported values arep = 38
ands = 37
. If we detect a precision or scale that exceeds those maximums, we map that data to the STRING data type and update the destination column type to STRING.
Updating data
We perform incremental updates of any new or modified data from your source database to capture only the data that has changed since our last sync. Depending on the permissions you have on the database, we use one of the following incremental sync methods:
- Time Travel Sync
- Fivetran Teleport Sync
NOTE: By default, we use Fivetran Teleport Sync to for incremental updates. However, we recommend that you use Time Travel Sync as your incremental sync method.
Time Travel Sync
Time Travel Sync uses the Change Data Capture (CDC) process to sync only the data that has changed since our last sync.
Time Travel Sync performs the following operations on your database:
- Captures only the changes in the data as they occur
- Leverage Snowflake Time Travel to access historical data
Fivetran Teleport Sync
Fivetran Teleport Sync queries perform the following operations on your database:
- Perform a full table scan of each synced table
- Perform calculations on all values in each synced table's rows
- Aggregate a compressed table snapshot in the database's memory
For optimum Fivetran Teleport Sync performance, we recommend that you make the following resources available in your database:
- 1 GB Free RAM
- 1 Free CPU Core
- IOPS (Teleport Sync times decrease linearly with an increase of available IOPS).
NOTE: Teleport captures deletes differently than log-based replication mechanisms. Learn more in our Changes to data between syncs documentation.
Automatic table re-syncs
For tables with a primary key, we support ADD COLUMN DDL operations with null or default static values. All other table or column alterations trigger an automatic table re-sync. For tables without a primary key, any DDL operation triggers an automatic table re-sync.
Tables with a primary key
We merge changes to tables with primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted
=FALSE
. - A DELETE in the source table updates the corresponding row in the destination with
_fivetran_deleted
=TRUE
. - An UPDATE in the source table updates the data in the corresponding row in the destination.
If any primary key column has an unsupported data type, then the table is automatically treated as primary key-less (which means create table
permissions are required on the database or the workspace database).
Tables without a primary key
For tables without a primary key, we designate the _fivetran_id
column as the primary key. We populate this column with the hash of the values of each row.
We handle changes to tables without a primary key differently:
- An INSERT in the source table generates a new row in the destination with
_fivetran_deleted = FALSE
. - The
_fivetran_id
column helps us handle DELETE operations:- If there is a row in the destination that has a corresponding
_fivetran_id
value, that row will be updated with_fivetran_deleted = TRUE
. - If there is not a row in the destination that has a corresponding
_fivetran_id
value, then no changes are done in the destination.
- If there is a row in the destination that has a corresponding
- An UPDATE in the source table is treated as a DELETE followed by an INSERT, so it results in two rows in the destination:
- A row containing the old values with
_fivetran_deleted = TRUE
- A row containing the new values with
_fivetran_deleted = FALSE
- A row containing the old values with
As a result, one record in your source database may have several corresponding rows in your destination. For example, suppose you have a products
table in your source database with no primary key:
description | quantity |
---|---|
Shrink-ray gun | 1 |
Boogie robot | 2 |
Cookie robot | 3 |
You load this table into your destination during your initial sync, creating this destination table:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:00' | 1 | FALSE | dfdf |
Boogie robot | 3 | '2000-01-01 00:00:00' | 2 | FALSE | ewra |
You then update a row:
UPDATE products SET quantity = 4 WHERE description = 'Cookie robot';
After your UPDATE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:00' | 3 | TRUE | dfdf |
Boogie robot | 3 | '2000-01-01 00:00:00' | 2 | FALSE | ewra |
Cookie robot | 4 | '2000-01-01 00:00:00' | 4 | FALSE | zxfd |
You then delete a row:
DELETE FROM products WHERE description = 'Boogie robot';
After your DELETE operation, your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Cookie robot | 2 | '2000-01-01 00:00:02' | 3 | TRUE | dfdf |
Cookie robot | 4 | '2000-01-01 00:00:02' | 4 | FALSE | zxfd |
Boogie robot | 3 | '2000-01-01 00:00:02' | 5 | TRUE | ewra |
So, while there may be just one record in your source database where description = Cookie robot
, there are two in your destination - an old version where _fivetran_deleted = TRUE
, and a new version where _fivetran_deleted = FALSE
.
We also de-duplicate* rows before we load them into your destination. We use the _fivetran_id
field, which is the hash of the non-Fivetran values in every row, to avoid creating multiple rows with identical contents. If, for example, you have the following table in your source:
*NOTE: Fivetran Teleport Sync does not support de-duplication for tables without primary keys. Learn more in our Fivetran Teleport Sync documentation.
description | quantity |
---|---|
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Shrink-ray gun | 1 |
Then your destination table will look like this:
description | quantity | _fivetran_synced | _fivetran_index | _fivetran_deleted | _fivetran_id |
---|---|---|---|---|---|
Shrink-ray gun | 1 | '2000-01-01 00:00:00' | 0 | FALSE | asdf |
Recommendations for Fivetran Teleport Sync
We recommend the following:
- If you want to sync tables with primary keys, we recommend that the combined size of all the tables does not exceed 100 GB.
- If you want to sync tables without primary keys, we recommend that the combined size of all the tables does not exceed 10 GB. Contact our support team for assistance with your syncs.
Deleted columns
If you delete a column in a source table, we replace all values in the deleted column with NULL values in the destination table.
Workspace for temporary tables
We create temporary tables in the source database to sync source tables without a primary key. By default, we create these tables in the same database and schema as the source table that we're syncing.
You can opt to create these temporary tables in a different location by using the Use working database for queries toggle in the connector setup form and then specifying the database and schema names.