Snowflake
Snowflake's architecture separates storage from compute, allowing you to run Fivetran in a separate logical data warehouse. Multiple logical data warehouses can run queries against the same underlying data, so you can provision one logical data warehouse as the destination Fivetran loads new data into and another for your analysts to query the data.
NOTE: Fivetran supports Snowflake as both a database connector and a destination.
You may have several data sources, each of which has its own complex schema. We sync the entire schema of every data source to Snowflake, automatically and continuously. You can connect all your data sources, which we'll sync several times a day, without worrying about bogging down analysts' queries.
A Snowflake account can have multiple compute warehouses and multiple databases. If you want to sync data into two databases in a single Snowflake account, or use multiple compute warehouses to do so, you must create a separate Fivetran destination for each warehouse/database combination.
Setup guide
Follow our step-by-step Snowflake setup guide to connect your Snowflake data warehouse with Fivetran.
Type transformation and mapping
The data types in your Snowflake data warehouse follow Fivetran's standard data type storage.
We use the following data type conversions:
| Fivetran Data Type | Destination Data Type | Notes |
|---|---|---|
| BOOLEAN | BOOLEAN | |
| SHORT | NUMBER(38,0) | |
| INT | NUMBER(38,0) | |
| LONG | NUMBER(38,0) | |
| BIGDECIMAL | NUMBER(precision,scale) | |
| FLOAT | FLOAT | |
| DOUBLE | FLOAT | |
| LOCALDATE | DATE | |
| LOCALDATETIME | TIMESTAMP_NTZ | |
| INSTANT | TIMESTAMP_TZ | |
| STRING | VARCHAR or TEXT | VARCHAR if byte length is present, else TEXT |
| XML | VARIANT | |
| JSON | VARIANT | |
| JSONB | VARIANT | |
| BINARY | BINARY | |
NOTE: The maximum allowed length for VARIANT and VARCHAR/TEXT columns is 16 MB. We truncate values in these columns if they exceed the maximum length.
Parameter Lengths for Textual Types
Data type parameters (the `x` in `varchar(x)`) can differ between sources and destinations for textual types. This is because a source may treat this parameter as the number of characters in a piece of text data, while the destination treats it as the number of bytes in that piece of data. For example, non-Latin characters are often represented in UTF-8 encoding using one to four one-byte code units. The character `$` requires one byte, the character `£` requires two bytes, and the character `€` requires three bytes to encode in UTF-8. Thus, if your source counts each as a single character, the value `$£€` could be parameterized with a length of 3, but if your destination counts its constituent bytes, it would need a length of 6 (1 for `$`, 2 for `£`, 3 for `€`). Therefore, if parameter lengths differ between your source and destination for text types, it is not necessarily indicative of a problem. Furthermore, in Snowflake there is no difference in either performance or storage between a `varchar(16)` and a `varchar(1024)` column if every value is under 16 characters long.
Flatten and query JSON data
To flatten and query JSON data, do the following:
1. Convert strings containing JSON to the VARIANT type using Snowflake's `PARSE_JSON` function.
2. Use Snowflake's `FLATTEN` function on the VARIANT data.
3. Use Snowflake's SQL extensions to query the semi-structured data.
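The steps above can be sketched in a single query; the table name `RAW_EVENTS`, the string column `PAYLOAD`, and the JSON shape are hypothetical:

```sql
-- Hypothetical table RAW_EVENTS with a string column PAYLOAD containing JSON
-- such as '{"items": [{"sku": "A1"}, {"sku": "B2"}]}'
SELECT f.value:sku::string AS sku
FROM RAW_EVENTS,
     LATERAL FLATTEN(input => PARSE_JSON(PAYLOAD):items) f;
```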
NOTE: You can also use Fivetran's native Transformations to run the functions automatically against your JSON data and save the results in a new schema.
Table identifiers and aliases
Snowflake does not allow ANSI reserved keywords to be used as table names or aliases in a `FROM` clause. When you query your Snowflake data warehouse, be sure to enclose such table names and aliases in double quotes to avoid errors. For example, use `"JOIN"` instead of `JOIN`.
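For instance, a query against a table whose name collides with a reserved keyword (the table name here is hypothetical) would look like this:

```sql
-- Fails: JOIN is a reserved keyword
-- SELECT * FROM JOIN;

-- Works: double quotes make it a valid identifier
SELECT * FROM "JOIN";
```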
Column identifiers
Snowflake does not allow the following ANSI reserved keywords to be column names when they are capitalized:
CONSTRAINT
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_USER
LOCALTIME
LOCALTIMESTAMP
To ensure a valid column name, Fivetran converts the keyword to lowercase and encloses it in double quotes (for example, `LOCALTIME` becomes `"localtime"`). When you query these columns, be sure to use the converted column name.
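Because quoted identifiers are case-sensitive in Snowflake, the quoted lowercase form must be used exactly as written; the table name below is hypothetical:

```sql
-- The column originally named LOCALTIME is stored as "localtime"
SELECT "localtime" FROM MY_TABLE;
```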
Migrate databases
To migrate your Snowflake database that is configured with Fivetran to another database, perform the following steps:
Create variables for the role and database. Run the following commands:
```sql
set role_name = 'FIVETRAN_ROLE';
set database_name = 'FIVETRAN_DATABASE';
```
Grant the Fivetran role appropriate permissions on the new database. Run the following command:
```sql
GRANT ALL PRIVILEGES ON DATABASE IDENTIFIER($database_name) TO ROLE IDENTIFIER($role_name);
```
NOTE: See our setup guide for more information.
Pause all your connectors from the Fivetran dashboard. On the connector dashboard, set the connector status toggle to PAUSED.
Copy or move the schemas you want to sync from the existing Snowflake database to the new database.
Grant the `FIVETRAN_ROLE` role the correct permissions on the database schemas and tables. Run the following commands:
```sql
set schema_name = '<customer_schema_name>';
GRANT OWNERSHIP ON SCHEMA IDENTIFIER($schema_name) TO ROLE FIVETRAN_ROLE REVOKE CURRENT GRANTS;
GRANT OWNERSHIP ON ALL TABLES IN SCHEMA IDENTIFIER($schema_name) TO ROLE FIVETRAN_ROLE REVOKE CURRENT GRANTS;
GRANT ALL ON ALL TABLES IN SCHEMA IDENTIFIER($schema_name) TO ROLE FIVETRAN_ROLE;
```
Update the destination connection settings to specify the new database details. In the destination dashboard, click Edit connection details. Enter the new database name and provide the authentication details.
Create variables for the user and warehouse. Run the following commands:
```sql
set user_name = 'FIVETRAN_USER';
set warehouse_name = 'FIVETRAN_WAREHOUSE';
```
Change the user's default warehouse. Run the following command:
```sql
ALTER USER IDENTIFIER($user_name) SET default_warehouse = $warehouse_name;
```
Enable your connectors. On the connector dashboard, set the connector status toggle to ENABLED.
Data load costs
Fivetran uses SQL queries to merge data into the tables in your data warehouse. Because Snowflake bills for compute usage on a per-second basis, Snowflake charges you when Fivetran loads data into your data warehouse. To reduce costs, we recommend that you provision an X-Small warehouse with a one-minute auto-suspend for Fivetran to load new data into. Most Fivetran customers are billed for 10-20% usage of an X-Small warehouse, though more frequent syncs increase the usage.
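A warehouse with the recommended settings could be created as follows; the warehouse name is hypothetical:

```sql
CREATE WAREHOUSE FIVETRAN_WAREHOUSE
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60        -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;
```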
Reduce Fail-safe cost
Registering Fivetran with a transient database will ensure all Fivetran-created tables are transient by default. This will help reduce the additional costs and storage space associated with Fail-safe. However, transient tables cannot be recovered in case of operational or system failures.
See Snowflake's Fail-safe documentation for more information.
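A transient database is created by adding the `TRANSIENT` keyword; the database name below is hypothetical, and tables created in it are transient by default:

```sql
CREATE TRANSIENT DATABASE FIVETRAN_DATABASE;
```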
Improve query performance
Add cluster keys
Fivetran doesn’t enforce cluster keys in the Snowflake tables because the data is well clustered in most situations. You can define a clustering key for a table if the table size has increased or you observe slow query performance.
To improve the performance of Fivetran-triggered queries such as `MERGE`, `UPDATE`, and `DELETE`, add cluster keys on the columns used to join the existing and the incoming data. See Snowflake's Clustering documentation for more information on how to add clustering keys and understand the cost implications.
Adding cluster keys helps reduce the amount of data scanned for each query. For optimal performance, add cluster keys for the columns that are part of the table's primary keys. You can find the columns from the table’s schema in Snowflake or in log events synced to your destination by either our free Fivetran Platform Connector, which we automatically add to every destination you create, or external log services.
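For example, a clustering key on a table's primary key column could be added like this; the schema, table, and column names are hypothetical:

```sql
-- Cluster on the column used to match existing and incoming rows
ALTER TABLE my_schema.orders CLUSTER BY (order_id);
```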
Collation
Fivetran does not recommend using custom collations because they can impact data integrity. Instead, you can apply a collation with `<col_name> COLLATE '<collation_of_your_choice>'` in your downstream queries when sorting or comparing your data. For example:
```sql
SELECT <col_name> FROM <fivetran_replicated_table>
ORDER BY <col_name> COLLATE '<collation_of_your_choice>';
```
NOTE: If you choose to use a collation, make sure the collation is case-sensitive.