Snowflake's architecture separates storage from computing, allowing customers to run Fivetran in a separate logical data warehouse. You can have multiple logical data warehouses running queries on the same underlying data. You can then provision one logical data warehouse as a destination for Fivetran to load new data into and another for your analysts to query the data.
You may have several data sources, each of which has its own complex schema. We sync the entire schema of every data source to Snowflake, automatically and continuously. You can connect all your data sources, which we'll sync several times a day, without worrying about bogging down analysts' queries.
A Snowflake account can have multiple compute warehouses and multiple databases. If you'd like to sync data into two databases in a single Snowflake account, and/or use multiple compute warehouses to do so, you will have to create two Fivetran destinations; one for each warehouse or database connection.
- Snowflake Account
- Ability to create a user and warehouse for Fivetran
- If you have defined a Snowflake Network Policy, add Fivetran's IP to that policy
Run script in Snowflake warehouse
Shared vs. exclusive warehouse
If you would like Fivetran to use a shared warehouse to ingest data, in the next step change
the name of the shared warehouse. Otherwise, leave
fivetran_warehouse in the script below to create a new exclusive warehouse.
Using an exclusive warehouse means that Fivetran operations will never contend with your queries for resources,
but you will have to pay the cost of running
fivetran_warehouse all the time.
Since Fivetran loads data incrementally, it usually requires very little compute resources, so it may make sense to use a shared warehouse to reduce cost.
Copy the script given below. Log in to your Snowflake data warehouse. In a new workbook, run the script you copied.
You can prepare the script yourself. To do that, replace the default
with values that conform to your specific naming conventions for those resources.
RUN THE ENTIRE SCRIPT:
begin; -- create variables for user / password / role / warehouse / database (needs to be uppercase for objects) set role_name = 'FIVETRAN_ROLE'; set user_name = 'FIVETRAN_USER'; set user_password = 'password123'; set warehouse_name = 'FIVETRAN_WAREHOUSE'; set database_name = 'FIVETRAN_DATABASE'; -- change role to securityadmin for user / role steps use role securityadmin; -- create role for fivetran create role if not exists identifier($role_name); grant role identifier($role_name) to role SYSADMIN; -- create a user for fivetran create user if not exists identifier($user_name) password = $user_password default_role = $role_name default_warehouse = $warehouse_name; grant role identifier($role_name) to user identifier($user_name); -- change role to sysadmin for warehouse / database steps use role sysadmin; -- create a warehouse for fivetran create warehouse if not exists identifier($warehouse_name) warehouse_size = xsmall warehouse_type = standard auto_suspend = 60 auto_resume = true initially_suspended = true; -- create database for fivetran create database if not exists identifier($database_name); -- grant fivetran role access to warehouse grant all privileges on warehouse identifier($warehouse_name) to role identifier($role_name); -- grant fivetran access to database grant all privileges on database identifier($database_name) to role identifier($role_name); commit;
Connecting your warehouse
After selecting Snowflake on the welcome screen, fill in the required information in the UI about:
- Password / PrivateKey
- Database name
- Time zone
The username must contain 'fivetran'. You should not use this username for any other purpose.
If you want to authenticate using
KEY_PAIR, perform the following steps:
Use the command line in a terminal window to generate a private key. You can generate an encrypted version of the private key or an unencrypted version of the private key.
To generate an unencrypted version, use the following command:
openssl genrsa -out rsa_key.pem 2048
To generate an encrypted version, use the following command:
openssl genrsa 2048 | openssl pkcs8 -topk8 -v1 <ALGORITHM> -inform PEM -out rsa_key.p8
Various algorithms can be used with the -v1 command line option. These algorithms use the PKCS#12 password based encryption algorithm and allow strong encryption algorithms like triple DES or 128-bit RC2 to be used.
From the command line, generate the public key by referencing the private key:
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
Assign the public key to the Snowflake user:
alter user <USERNAME> set rsa_public_key='<PUBLIC_KEY>';
Type transformation and mapping
The data types in your Snowflake warehouse follow Fivetran's standard data type storage.
We use the following data type conversions:
|Fivetran Data Type||Destination Data Type||Notes|
|STRING||VARCHAR or TEXT||VARCHAR if
How to Flatten and Query JSON data
Next you can use Snowflake's SQL Extensions to query semi-structured (JSON) data.
Note: Table identifiers and aliases
Snowflake does not allow ANSI reserved keywords to be used as table names and aliases in a FROM clause.
When you query your Snowflake warehouse, be sure to encase the table names and aliases in double quotes (for example, use "JOIN" instead of JOIN) to avoid exception errors.
Note: Column Identifiers
Using Fivetran's native Transformations, you will be able to run the functions above automatically against your JSON data and save the results in a new schema.
Snowflake does not allow the following ANSI reserved keywords to be column names when they are capitalized:
To ensure a valid column name, Fivetran converts the keyword to lowercase and encases it in double quotes (for example,
"localtime"). When you query these columns, be sure to use the converted column name.
Note: Syncing binary data
Fivetran loads binary data into your Snowflake destination using the BASE64 encoded format. You need to set the value of the
BINARY_INPUT_FORMAT parameter to BASE64 for the
fivetran_user in the destination. Read Snowflake's documentation for more information.
Data load costs
Fivetran uses SQL queries to merge data into the tables in your warehouse. Because Snowflake bills for compute usage on a per-second basis, they charge you when Fivetran loads data into your warehouse. To reduce costs, we recommend that you provision an X-Small data warehouse with one-minute auto-suspend for Fivetran to load new data into. Most Fivetran customers will be billed for 10 - 20% usage of an X-Small warehouse, though more frequent syncs will increase the usage.
Note: Reduce Fail-safe cost
Registering Fivetran with a transient database will ensure all Fivetran-created tables are transient by default. This will help in reducing the additional costs and storage space associated with Fail-safe. However, transient tables cannot be recovered in case of operational or system failures.
See Snowflake's Fail-safe documentation for more information.