Snowflake Internal vs. External Staging
Question
Does Fivetran use internal or external staging for my Snowflake destination?
Environment
Destination: Snowflake
Answer
By default, Fivetran stages data in a Snowflake internal stage, specifically the user stage. In some situations, Fivetran uses external staging instead.
Fivetran uses external staging in the following situations:
- AWS PrivateLink connections: Snowflake internal staging is not supported when you use AWS PrivateLink to connect Fivetran to your Snowflake destination. For these connections, Fivetran stages data in Fivetran-managed cloud storage buckets, such as Amazon S3 or Google Cloud Storage (GCS).
- Google Cloud Private Service Connect connections: Snowflake internal staging is not supported for destinations that use Google Cloud Private Service Connect. For these connections, Fivetran uses external staging in GCS.
- Memory constraints or connectivity issues: If Fivetran encounters memory constraints or connectivity issues while using the internal stage, it may fall back to external staging so that data loading continues uninterrupted.
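The selection rules in the list above can be summarized as a small decision function. This is a hypothetical illustration only, not Fivetran's code: the `Connection` enum, the `choose_staging` function, and the `internal_stage_healthy` flag are names invented for this sketch, and the real fallback decision is made internally by Fivetran.

```python
from enum import Enum, auto


class Connection(Enum):
    """Hypothetical labels for how Fivetran connects to the destination."""
    PUBLIC = auto()
    AWS_PRIVATELINK = auto()
    GCP_PRIVATE_SERVICE_CONNECT = auto()


def choose_staging(connection: Connection, internal_stage_healthy: bool = True) -> str:
    """Return the staging type implied by the rules described in this article.

    Illustration only: internal_stage_healthy is a hypothetical stand-in for
    "no memory constraints or connectivity issues".
    """
    if connection is Connection.AWS_PRIVATELINK:
        # Internal staging is not supported; a Fivetran-managed bucket is used.
        return "external (Fivetran-managed bucket)"
    if connection is Connection.GCP_PRIVATE_SERVICE_CONNECT:
        # Internal staging is not supported; GCS is used.
        return "external (GCS)"
    if not internal_stage_healthy:
        # Fivetran may fall back to external staging in this case.
        return "external (fallback)"
    # Default path: the Snowflake user stage.
    return "internal (Snowflake user stage)"
```

For example, `choose_staging(Connection.PUBLIC)` returns the internal user stage, while any PrivateLink or Private Service Connect connection always yields an external stage.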
If you use AWS PrivateLink or Google Cloud Private Service Connect and observe that your Snowflake destination is loading data from an external Amazon S3 or GCS bucket, this is expected behavior and not a misconfiguration.
Fivetran manages the staging process regardless of which staging method is used. We encrypt staged data, generate temporary credentials when applicable, and automatically delete staged files after the data is successfully loaded into Snowflake.
Context
Snowflake's documentation notes that bulk loading is the most efficient way to load large volumes of data. For the initial sync, which is insert-only, Fivetran uses staging to load data from files in the staging area directly into the target tables with COPY commands.
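To make the bulk-load path concrete, the sketch below builds the kind of Snowflake SQL used to load a local file through the user stage (`@~`): `PUT` uploads the file, `COPY INTO` bulk-loads it, and `REMOVE` deletes the staged files, mirroring the cleanup step described in the Answer section. This is a minimal sketch under assumptions, not Fivetran's actual statements: the table name, file path, stage subpath, and the CSV file format are all hypothetical.

```python
def initial_sync_statements(table: str, local_file: str, stage_path: str) -> list[str]:
    """Build SQL that loads one local file into a table through the user stage.

    Hypothetical example: table, local_file, and stage_path are placeholders,
    and a CSV file format is assumed.
    """
    stage = f"@~/{stage_path}"  # @~ refers to the current user's stage
    return [
        # Upload the local file to the user stage (compressed to .gz).
        f"PUT file://{local_file} {stage} AUTO_COMPRESS=TRUE",
        # Bulk-load all staged files under the path into the target table.
        f"COPY INTO {table} FROM {stage} FILE_FORMAT = (TYPE = CSV)",
        # Delete the staged files after the load has succeeded.
        f"REMOVE {stage}",
    ]


for sql in initial_sync_statements("orders", "/tmp/orders.csv", "orders/"):
    print(sql)
```

In practice these strings would be executed in order by a Snowflake client, for example with `cursor.execute` from the Snowflake Connector for Python. The `REMOVE` should run only if the `COPY INTO` succeeds.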
For incremental syncs after the initial sync, we process changes using controlled routines. We bulk load the incremental changes through staging into staging tables, then run MERGE statements to apply those changes to the target tables. Because the changed data already resides in Snowflake when the MERGE runs, the merges are efficient and support high throughput.
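The staging-table-then-MERGE pattern can be sketched as a helper that builds an upsert from a staging table into a target table keyed on a primary key. This is a simplified illustration, not the statement Fivetran generates. The function name, the table and column names, and the single-column key are assumptions. Deletes and Fivetran's system columns are deliberately left out.

```python
def merge_statement(target: str, staging: str, key: str, columns: list[str]) -> str:
    """Build a MERGE that upserts rows from a staging table into a target table.

    Hypothetical sketch: assumes a single-column primary key and ignores deletes.
    """
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("at least one non-key column is required")
    # Matched rows: overwrite every non-key column with the staged value.
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_key)
    # Unmatched rows: insert the full staged row, key first.
    all_cols = [key] + non_key
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


print(merge_statement("orders", "orders_staging", "id", ["id", "status", "amount"]))
```

Because the staging table and the target table both live in Snowflake, the join in the `ON` clause runs entirely inside the warehouse. This is why the merges are efficient.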