How to Handle Duplicate Records in a Snowflake Destination?
Question
How can I identify, prevent, and clean up duplicate records in Snowflake destination tables managed by Fivetran?
Environment
Destination: Snowflake
Answer
Under normal conditions, Fivetran does not create duplicate records in destination tables. However, specific configurations or scenarios may result in duplicate rows.
This article explains:
- Common causes of duplicate records
- How to prevent duplicates
- How to safely clean up existing duplicates
Why do duplicate records occur?
Fivetran uses MERGE queries that join staging tables with destination tables using primary keys. Accurate JOIN operations are critical to ensure records are correctly inserted or updated. Some Snowflake configurations can interfere with these joins and result in duplicate records.
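For illustration, a simplified merge of this shape shows why the primary key join matters. This is not Fivetran's actual query; the orders destination table, the orders_staging staging table, and the id primary key are hypothetical:
MERGE INTO orders AS dest
USING orders_staging AS stage
  ON dest.id = stage.id  -- primary key join; masking or collation on id can break this match
WHEN MATCHED THEN UPDATE SET
  dest.amount = stage.amount,
  dest._fivetran_synced = stage._fivetran_synced
WHEN NOT MATCHED THEN INSERT (id, amount, _fivetran_synced)
  VALUES (stage.id, stage.amount, stage._fivetran_synced);
If the ON condition fails to match the existing destination row, the record falls into the NOT MATCHED branch and is inserted again, which is how duplicates appear.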
Example: When case-insensitive collation such as en-ci is enabled, primary key comparisons can behave inconsistently. Fivetran treats primary keys as case-sensitive, while Snowflake tables with en-ci collation treat values like ABC and abc as identical. This mismatch can cause multiple incoming records to match the same destination row, leading to duplicate records.
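You can observe this mismatch directly with Snowflake's COLLATE function; this is a minimal sketch, and the noted results assume en-ci semantics:
SELECT 'ABC' = 'abc'                   AS default_comparison,  -- FALSE: default collation is case-sensitive
       COLLATE('ABC', 'en-ci') = 'abc' AS en_ci_comparison;    -- TRUE: en-ci treats ABC and abc as equal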
Common causes
- Data masking policies on primary keys: If a Snowflake data masking policy is applied to a primary key, Fivetran cannot recognize the true key values. As a result, incoming records are inserted instead of updated, creating duplicates. For more information, see Snowflake documentation.
- Row-level access policies: Row-level access policies can prevent Fivetran from accessing all rows in a table during a merge, which can also lead to duplicate inserts. For more information, see Snowflake documentation. A query for checking both masking and row access policies on a table appears after this list.
- Custom collation on primary keys: Applying custom collation, especially case-insensitive collation such as en-ci, can cause unexpected join behavior. Fivetran treats primary keys as case-sensitive, while Snowflake tables with case-insensitive collation treat values like ABC and abc as the same. For more information, see Snowflake documentation.
- Manual changes to Fivetran-managed tables: Manually inserting records into tables managed by Fivetran can create duplicates that Fivetran cannot reconcile.
- Primary key changes: Changing primary keys on an existing table without resyncing can result in duplicate records.
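To check whether masking or row access policies are attached to a Fivetran-managed table, you can query Snowflake's POLICY_REFERENCES table function. This is a sketch that uses the same placeholder style as the rest of this article:
SELECT policy_kind, policy_name, ref_column_name
FROM TABLE(
  <database_name>.INFORMATION_SCHEMA.POLICY_REFERENCES(
    REF_ENTITY_NAME => '<schema_name>.<table_name>',
    REF_ENTITY_DOMAIN => 'TABLE'
  )
);
Rows with policy_kind of MASKING_POLICY or ROW_ACCESS_POLICY point to policies worth reviewing, especially when ref_column_name is a primary key column.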
How do I prevent future duplicates?
Start by identifying which of the causes above apply to your environment, then take the appropriate action.
Remove conflicting Snowflake configurations
- Data masking policies: Remove masking policies from primary key columns, or ensure they are disabled for the Fivetran user role.
- Row-level access policies: Remove row-level access policies from Fivetran-managed tables, or ensure the Fivetran user has full access to all rows.
- Custom collation: Use the default case-sensitive collation for Fivetran-managed tables. If you need case-insensitive behavior, apply collation in queries or views instead of on the base tables.
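For example, a view along these lines (all names are placeholders) can give downstream consumers case-insensitive comparisons while the Fivetran-managed base table keeps its default collation:
CREATE OR REPLACE VIEW <schema_name>.<view_name> AS
SELECT <primary_key_column>,
       <text_column> COLLATE 'en-ci' AS <text_column>  -- case-insensitive only in the view
FROM <schema_name>.<table_name>;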
To check the account-level collation:
SHOW PARAMETERS IN ACCOUNT LIKE 'DEFAULT_DDL_COLLATION';
If it is set to a case-insensitive value (such as en-ci), ask your Snowflake account administrator to unset it:
ALTER ACCOUNT UNSET DEFAULT_DDL_COLLATION;
Or set a case-sensitive collation at the database level:
ALTER DATABASE <database_name> SET DEFAULT_DDL_COLLATION = 'utf8';
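Collation can also be set at the table or column level, so it is worth checking the affected table's columns as well. This is a sketch against INFORMATION_SCHEMA, using the same placeholder style as above:
SELECT column_name, collation_name
FROM <database_name>.INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = '<SCHEMA_NAME>'   -- INFORMATION_SCHEMA stores unquoted identifiers in uppercase
  AND table_name = '<TABLE_NAME>'
  AND collation_name IS NOT NULL;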
Avoid unsupported changes
- Do not manually add or update records in Fivetran-managed tables.
- Do not modify primary keys on existing tables. If primary keys must change, drop and resync the table.
How do I clean up existing duplicate records?
After preventing future duplicates, follow these steps to remove existing duplicate rows.
Create a copy of the table with row numbers
CREATE TABLE <table_copy> AS
SELECT *,
       ROW_NUMBER() OVER (
         PARTITION BY <primary_key_columns>
         ORDER BY _fivetran_synced DESC  -- the most recently synced row gets _fivetran_row_num = 1
       ) AS _fivetran_row_num
FROM <table_name>;
Delete duplicate rows
Retain only the most recently synced record for each primary key:
DELETE FROM <table_copy> WHERE _fivetran_row_num <> 1;
Remove helper column
ALTER TABLE <table_copy> DROP COLUMN _fivetran_row_num;
Replace original table
DROP TABLE <table_name>;
ALTER TABLE <table_copy> RENAME TO <table_name>;
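As a concrete illustration, here is the full cleanup sequence for a hypothetical orders table whose primary key is id; substitute your own table and key columns:
-- 1. Copy the table, numbering rows per primary key, newest sync first
CREATE TABLE orders_copy AS
SELECT *,
       ROW_NUMBER() OVER (
         PARTITION BY id
         ORDER BY _fivetran_synced DESC
       ) AS _fivetran_row_num
FROM orders;

-- 2. Keep only the most recently synced row per key
DELETE FROM orders_copy WHERE _fivetran_row_num <> 1;

-- 3. Drop the helper column and swap the copy in for the original
ALTER TABLE orders_copy DROP COLUMN _fivetran_row_num;
DROP TABLE orders;
ALTER TABLE orders_copy RENAME TO orders;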
How do I verify the cleanup?
Run the following query. It should return no results if all duplicates were removed successfully:
SELECT <primary_key_columns>, COUNT(*)
FROM <table_name>
GROUP BY <primary_key_columns>
HAVING COUNT(*) > 1;
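Continuing the hypothetical orders example, the verification query would be:
SELECT id, COUNT(*)
FROM orders
GROUP BY id
HAVING COUNT(*) > 1;  -- any rows returned indicate remaining duplicates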
After verification, resume or trigger a Fivetran sync.
Best practices
To avoid duplicate records in the future:
- Do not manually modify Fivetran-managed tables.
- Avoid applying masking, row-level access, or custom collation to primary key columns.
- Use default case-sensitive collation for Fivetran-managed objects.
- Drop and re-sync tables if primary keys must change.
- Use views to apply custom logic instead of altering base tables.