Duplicate Data in the Destination
Issue
One or more tables in the destination contain duplicate rows.
Environment
All destinations
Resolution
To resolve this issue, drop and re-sync the relevant table. This ensures that any records deleted in the source are also removed from the destination. To learn how to re-sync tables, see How to Trigger Historical Table Re-Syncs.
If you prefer not to re-sync the entire table, you can remove duplicates using the following commands:
Replace placeholder values, such as <table_name>, with the relevant values for your environment.
- Create a copy of the table and assign a row number for each primary key value:
  CREATE TABLE <table_copy> AS
  SELECT *,
         row_number() OVER (PARTITION BY <partition_columns> ORDER BY _fivetran_synced DESC) AS _fivetran_row_num
  FROM <table_name>;
- Delete duplicate rows from the table copy:
  DELETE FROM <table_copy> WHERE _fivetran_row_num <> 1;
- Remove the helper column:
  ALTER TABLE <table_copy> DROP COLUMN _fivetran_row_num;
- Drop the original table, which still contains duplicates:
  DROP TABLE <table_name>;
- Rename the cleaned table to the original table name:
  ALTER TABLE <table_copy> RENAME TO <table_name>;
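The steps above can be run end to end against a local database to verify the logic before touching production. The sketch below uses Python's sqlite3 with a hypothetical orders table keyed on order_id; the table and column names are illustrative, not part of the original article. (ALTER TABLE ... DROP COLUMN requires SQLite 3.35+, which ships with current Python releases.)

```python
import sqlite3

# Hypothetical example: dedupe a table named "orders" keyed on order_id,
# keeping the row with the latest _fivetran_synced value.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, amount REAL, _fivetran_synced TEXT);
INSERT INTO orders VALUES
  (1, 10.0, '2024-01-01'),
  (1, 12.0, '2024-02-01'),  -- duplicate of order 1; newer sync should win
  (2, 20.0, '2024-01-15');

-- Step 1: copy the table, numbering rows within each primary key.
CREATE TABLE orders_copy AS
SELECT *,
       row_number() OVER (PARTITION BY order_id
                          ORDER BY _fivetran_synced DESC) AS _fivetran_row_num
FROM orders;

-- Step 2: delete everything but the most recently synced row per key.
DELETE FROM orders_copy WHERE _fivetran_row_num <> 1;

-- Step 3: remove the helper column.
ALTER TABLE orders_copy DROP COLUMN _fivetran_row_num;

-- Step 4: drop the original table, which still contains duplicates.
DROP TABLE orders;

-- Step 5: rename the cleaned copy to the original name.
ALTER TABLE orders_copy RENAME TO orders;
""")
rows = con.execute("SELECT order_id, amount FROM orders ORDER BY order_id").fetchall()
print(rows)  # [(1, 12.0), (2, 20.0)]
```

Note that ordering by _fivetran_synced DESC keeps the most recently synced copy of each row; order ascending instead if you want to keep the earliest.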
Cause
This issue may occur when a row access policy or data masking policy is applied to a primary key column in your destination. If the policy hides or alters primary key values from the Fivetran user account, we can't match incoming source records to existing rows. As a result, we insert new rows instead of updating existing ones.
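The failure mode described above can be sketched in a few lines. This is not Fivetran's actual code, just a minimal model of an upsert: when a masking policy hides the primary key from the account doing the matching, the incoming record never matches an existing row, so it is inserted as a duplicate. All names (upsert, mask, the id and email fields) are hypothetical.

```python
def upsert(dest_rows, record, key="id"):
    """Update the row whose primary key matches the record, else insert it."""
    for row in dest_rows:
        if row[key] == record[key]:
            row.update(record)
            return "updated"
    dest_rows.append(dict(record))
    return "inserted"

def mask(row):
    # Simulate a data masking policy hiding the primary key value.
    return {**row, "id": "****"}

dest = [{"id": 101, "email": "old@example.com"}]

# Without masking, the incoming record updates the existing row in place.
plain_dest = [dict(r) for r in dest]
print(upsert(plain_dest, {"id": 101, "email": "new@example.com"}))  # updated
print(len(plain_dest))  # 1

# With the key masked, the match never succeeds, so a duplicate is inserted.
masked_dest = [mask(r) for r in dest]
print(upsert(masked_dest, {"id": 101, "email": "new@example.com"}))  # inserted
print(len(masked_dest))  # 2
```

This is why excluding the Fivetran user account from row access and masking policies on primary key columns prevents the duplicates from recurring.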