How to Fix Duplicate Data in Your Destination
Question
Why is there duplicate data in my destination after I sync my Amazon S3 connector?
Environment
Connector: Amazon S3 with the Modified File Merge value set to append_file.
Answer
If append_file is selected, updating a file that has already been synced results in duplicate data in the destination.
To fix this issue, do the following:
In your Fivetran dashboard, go to your S3 connector page.
Go to the Setup tab.
Click Edit Connection Details.
Set the Modified File Merge value to upsert_file.
Cause
The upsert_file option replaces the records in your destination, using the filename and line number as the primary key. The append_file option appends records if the file has been modified since your last sync.
If append_file is selected and you upload the same file with a few modifications, Fivetran appends the records from the new file to those from the old file, resulting in duplicate records.
If you need to modify a file and re-upload it to your S3 bucket, your connector should use the upsert_file option to ensure that duplicate records are not created in the destination.
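The difference between the two options can be illustrated with a small in-memory simulation. This is only a simplified sketch, not Fivetran's implementation: the sync_file function, the row layout, and the users.csv sample data are hypothetical, and the sketch assumes the modified file has the same number of lines as the original. It models the destination as a list of rows and uses the filename and line number as the key for upsert_file, as described above.

```python
def sync_file(destination, filename, lines, mode):
    """Return a new destination table after syncing one file.

    destination is a list of dicts with the keys "file", "line", and "value".
    This is an illustrative model only, not Fivetran's actual sync logic.
    """
    new_rows = [
        {"file": filename, "line": number, "value": value}
        for number, value in enumerate(lines, start=1)
    ]
    if mode == "append_file":
        # Every row of the modified file is added again.
        return destination + new_rows
    if mode == "upsert_file":
        # Rows with the same (filename, line number) key are replaced.
        keyed = {(row["file"], row["line"]): row for row in destination}
        for row in new_rows:
            keyed[(row["file"], row["line"])] = row
        return list(keyed.values())
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical file contents: the second upload changes one line.
original = ["id=1,name=Ann", "id=2,name=Bob"]
modified = ["id=1,name=Ann", "id=2,name=Robert"]

appended = sync_file([], "users.csv", original, "append_file")
appended = sync_file(appended, "users.csv", modified, "append_file")

upserted = sync_file([], "users.csv", original, "upsert_file")
upserted = sync_file(upserted, "users.csv", modified, "upsert_file")

print(len(appended))  # 4 rows: both versions of the file are present
print(len(upserted))  # 2 rows: the modified lines replaced the originals
```

In this model, the append_file run ends with both the old and the new version of every line, while the upsert_file run ends with only the latest version of each line.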