Fivetran captures deletes whenever we can detect them so that you can run analyses on data that may no longer exist in your source system. Some sources provide us with direct information about deletes. When sources don’t provide us with direct information about deletes, we try to infer them.
When you delete data in the source, Fivetran soft deletes it in the destination: we add an extra column named _fivetran_deleted to the table and set it to TRUE for the rows that were deleted in the source. The precise mechanism by which we capture deletes varies by connector type.
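As an illustration, here is a minimal Python sketch of the soft-delete pattern (the table, rows, and function names are hypothetical, not Fivetran internals): a row deleted in the source stays in the destination with _fivetran_deleted set to TRUE, and analytical queries simply filter on that flag.

```python
# Illustrative sketch of soft deletes; names and data are hypothetical.
# Each destination row carries a _fivetran_deleted flag instead of being removed.

def soft_delete(table: dict, key) -> None:
    """Mark a row as deleted rather than removing it."""
    if key in table:
        table[key]["_fivetran_deleted"] = True

def active_rows(table: dict) -> dict:
    """Rows that still exist in the source (the usual analytical view)."""
    return {k: r for k, r in table.items() if not r["_fivetran_deleted"]}

destination = {
    1: {"name": "alpha", "_fivetran_deleted": False},
    2: {"name": "beta", "_fivetran_deleted": False},
}
soft_delete(destination, 2)              # row 2 was deleted in the source
print(sorted(active_rows(destination)))  # -> [1]
```

Because the row is retained, you can still analyze data that no longer exists in the source by querying without the filter.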
We can detect and capture deletes for most databases because we perform log-based replication for most of them, and logs contain deletes. The major exception is Postgres XMIN-based replication, which is not log-based and therefore cannot capture deletes. Postgres WAL replication, however, does capture deletes.
Some application APIs provide dedicated endpoints that return deletes, and we capture deletes for those applications. However, most application APIs don’t return deletes as changes, so we cannot detect and therefore don’t capture those deletes. That means there may be data in your destination from those applications which has been deleted in the source.
Inferred deletes through re-syncs
If Fivetran re-syncs an entire table, we assume that any record that wasn't updated must have been deleted, and we mark it _fivetran_deleted = TRUE in the destination. For connectors where re-syncing doesn't take long and we have identified that deletes would provide analytical value, we periodically re-sync the entire connector and use this data to infer deletes.
We re-sync in the evenings and on weekends to minimize performance impact. The drawback of this method is that all deletes from a particular period are rolled together: we don't know exactly when data was deleted, only that it was deleted in the interval between updates. However, if a source doesn't provide us with direct information about deletes, inferred deletes are the next best thing.
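The inference above can be sketched in a few lines of Python (a simplified illustration, not Fivetran's implementation; keys stand in for primary keys): any destination row whose key is absent from the freshly re-synced data is assumed deleted.

```python
# Illustrative sketch of inferring deletes from a full re-sync.
# Assumption: dictionary keys stand in for the table's primary keys.

def infer_deletes(destination: dict, resynced_keys: set) -> None:
    """Mark any destination row absent from the re-synced source as deleted."""
    for key, row in destination.items():
        if key not in resynced_keys:
            row["_fivetran_deleted"] = True

destination = {
    1: {"_fivetran_deleted": False},
    2: {"_fivetran_deleted": False},
    3: {"_fivetran_deleted": False},
}
infer_deletes(destination, resynced_keys={1, 3})  # row 2 no longer in source
print(destination[2]["_fivetran_deleted"])        # -> True
```

Note the limitation described above: this tells you that row 2 was deleted sometime since the previous re-sync, but not exactly when.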
As part of complete schema replication, Fivetran replicates custom data whenever it exists and is accessible. Not all sources that have custom data expose it in a way we can access.
Custom data includes custom objects, tables, and fields that you have configured in the source system to better suit your business needs. Custom objects are specific to your source, for example, custom Salesforce objects that match your business process. Custom tables are database tables that allow you to store information unique to your organization. Standard tables can also have custom fields, which are specific to an organization.
There is no special action you need to take to make sure we replicate your custom data; it happens automatically for the sources that support it.
Data blocking lets you prevent certain tables or columns from replicating to your destination. Table blocking and column blocking are only available for some of our connectors; if either is available for your connector, you will see the option in your Schema tab. For connectors that support the feature, you can choose which tables or fields within a table to sync and which to block from syncing.
Data blocking avoids exposing sensitive information, like Personally Identifiable Information (PII), in your destination. It also saves you time and storage space, because Fivetran only syncs relevant data to your destination.
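Conceptually, blocking a column means it is dropped before the data ever reaches your destination. The sketch below illustrates that idea in Python (the table name, block list, and function are hypothetical; in practice you configure blocking in the Schema tab, not in code):

```python
# Illustrative sketch of column blocking; in practice this is configured in
# the Fivetran Schema tab, not in code. The block list below is hypothetical.

BLOCKED_COLUMNS = {"users": {"ssn", "email"}}

def strip_blocked(table_name: str, row: dict) -> dict:
    """Drop blocked columns so sensitive fields never reach the destination."""
    blocked = BLOCKED_COLUMNS.get(table_name, set())
    return {col: val for col, val in row.items() if col not in blocked}

row = {"id": 7, "name": "Ada", "ssn": "000-00-0000", "email": "ada@example.com"}
print(strip_blocked("users", row))  # -> {'id': 7, 'name': 'Ada'}
```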
Read our detailed documentation about column blocking.
Column hashing anonymizes sensitive data in your destination without sacrificing its analytical value. Column hashing is only available for some of our connectors. If this feature is available for your connector, you will see the option in your Schema tab. For connectors that support the feature, you can select which column(s) within a table to hash.
Column hashing avoids exposing sensitive information, like PII, in your destination. Because hashing anonymizes your PII, this feature helps you comply with the General Data Protection Regulation (GDPR).
Read our detailed documentation about column hashing.
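To see why hashing preserves analytical value, consider this sketch (illustrative only; Fivetran's actual hashing scheme, including any salting, is internal to the service): the same input always maps to the same hash, so joins and group-bys on the hashed column still work even though the raw PII is gone.

```python
import hashlib

# Illustrative sketch of column hashing; Fivetran's actual scheme, including
# any salt, is internal to the service. The salt below is a placeholder.

def hash_column(value: str, salt: str = "per-connector-salt") -> str:
    """Replace a sensitive value with a stable one-way hash."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# Deterministic: the same input always produces the same hash, so the
# hashed column can still be joined and grouped on.
a = hash_column("jane@example.com")
b = hash_column("jane@example.com")
print(a == b)   # -> True
print(len(a))   # -> 64
```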
Sometimes the data in your destination and your source get out of sync and you need to overwrite all or some of the existing data in your destination to make it consistent with the source. Depending on the connector, you can re-sync all or part of your data from your source to your destination.
A full re-sync completely overwrites the data in your destination with new data from your source. While it's a powerful option for fixing a destination that is out of sync with its source, a full re-sync also pauses incremental updates. A full re-sync can take hours to days for some connectors, either because of the amount of data or because of limits imposed by the source. You can initiate a full re-sync from your connector details page.
Fivetran can re-sync data from any source, but we have disabled the option to run a full re-sync for Marketo because re-syncing it takes an extremely long time. If you want to run a full re-sync for Marketo, contact our support team.
Table re-sync lets you overwrite the data in a specific table, so you can fix data integrity issues in selected tables without re-syncing the entire connector. Just like a full re-sync, re-syncing a table pauses incremental updates until the sync completes. However, table re-sync is much faster since single tables tend to have far less data.
In cases where Fivetran has identified that the history of an object has analytical value, we retain that history in the destination using the type 2 slowly changing dimension format. You can use that historical data to perform analyses on how your data changed over time.
By default, when Fivetran detects a change in the data, we update or insert (upsert) into the destination, overwriting old data. For objects that are tracking history, we upsert using a composite primary key composed of the natural primary key(s) and a modified timestamp. As a result, each new version of a record is added as a new row without creating duplicates of the same version.
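The composite-key upsert described above can be sketched as follows (a simplified illustration keyed on a natural key plus a modified timestamp; names are hypothetical, not Fivetran internals):

```python
# Illustrative sketch of a type 2 slowly changing dimension upsert keyed on
# (natural key, modified timestamp). Names are hypothetical.

def history_upsert(table: dict, natural_key, modified_at, data: dict) -> None:
    """Insert each version as its own row; re-delivering the same version
    updates it in place instead of creating a duplicate."""
    table[(natural_key, modified_at)] = data

history = {}
history_upsert(history, 42, "2024-01-01T00:00:00Z", {"status": "open"})
history_upsert(history, 42, "2024-02-01T00:00:00Z", {"status": "closed"})
# The same version delivered twice still yields a single row:
history_upsert(history, 42, "2024-02-01T00:00:00Z", {"status": "closed"})
print(len(history))  # -> 2
```

Because each version is a distinct row, you can reconstruct how record 42 changed over time by ordering its rows by the timestamp component of the key.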
API configurable (Beta)
The Fivetran REST API provides several endpoints that let you perform key management actions which were previously available only through the dashboard.
- The User Management API lets you list, invite, edit, and delete your users.
- The Group Management API lets you understand the groups and connectors within those groups.
- The Connector Management API lets you create, edit, and manage a subset of your connectors.
Read our detailed documentation about the Fivetran REST API.
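For example, a call to the User Management API can be built with Python's standard library alone. The base URL and basic-auth scheme below follow Fivetran's public API documentation; the key and secret are placeholders, and no request is actually sent in this sketch.

```python
import base64
import urllib.request

# Sketch of building a Fivetran REST API request with the standard library.
# The base URL follows Fivetran's public docs; credentials are placeholders.
API_BASE = "https://api.fivetran.com/v1"

def build_request(path: str, api_key: str, api_secret: str) -> urllib.request.Request:
    """Build an authenticated GET request (HTTP basic auth) without sending it."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        f"{API_BASE}/{path}",
        headers={"Authorization": f"Basic {token}"},
    )

# List users via the User Management API:
req = build_request("users", "MY_KEY", "MY_SECRET")
print(req.full_url)  # -> https://api.fivetran.com/v1/users
```

Passing the request to urllib.request.urlopen with real credentials would return the JSON response; the same pattern applies to the group and connector endpoints.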
Priority-first syncs fetch your most recent data first so that it's quickly ready for you to use.
During your initial sync, Fivetran updates your destination with your most recent data. How many days of recent data we fetch and which tables we sync on a priority-first basis vary by connector.
In each subsequent sync, we continue to update your most recent data first as part of a forward sync. We then fetch your historical data in small increments as part of a backward sync. The backward sync duration varies by connector.
The forward and backward syncs are two separate steps. After we complete the forward sync, we reschedule the connector and post a notification on the Fivetran dashboard. This lets us push the data fetched during the forward sync into the destination before initiating the backward sync, which begins immediately afterward.
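The windowing behind priority-first syncs can be sketched as follows (an illustrative model, not Fivetran's scheduler; the day counts are arbitrary): fetch the most recent priority window first, then walk backward through history in fixed increments.

```python
from datetime import date, timedelta

# Illustrative model of priority-first windowing, not Fivetran's scheduler.
# Fetch the most recent N days first, then walk backward in fixed steps.

def sync_windows(today: date, priority_days: int, history_days: int,
                 backward_step: int):
    """Yield (start, end) fetch windows, newest-first."""
    # Forward sync: the priority window containing the most recent data.
    yield (today - timedelta(days=priority_days), today)
    # Backward sync: older history, fetched in small increments.
    end = today - timedelta(days=priority_days)
    oldest = today - timedelta(days=history_days)
    while end > oldest:
        start = max(end - timedelta(days=backward_step), oldest)
        yield (start, end)
        end = start

windows = list(sync_windows(date(2024, 6, 30), priority_days=15,
                            history_days=45, backward_step=15))
print(len(windows))  # -> 3
```

The first window yielded is the forward sync; the remaining windows cover progressively older history, which is why recent data is queryable well before the historical backfill finishes.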
Priority-first sync is supported for the following connectors:

| Connector Name | Priority Fetch Period | Tables that sync on a priority-first basis | Backward Sync Duration |
| --- | --- | --- | --- |
| Amplitude | 15 days | All tables | 6 hours |
| Google Analytics 360 | 15 days | All tables | 6 hours |
| Help Scout | 15 days | | |
| Marketo version 2 and earlier | 7 days | All activity tables | 6 hours |
| Marketo version 3 and later | 7 days | All activity tables | 6 hours |
| Salesforce Marketing Cloud | 14 days | | |
Syncing empty tables and columns
Fivetran can create empty tables and columns in your destination for the following connectors:
- Facebook Ad Insights
- Google Ads
- Microsoft Dynamics 365
- Netsuite SuiteAnalytics
- SQL Server
Syncing empty tables and columns ensures that any templated SQL queries in your destination that reference these objects continue to work.
Note: Under no circumstances will we sync an empty schema.