History Mode BETAlink
Fivetran's history mode records every version of each record in your destination. With history mode, you can analyze data from a particular point in time or analyze how data has changed over time.
History mode is activated on a per-table basis. You can decide which tables have analytically valuable historical information and keep only those. You can switch a table back to live mode if you no longer want to collect historical data from that table.
Supported connectorslink
- Salesforce BETA
- Airtable PRIVATE PREVIEW
- HubSpot PRIVATE PREVIEW
- MongoDB PRIVATE PREVIEW
- MySQL PRIVATE PREVIEW
- PostgreSQL PRIVATE PREVIEW
- SQL Server PRIVATE PREVIEW
- Zendesk Sunshine PRIVATE PREVIEW
Sync overviewlink
It’s easiest to understand history mode in contrast with live mode. In live mode, when a record is updated in your source, Fivetran updates the corresponding record that exists in your data warehouse. Likewise, when a record is deleted in your source, we delete the corresponding record in the warehouse.
Once you activate history mode for a table, Fivetran keeps all versions of the record using the type 2 slowly changing dimension format. To keep all versions of the record, we add three columns to each record in that table. The next section examines these columns and their functions in detail.
Note: There is a third sync mode, legacy mode, which operates as a hybrid. Once you activate history mode for a table, you can’t return to legacy mode. You can only switch to live mode. In legacy mode, when a record is deleted in your source, we soft-delete it in the warehouse by setting the system column
fivetran_deleted
toTRUE
. Once you activate history mode for a table, we delete thefivetran_deleted
column and add a set of columns that collect more detailed information about the previous versions of your records. If you would rather not keep all of those deleted records, you can switch to live mode directly from legacy mode.
System columns for records in tables with history modelink
Column | Type | Description |
---|---|---|
_fivetran_active |
BOOLEAN | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE . |
_fivetran_start |
DATETIME | The time when the record was first created or modified in the source, based on a timestamp value in the source table that monotonically increases over time with data change or update. |
_fivetran_end |
DATETIME | The time until which the record was active minus epsilon, where epsilon is the smallest time difference that can be stored in the timestamp type value. The value for this column depends on the corresponding record's status - whether it is active or not - and the warehouse type. If the record is active (_fivetran_active = TRUE ), and the warehouse type is MySQL, then the value will be 2038-01-19T03:14:07.999Z . For all other warehouses, the value will be 9999-12-31T23:59:59.999Z . If the record is not active (_fivetran_active = FALSE ), then the _fivetran_end value will equal the _fivetran_start value of the next record with the same primary key minus one millisecond. If the record is deleted, then the value will be the delete timestamp - the time when the record was deleted - minus one millisecond. |
History mode example using mock datalink
The following example illustrates how Fivetran’s history mode handles data changes in the source and writes those changes in the warehouse.
Note: The bold text shows the change in the data after each sync cycle completes.
Suppose the following data is present in the source:
Id (PK) | counter | timestamp_col |
---|---|---|
a | 10 | 2019-06-04T10:58:35Z |
b | 20 | 2019-06-04T10:58:35Z |
After Fivetran syncs from the source to the warehouse, the warehouse will contain the following records:
Id | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2019-06-04T10:58:35Z | Tmax | TRUE | 2019-06-04T10:58:35Z |
b | 20 | 2019-06-04T10:58:35Z | Tmax | TRUE | 2019-06-04T10:58:36Z |
Now, let’s assume we receive the following records from the source in the next sync:
Id (PK) | counter | timestamp_col |
---|---|---|
a | 30 | 2019-06-05T10:58:35Z |
b | 20 | 2019-06-04T10:58:35Z |
c | 40 | 2019-06-05T10:58:35Z |
After Fivetran syncs from the source to the warehouse, the warehouse will contain the following records:
Id | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2019-06-04T10:58:35Z | 2019-06-05T10:58:34.999Z | FALSE | 2019-06-04T10:58:35Z |
a | 30 | 2019-06-05T10:58:35Z | Tmax | TRUE |
2019-06-05T20:58:35Z |
b | 20 | 2019-06-04T10:58:35Z | Tmax | TRUE |
2019-06-04T10:58:36Z |
c | 40 | 2019-06-05T10:58:35Z | Tmax | TRUE | 2019-06-05T20:58:36Z |
Now, let’s assume the record with id = b
is deleted at time 2019-06-06T10:58:35Z
and we receive that change in the next sync cycle. Then the warehouse will contain the following values:
Id | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2019-06-04T10:58:35Z | 2019-06-05T10:58:34.999Z | FALSE |
2019-06-04T10:58:35Z |
a | 30 | 2019-06-05T10:58:35Z | Tmax | TRUE |
2019-06-05T20:58:35Z |
b | 20 | 2019-06-04T10:58:35Z | 2019-06-06T10:58:35Z | FALSE | 2019-06-04T10:58:36Z |
c | 40 | 2019-06-05T10:58:35Z | Tmax | TRUE |
2019-06-05T20:58:36Z |
Switching modeslink
We detect changes to the sync mode instantly and start the migration. If you switch modes while your sync is running, we pause that sync. Then, we run the migration queries. After we have run the migration queries, we restart the sync.
Switching from live to history mode
Note: We check the
_fivetran_deleted
column to determine whether a record is soft-deleted or active. In the case of Salesforce data, we check if either of the_fivetran_deleted
or_is_deleted
column exists. If one of them exists, we use it to determine whether a record is soft-deleted. If both columns are present, we only consider_fivetran_deleted
. For all connectors, if any of delete-indicating columns doesn not exist, we assume that all the records present in the warehouse are active records. To update the previously synced data when switching from live mode to history mode, we run a series of migration queries to make the following changes:
-
Fivetran adds three history mode system columns:
_fivetran_start
,_fivetran_end
, and_fivetran_active
. -
If the record is active (the delete column's value is
false
ornull
or the delete column is not present at all), then we:- update
_fivetran_start
to the current warehouse timestamp value when the migration runs; - update
_fivetran_end
to the the warehouse max timestamp value -2038-01-19T03:14:07.999Z
for MySQL version 5.7 and above and2038-01-19T03:14:07Z
for MySQL version 5.6 and below,9999-12-31T23:59:59.999Z
for other warehouses; - update
_fivetran_active
toTRUE
.
- update
-
If the record is soft-deleted (the delete column's value is
TRUE
) then we:- update
_fivetran_start
and_fivetran_end
to the warehouse min value -1970-01-01T00:00:01Z
for MySQL,01-01-01T00:00:00Z
for other warehouses; - update
_fivetran_active
toFALSE
.
- update
-
We generate an optional cleanup task to delete the
_fivetran_deleted
column if it is present.
Note: These steps are not required for newly-created connectors.
Switching from history to live mode
To update the previously synced data when switching back from history to live mode, we perform the following steps ordered as listed:
-
We delete all the rows from the source where
_fivetran_active = FALSE
. -
We drop
_fivetran_start
,_fivetran_end
, and_fivetran_active
.
NOTE: When you switch a table from history mode to live mode, we delete all the historical versions of your records in that table. Make sure you definitely don’t care about the old versions of your records before you switch a table to live mode. In the case of Salesforce, we keep the last version of the record irrespectively of its state. In other words, we keep the last inserted record for Salesforce no matter if
_fivetran_active
isTRUE
orFALSE
.
Limitationslink
Unsupported tableslink
To keep the history in the data warehouse, Fivetran relies on a timestamp value that monotonically increases over time with the data change or update. So if a table lacks that timestamp column, we don’t support history mode for that table. Fivetran also doesn’t support history mode for re-imported tables. That means if we are syncing the entire table during each sync, then we don’t support history mode for that table.
History mode support for re-imported tables is coming soon.
Changes to data between syncslink
If a source lacks logs that capture data changes between the syncs, then we can only capture the data changes during the syncs. That means that if a field changes several times between the syncs we will not be able to capture those in-between states. For example, suppose the sync runs once a day at midnight. At midnight, the value of the field foo
is 10
, and the Fivetran sync captures it. At 6 AM, the value changes to 15
. Then, at 11 PM, the value changes to 20
. At midnight the next day, the Fivetran sync runs and notes the value of foo
was updated to 20
at 11 PM. The in-between state where foo
was updated to 15
at 6 AM is not captured.
Consider another example. Suppose a sync runs once a day at midnight. Then, at 8 AM a record is created. Then, at 10 AM the same record is deleted. When the next sync runs at midnight, the existence of this record will not be recorded.
Salesforce doesn't provide logs, so it is affected by this limitation. To mitigate the limitation, run the connector more frequently to capture data with a higher fidelity. Many other sources capture a record of changes between syncs and will not have this limitation.
Identical _fivetran_start
and _fivetran_end
valueslink
If, while fetching data, we see a new record upsert followed be a delete for that same record with the same timestamp, we will:
- assign the same timestamp value to
_fivetran_start
and_fivetran_end
, and - set
_fivetran_active
toFALSE
.
Record historylink
We can only guarantee the record history from the moment a table is switched to history mode. For example, if you switch a table to history mode at 2020-08-01T00:00:00Z
, then we will only be able to capture its history from that time on. That is why when we migrate a table to history mode, we initialize the _fivetran_start
value for active records as the warehouse CURRENT_TIMESTAMP
. For soft-deleted records, we initialize both _fivetran_start
and _fivetran_end
as the MIN_TIMESTAMP
available, because we have no way to determine when those records were active in the past.
Analysts should be aware of this limitation and not run historical queries for the time periods before the table was converted to history mode.