History Mode
History mode is a sync mode that we use to track history where the history of an object has analytical value. In history mode, Fivetran records every version of each record in the source table to the corresponding table in your destination. With history mode, you can analyze data from a particular point in time or analyze how data has changed over time.
Depending on the connector type, Fivetran uses different approaches to retain historical data:
- For connectors where Fivetran defines the schema, we track history for a predefined connector-specific set of tables. We select those tables based on the analytical value of their historical data. For these connectors, you cannot change which tables track history.
- For connectors where Fivetran just replicates the schema, we give you the option to select which tables track history. For these connectors, you can switch any supported table to history mode to track its history.
To see if your connector supports history mode, see the features table on your connector's documentation page.
For the tables with _history
suffix in the name, history mode is the only sync mode.
TIP: The
_history
suffix in a table’s name is a sign of a history mode table. However, note that not all history mode tables have this suffix. We recommend that you refer to the Features and Schema Information sections of the relevant connector's overview page to learn which tables synced by that connector are history mode tables.
NOTE: Enabling history mode may increase your monthly active rows (MAR) since every change is recorded as a new row that counts towards Monthly Active Rows (MAR).
Sync overview
Once you activate history mode for a table, Fivetran keeps all versions of the records in the table using the type 2 slowly changing dimension format. To keep all versions of the record, we add three columns to that table, with one of them replacing the original source timestamp column. The next section examines these columns and their functions in detail.
NOTE: For some connectors, we also use the relevant last modified timestamp column as a part of the composite key.
System columns added to tables with history mode
Column | Type | Description |
---|---|---|
_fivetran_active | BOOLEAN | TRUE if it is the currently active record. FALSE if it is a historical version of the record. Only one version of the record can be TRUE . |
_fivetran_start | DATETIME | The time when the record was first created or modified in the source, based on a timestamp value in the source table that monotonically increases over time with data change or update. |
_fivetran_end | DATETIME | The time until which the record was active minus epsilon, where epsilon is the smallest time difference that can be stored in the timestamp type value. The value for this column depends on the corresponding record's status - whether it is active or not - and the destination type. If the record is active (_fivetran_active = TRUE ), and the destination type is MySQL, then the value will be 2038-01-19T03:14:07.999Z . For all other destinations, the value will be 9999-12-31T23:59:59.999Z . If the record is not active (_fivetran_active = FALSE ), then the _fivetran_end value will equal the _fivetran_start value of the next record with the same primary key minus one millisecond. If the record is deleted, then the value will be the delete timestamp - the time when the record was deleted - minus one millisecond. |
NOTE: The
_fivetran_start
column replaces the original source timestamp column.
History mode example using mock data
The following example illustrates how Fivetran’s history mode handles data changes in the source and writes those changes in the destination.
NOTE: The bold text shows the change in the data after each sync cycle completes.
Suppose the following data is present in the source:
ID (PK) | counter | timestamp_col |
---|---|---|
a | 10 | 2020-01-01T00:00:00Z |
b | 20 | 2020-01-01T00:00:00Z |
After Fivetran syncs from the source to the destination, the destination will contain the following records:
ID | counter | _fivetran_start | _fivetran_end | _fivetran_active | _fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-01T00:00:00Z | Tmax | TRUE | 2020-01-01T01:00:00Z |
b | 20 | 2020-01-01T00:00:00Z | Tmax | TRUE | 2020-01-01T01:00:00Z |
Note we replaced the original source timestamp column timestamp_col
with _fivetran_start
. Now, let’s assume we receive the following records from the source in the next sync:
ID (PK) | counter | timestamp_col |
---|---|---|
a | 30 | 2020-01-02T00:00:00Z |
b | 20 | 2020-01-01T00:00:00Z |
c | 40 | 2020-01-02T00:00:00Z |
After Fivetran syncs from the source to the destination, the destination will contain the following records:
ID | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-01T00:00:00Z | 2020-01-01T23:59:59.999Z | FALSE | 2020-01-01T01:00:00Z |
a | 30 | 2020-01-02T00:00:00Z | Tmax | TRUE | 2020-01-02T01:00:00Z |
b | 20 | 2020-01-01T00:00:00Z | Tmax | TRUE | 2020-01-02T01:00:00Z |
c | 40 | 2020-01-02T00:00:00Z | Tmax | TRUE | 2020-01-02T01:00:00Z |
Now, let’s assume the record with id = b
is deleted at time 2020-01-03T00:00:00Z
and we receive that change in the next sync cycle. Then the destination will contain the following values:
ID | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-01T00:00:00Z | 2020-01-01T23:59:59.999Z | FALSE | 2020-01-01T01:00:00Z |
a | 30 | 2020-01-02T00:00:00Z | Tmax | TRUE | 2020-01-02T01:00:00Z |
b | 20 | 2020-01-01T00:00:00Z | 2020-01-03T00:00:00Z | FALSE | 2020-01-02T01:00:00Z |
c | 40 | 2020-01-02T00:00:00Z | Tmax | TRUE | 2020-01-02T01:00:00Z |
Switching modes
We detect changes to the sync mode instantly and start the migration. If you switch modes while your sync is running, we pause that sync. Then, we run the migration queries. After we have run the migration queries, we restart the sync.
Switching to history mode
NOTE: We check the
_fivetran_deleted
column to determine whether a record is soft deleted or active. For all tables, if any of delete-indicating columns does not exist, we assume that all the records present in the destination are active records.
To update the previously synced data when switching to history mode, we run a series of migration queries to make the following changes:
Fivetran adds three history mode system columns:
_fivetran_start
- which replaces the original source timestamp column_fivetran_end
_fivetran_active
If the record is active (the delete column's value is
false
ornull
or the delete column is not present at all), then we:- update
_fivetran_start
to the max_fivetran_synced
timestamp value present in the destination when the migration runs; - update
_fivetran_end
to the destination max timestamp value -2038-01-19T03:14:07.999Z
for MySQL version 5.7 and above and2038-01-19T03:14:07Z
for MySQL version 5.6 and below,9999-12-31T23:59:59.999Z
for other destinations; - update
_fivetran_active
toTRUE
.
- update
If the record is soft deleted (the delete column's value is
TRUE
) then we:- update
_fivetran_start
and_fivetran_end
to the destination min value -1970-01-01T00:00:01Z
for MySQL,01-01-01T00:00:00Z
for other destinations; - update
_fivetran_active
toFALSE
.
- update
After switching to the history mode, we delete the
_fivetran_deleted
column if it's present.NOTE: Change to the history mode occurs in the next sync cycle.
To switch back to soft delete mode, a re-sync of the table is required. Change the mode from the Schema tab of your connector, save the changes, and hover over the table you changed the mode for. Click the Resync button.
NOTE: These steps are not required for newly-created connectors.
Re-syncs in history mode
Re-syncs in history mode do not impact the previously tracked changes. When you trigger a re-sync, all previous records are marked as inactive, and the history starts again from the point of the re-sync:
At the start of a re-sync:
- Depending on the connector, Fivetran may mark every active row in the destination as
_fivetran_active
=FALSE
. Note that this is a connector-specific behavior.
- Depending on the connector, Fivetran may mark every active row in the destination as
During re-sync in history mode:
- Fivetran re-imports the current data and marks it as
_fivetran_active
=TRUE
- Fivetran re-imports the current data and marks it as
For example, for the following table:
ID | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-02T00:00:00Z | 2020-01-02T23:59:59.999Z | FALSE | 2020-01-02T01:00:00Z |
a | 30 | 2020-01-03T00:00:00Z | Tmax | TRUE | 2020-01-03T01:00:00Z |
After a re-sync at 2020-01-03T06:00:00Z, you get the following result if no changes were made to the source:
ID | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-02T00:00:00Z | 2020-01-02T23:59:59.999Z | FALSE | 2020-01-02T01:00:00Z |
a | 30 | 2020-01-03T00:00:00Z | Tmax | TRUE | 2020-01-03T06:00:00Z |
After a re-sync at 2020-01-03T06:00:00Z, you get the following result if changes were made to the source:
ID | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-02T00:00:00Z | 2020-01-02T23:59:59.999Z | FALSE | 2020-01-02T01:00:00Z |
a | 30 | 2020-01-03T00:00:00Z | 2020-01-03T01:59:59.999Z | FALSE | 2020-01-03T01:00:00Z |
a | 40 | 2020-01-03T02:00:00Z | Tmax | TRUE | 2020-01-03T06:00:00Z |
Exception
If the incoming _fivetran_start
value received from the source during the re-sync is smaller than the _fivetran_start
values of previous syncs, then we hard delete the previous records during the re-sync.
For example, if you re-sync the above table, and the incoming row with ID = a
has _fivetran_start = 2020-01-01T00:00:00Z
coming from the source, you would get the following result:
ID | counter | fivetran_start | fivetran_end | fivetran_active | fivetran_synced |
---|---|---|---|---|---|
a | 10 | 2020-01-01T00:00:00Z | Tmax | TRUE | 2020-01-05T01:00:00Z |
Limitations
Destinations
Apache Kafka does not support history mode.
The following destinations support history mode in Private Preview:
Unsupported tables
To keep the history in the destination, Fivetran relies on a timestamp value that monotonically increases over time with the data change or update. So if a table lacks that timestamp column, we don’t support history mode for that table. Fivetran also doesn’t support history mode for re-imported tables. That means if we are syncing the entire table during each sync, then we don’t support history mode for that table.
History mode works differently for database connectors. Since databases do not require a timestamp value, history mode uses the current import start time and the update time provided by that database's incremental sync method.
Changes to data between syncs
If a source (such as a database connector) has logs that capture change data changes between the syncs, then we capture all data changes that occurred between the syncs. For example, suppose the sync runs once a day at midnight. At midnight, the value of the field foo
is 10
, and the Fivetran sync captures it. At 6 a.m., the value changes to 15
. Then, at 11 p.m., the value changes to 20
. At midnight the next day, the Fivetran sync runs and notes the value of foo
was updated to 15
at 6AM and 20
at 11 p.m., creating two additional rows noting all of the value changes.
If a source lacks logs that capture data changes between the syncs, then we can only capture the data changes during the syncs. That means that if a field changes several times between the syncs we will not be able to capture those in-between states. For example, suppose the sync runs once a day at midnight. At midnight, the value of the field foo
is 10
, and the Fivetran sync captures it. At 6 a.m., the value changes to 15
. Then, at 11 p.m., the value changes to 20
. At midnight the next day, the Fivetran sync runs and notes the value of foo
was updated to 20
at 11 p.m. The in-between state where foo
was updated to 15
at 6 a.m. is not captured.
Consider another example. Suppose a sync runs once a day at midnight. Then, at 8 a.m. a record is created. Then, at 10 a.m. the same record is deleted. When the next sync runs at midnight, the existence of this record will not be recorded.
Salesforce doesn't provide logs, so it is affected by this limitation. To mitigate the limitation, run the connector more frequently to capture data with a higher fidelity. Many other sources capture a record of changes between syncs and will not have this limitation.
Identical _fivetran_start
and _fivetran_end
values
If, while fetching data, we see a new record upsert followed by a delete for that same record with the same timestamp, we will:
- assign the same timestamp value to
_fivetran_start
and_fivetran_end
, and - set
_fivetran_active
toFALSE
.
Record history
We can only guarantee the record history from the moment a table is switched to history mode. For example, if you switch a table to history mode at 2020-08-01T00:00:00Z
, then we will only be able to capture its history from that time on. That is why when we migrate a table to history mode, we initialize the _fivetran_start
value for active records as the destination CURRENT_TIMESTAMP
. For soft deleted records, we initialize both _fivetran_start
and _fivetran_end
as the MIN_TIMESTAMP
available, because we have no way to determine when those records were active in the past.
Analysts should be aware of this limitation and not run historical queries for the time periods before the table was converted to history mode.
PostgreSQL configuration requirements
You can only use history mode for your PostgreSQL connector if you use logical replication or Fivetran Teleport Sync as your incremental sync method.
SQL Server CT and CDC differences
For SQL Server, history mode functions slightly differently depending on which incremental sync method you choose: change tracking (CT) or change data capture (CDC).
Since CT doesn't capture every single change record in the source, you only see the most recent update for each sync. For example, if you have 10 changes over a period of 1 hour and you sync once an hour, you'll only see 10th change with history mode.
Since CDC captures a full change log of all records in the source, history mode with CDC functions in the same way as history mode for our other databases. For example, if you have 10 changes over a period of 1 hour and you sync once an hour, you'll see all 10 changes with history mode.
History mode with CDC provides greater data capture granularity than history mode with CT because it captures the data changes between syncs.
History mode and Fivetran Teleport Sync differences
History mode functions slightly differently depending if you choose Fivetran Teleport Sync as your incremental sync mechanism.
With most other database incremental sync methods, history mode captures all changes that occurred between syncs. The exception to this is SQL Server CT, as noted above.
With Fivetran Teleport Sync, history mode only captures changes that occurred during the most recent sync.