PostgreSQL
PostgreSQL is an open-source database. Fivetran's integration platform replicates data from your PostgreSQL source database and loads it into your destination.
NOTE: Fivetran supports PostgreSQL as both a database connector and a destination.
Fivetran offers the following solutions for replicating data from your PostgreSQL source database:
- Fivetran native connector
- HVR - PostgreSQL
This document is about the Fivetran's native connector for PostgreSQL.
Supported services
Fivetran supports the following PostgreSQL database services:
- Generic PostgreSQL
- Amazon Aurora PostgreSQL
- Amazon Aurora Serverless V2
- Amazon RDS PostgreSQL
- Azure Database for PostgreSQL
- Google Cloud PostgreSQL
- Heroku PostgreSQL
Supported configurations
Fivetran supports the following PostgreSQL configurations:
Supportability Category | Supported Values |
---|---|
Database versions | 9.4 - 16 |
Maximum row size | 400MB per row |
Connector limit per database | 3 |
Transport Layer Security (TLS) | TLS 1.1 - 1.3 |
Server encoding | UTF-8 |
Which PostgreSQL instance types we can connect to depends on whether you use logical replication, XMIN, or Fivetran Teleport Sync as your incremental sync method. Read our Updating data documentation for more information.
Instance Types | Logical Replication | XMIN | Fivetran Teleport Sync |
---|---|---|---|
Generic PostgreSQL | |||
Primary instance | check (version 10 or later) | check (version 10 or later) | check |
Standby instance | check (version 16 or later) | check (version 10 or later) | check |
Amazon Aurora PostgreSQL | |||
Primary instance | check | check | check |
Standby instance | check | check | |
Amazon Aurora Serverless V2 | |||
Primary instance | check | check | check |
Standby instance | check | check | |
Amazon RDS PostgreSQL | |||
Primary instance | check | check | check |
Standby instance | check (version 16 or later) | check | check |
Azure Database for PostgreSQL | |||
Primary instance | check | check | check |
Standby instance | check (version 16 or later) | check | check |
Google Cloud PostgreSQL | |||
Primary instance | check | check | check |
Standby instance | check (version 16 or later) | check | check |
Heroku PostgreSQL | |||
Primary instance | check (without extensions) | check | |
Standby instance | check (without extensions) | check |
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | check | |
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | |
Authorization via API | check |
*XMIN does not capture data that exists for an amount of time smaller than the sync interval.
Setup guide
In your primary database, you need to do the following:
- Allow access to your PostgreSQL database using Fivetran's IP
- Create a Fivetran-specific PostgreSQL user with read-level permissions
- (Optional) Allow access to a read-replica of your PostgreSQL database. Using a read-replica can help to avoid unnecessary strain on your primary database.
- (Logical replication Only) Allow access to a logical replication slot
For specific instructions on how to set up your database, see the guide for your PostgreSQL database type:
- Generic PostgreSQL
- Amazon Aurora PostgreSQL
- Amazon RDS PostgreSQL
- Azure Database for PostgreSQL
- Google Cloud PostgreSQL
- Heroku PostgreSQL
Sync overview
Once Fivetran is connected to your PostgreSQL database or read replica, we pull a full dump of all selected data from your database. Using either the Fivetran Teleport Sync, logical replication, or XMIN change data capture process, we pull all your new and changed data at regular intervals.
If data in the source changes (for example, you add new tables or change a data type), Fivetran automatically detects and persists these changes into your destination. For every schema in your PostgreSQL source database, we create a schema in your destination that maps directly to its native schema. This ensures that the data in your destination is in a familiar format to work with.
Syncing empty tables and columns
Fivetran will sync empty tables and columns for your PostgreSQL connector. We only sync empty columns to your destination once you make another data alteration to that table. Simply adding a new column will not be reflected in your destination. For more information, see our Features documentation.
Schema information
Fivetran tries to replicate the exact schema and tables from your PostgreSQL source database to your destination.
We name the source schema with the dataset name you provided in the connector setup form. We replicate the source's tables to the destination schema. We name the destination schema with the connector name appended by the dataset name. For example, if the connector name is postgres_db
and the dataset name is fivetran_schema
, the destination schema name is postgres_db_fivetran_schema
. For more information, see our naming conventions documentation.
NOTE: We do not sync foreign tables, views, or materialized views.
Fivetran-generated columns
Fivetran adds the following columns to every table that is added to your destination:
_fivetran_synced
(UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row._fivetran_id
(STRING) is the hash of the non-Fivetran values of each row. It's a unique ID that Fivetran uses to identify rows in tables that do not have a primary key._fivetran_deleted
(BOOLEAN) marks rows that were deleted in the source database.NOTE: If you use XMIN as your incremental sync method, your
_fivetran_deleted
column will have null values because XMIN does not track deleted rows.
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
Type transformation and mapping
As we extract your data, we match PostgreSQL data types to types that Fivetran supports. If we don't support a certain data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns of data types that we don't accept or transform.
For primary keys, all supported data types are valid.
The following table illustrates how we transform your PostgreSQL data types into Fivetran supported types:
PostgreSQL Type | Fivetran Type | Fivetran Supported | Notes |
---|---|---|---|
ARRAY | JSON | True | |
BIGINT/ BIGSERIAL | LONG | True | |
BIT | BOOLEAN | True | |
BOOLEAN | BOOLEAN | True | |
BYTEA | BINARY | True | We only support hexadecimal format. We do not support escape format. |
CHARACTER VARYING | STRING | True | |
CHARACTER | STRING | True | |
CIDR | STRING | True | |
CITEXT | STRING | True | |
DATE | LOCALDATE | True | |
DOUBLE PRECISION | DOUBLE | True | |
GEOGRAPHY | (JSON, DOUBLE, DOUBLE) | True | For details, see the PostGIS geography data types section. For connectors created after 12/14/2020, the JSON data is GeoJSON compliant. |
GEOMETRY | (JSON, DOUBLE, DOUBLE) | True | For details, see the PostGIS geometry data types section. For connectors created after 12/14/2020, the JSON data is GeoJSON compliant. |
HSTORE | JSON | True | |
INTEGER/ SERIAL | INTEGER | True | |
INTERVAL | DOUBLE | True | |
JSON | JSON | True | We sync JSON values as is, preserving all escaping characters. Malformed JSON values are synced to the destination as wrapped JSON in the format {"fivetran_error":"This JSON value was malformed in the source","rawValue":"[ORIGINAL JSON VALUE]"} |
JSONB | JSON | True | We sync JSONB values as is, preserving all escaping characters. Malformed JSONB values are synced to the destination as wrapped JSON in the format {"fivetran_error":"This JSON value was malformed in the source","rawValue":"[ORIGINAL JSON VALUE]"} |
MACADDR | STRING | True | |
MONEY | BIGDECIMAL | True | |
NUMERIC/ DECIMAL | BIGDECIMAL | True | NUMERIC without specified precision or scale is synced as FLOAT. |
POINT | (JSON, DOUBLE, DOUBLE) | True | Treated like PostGIS geometry POINT. For details, see the PostGIS geometry data types section |
REAL | FLOAT | True | |
SMALLINT/ SMALLSERIAL | SHORT | True | |
TEXT | STRING | True | |
TIME WITH TIME ZONE | STRING | True | |
TIME WITHOUT TIME ZONE | STRING | True | |
TIMESTAMP WITH TIME ZONE | INSTANT | True | We support timestamps between 0000-01-01T00:00:00Z and 9999-12-31T23:59:59.999999Z . Values outside of this range are not synced to the destination. |
TIMESTAMP WITHOUT TIME ZONE | LOCALDATETIME | True | We support timestamps between 0000-01-01T00:00:00 and 9999-12-31T23:59:59.999999 . Values outside of this range are not synced to the destination. |
TSRANGE | (JSON, LOCALDATETIME, LOCALDATETIME) | True | For backward compatibility, it has two additional columns with _begin and _end suffixes. We sync bounds information to the destination, where EXCLUSIVE and INCLUSIVE indicate the bounds. |
TSTZRANGE | (JSON, INSTANT, INSTANT) | True | For backward compatibility, it has two additional columns with _begin and _end suffixes. We sync bounds information to the destination, where EXCLUSIVE and INCLUSIVE indicate the bounds. |
UUID | STRING | True | |
BIT VARYING | STRING | True | |
BOX | STRING | True | |
CIRCLE | STRING | True | |
COMPOSITE | STRING | True | |
DATERANGE | JSON | True | Unparsable values (such as 10000-01-01 ) are synced to the destination as null values. We sync bounds information to the destination, where EXCLUSIVE and INCLUSIVE indicate the bounds. |
INET | STRING | True | |
INT4RANGE | JSON | True | We sync bounds information to the destination, where EXCLUSIVE , INCLUSIVE , and OPEN indicate the bounds. |
INT8RANGE | JSON | True | We sync bounds information to the destination, where EXCLUSIVE , INCLUSIVE , and OPEN indicate the bounds. |
LINE SEGMENT | STRING | True | |
LINE | STRING | True | |
LTREE | STRING | True | |
NUMRANGE | JSON | True | We sync bounds information to the destination, where EXCLUSIVE , INCLUSIVE , and OPEN indicate the bounds. |
OID | LONG | True | |
PATH | STRING | True | |
PG_LSN | STRING | True | |
POLYGON | STRING | True | |
TSQUERY | STRING | True | |
TSVECTOR | STRING | True | |
XML | XML | True |
We sync unknown data types to the destination as strings. If we are missing an important type that you need, reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.
Unparseable values
When we encounter an unparsable value of one of the following data types, we substitute it with a default value. Which default value we use depends on whether the unparsable value is in a primary key column or non-primary key column:
PostgreSQL Type | Primary Key Value | Non-Primary Key Value |
---|---|---|
TIMESTAMP WITH TIME ZONE | 1970-01-01T00:00:00Z | null |
TIMESTAMP WITHOUT TIME ZONE | 1970-01-01T00:00 | null |
TSRANGE * | 1970-01-01T00:00:00Z | null |
TSTZRANGE * | 1970-01-01T00:00 | null |
*If we are unable to parse either the start or end value in your range, we substitute that value with the default value. If we are unable to parse both values, we replace both values with default values.
PostGIS geography data types
The following table lists Fivetran-supported PostGIS geography data types. They are transformed according to the GeoJson specification and stored in the destination as JSON types. The GeoJson specification does not support SRID, so Fivetran ignores this data type.
PostGIS Type | Fivetran Type | Notes |
---|---|---|
POINT | (JSON, DOUBLE, DOUBLE) | For backward compatibility, it has two additional columns with _long and _lat suffixes |
LINESTRING | JSON | |
POLYGON | JSON | |
MULTIPOINT | JSON | |
MULTILINESTRING | JSON | |
MULTIPOLYGON | JSON | |
GEOMETRYCOLLECTION | JSON |
PostGIS geometry data types
The following table lists Fivetran supported PostGIS geometry data types. They are transformed according to the GeoJson specification and stored in the destinations as JSON types.
PostGIS Type | Fivetran Type | Notes |
---|---|---|
POINT | (JSON, DOUBLE, DOUBLE) | For backward compatibility, it has two additional columns with _x and _y suffixes |
LINESTRING | JSON | |
POLYGON | JSON | |
MULTIPOINT | JSON | |
MULTILINESTRING | JSON | |
MULTIPOLYGON | JSON | |
GEOMETRYCOLLECTION | JSON | |
CIRCULARSTRING | JSON | |
COMPOUNDCURVE | JSON | |
POLYHEDRALSURFACE | JSON | |
CURVEPOLYGON | JSON | |
TIN | JSON | |
TRIANGLE | JSON |
Enumerated types
Fivetran supports enumerated types (ENUM), which are user-defined data types. Learn more in PostgreSQL's Enumerated Types documentation.
We transform and store enumerated types as STRING in the destination.
Domain data types
Fivetran supports domains, which are user-defined data types. When you create a domain, you choose an underlying data type and have the option of specifying additional constraints. Learn more in PostgreSQL's domains documentation.
We transform and store domain data based on their underlying data types.
Excluding source data
If you don’t want to sync all the data from your source database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can restrict the Fivetran user's access to certain tables or columns in your source database.
NOTE: You cannot exclude columns which are primary keys or foreign keys.
How to allow only a subset of tables:
To grant access only to some tables in a schema, first make sure that the Fivetran user has access to the schema itself:
GRANT USAGE ON SCHEMA "some_schema" TO fivetran;
Next, remove any previously granted permissions to all tables in that schema:
ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" REVOKE SELECT ON TABLES FROM fivetran;
REVOKE SELECT ON ALL TABLES IN SCHEMA "some_schema" FROM fivetran;
Repeat this command for each table you wish to include:
GRANT SELECT ON "some_schema"."some_table" TO fivetran;
GRANT SELECT ON "other_schema".* TO fivetran; /* all tables in schema */
Any tables created in the future will be excluded from the Fivetran user by default. To include them, run:
ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" GRANT SELECT ON TABLES TO fivetran;
There is no way to grant access to all tables at once, so you need to individually select all the tables you do want. It is not possible to achieve exclusion by granting Fivetran access to all tables and then revoking access for a subset of tables.
How to allow only a subset of columns:
To grant access only to some columns in a table, first remove any previously granted permission to all columns in the table:
REVOKE SELECT ON "some_schema"."some_table" FROM fivetran;
Then grant permission to only specific columns (for example, some_column
and other_column
:)
GRANT SELECT (xmin, "some_column", "other_column") ON "some_schema"."some_table" TO fivetran;
Any new columns added to that table in the future will be excluded from access by default. To include them, re-run the command above with the new column included.
To grant access to all columns except one, you must individually grant access to all other columns. If you are using XMIN as your change data capture method, you must also grant permissions to the hidden system column xmin
. We need access to the xmin
column to perform our incremental updates.
You can automate this process by scripting in your favorite language. Here is an example of a fixed set of SQL commands being executed via a BASH script:
#!/bin/sh
#Fill in the values without quotes or a space after the equals
host= # ex: 10.10.135.135
port= # ex: 30054
user= # ex: user
password= # ex: asdf235235asfsdf212
#List all of your SQL commands wrapped in single quotes, for example
sql=(
'REVOKE ALL PRIVILEGES, GRANT OPTION FROM fivetran;'
'GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO fivetran;'
'GRANT SELECT ON schema1.* TO fivetran;'
'GRANT SELECT ON schema1.* TO fivetran;'
)
#Connecting to Redshift, hence the PGPASSWORD
do
PGPASSWORD=$password psql --host=$host --port=$port --user=$user $db -c "${sql[$i]}"
done
Initial sync
When Fivetran connects to a new database, we first copy all rows from every table in every schema for which we have SELECT permissions (except for those that you excluded on your Fivetran dashboard) and add Fivetran-generated columns. We copy the tables in ascending size order (from smallest to largest). We copy rows by performing a SELECT statement on each table. For large tables, we copy a limited number of rows at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
Updating data
Fivetran performs incremental updates of any new or modified data from your source database. We use one of the following methods to perform incremental updates:
Logical replication vs XMIN vs Fivetran Teleport Sync
There are several key differences between logical replication, XMIN, and Fivetran Teleport Sync:
Feature | Logical replication | XMIN | Fivetran Teleport Sync | Notes |
---|---|---|---|---|
Syncs tables with primary keys | Yes | Yes | Partially1 | 1Learn which primary keys Fivetran Teleport Sync supports in the Teleport Limitations documentation. |
Syncs tables without primary keys | Yes | Yes | No | |
Can read from a replica | Yes2 | Yes | Yes | 2 Logical replication supports reading from a replica for PostgreSQL version 16 or later. |
Tracks deletes | Yes | No | Yes | |
Tracks inserts and updates | Yes | Yes | Yes | |
Tracks DDL changes | Partially | Partially | Partially | Learn which DDL changes each incremental sync method supports in the Support for DDL changes table. |
Provides full record of changed data | Yes | Yes3 | Yes | 3 The following actions can cause the XMIN values of a row to change and entire tables to be marked as new, resulting in increased MAR: - Using tools like pg_repack , pg_dump , and pg_restore - Truncating and repopulating tables |
Replicates partitioned tables4 | Yes5 | Yes | Yes | 4 We do not support the TimescaleDB extension. 5 Logical replication supports replicating partitioned tables for PostgreSQL version 13 or later. |
Replicates generated columns | No | Yes | Yes |
IMPORTANT: We recommend using logical replication as the incremental sync method if you have a high-volume transactional database. Logical replication minimizes processing overhead on your PostgreSQL server and supports key features like replicating row deletions for tables with primary keys.
Logical Replication
Logical replication is based on logical decoding of the PostgreSQL write-ahead log (WAL). We recommend this method for incremental updates because it
- Minimizes processing overhead on your PostgreSQL server
- Replicates row deletions for tables with primary keys
Fivetran supports the pgoutput
plugin for logical replication. Your plugin must have its own replication slot. Ensure that you are connected to the correct database when you create your replication slot, or your connector will not be able to find the slot.
However, there are reasons why you might not want to or be able to use logical replication:
Logical replication does not replicate all DDL changes. Learn which DDL changes are supported in the Support for DDL changes table.
Logical replication does not replicate data from generated columns.
For PostgreSQL versions 13 and later, logical replication can replicate partitioned tables. Learn more in PostgreSQL's logical replication restrictions documentation.
Creating a column with the
ALTER TABLE ADD COLUMN SET DEFAULT
command does not trigger an update to the existing table that the logical replication mechanism can recognize. Consequently, those changes are not part of our incremental updates. You need to do a full table re-sync to update the existing table's data in the destination. Learn more in PostgreSQL's Modifying Tables documentation.Heroku PostgreSQL does not support logical replication.
NOTE: Fivetran does not support logical replication if you use the "swap and drop" method of replicating data. In the "swap and drop" method, you create a temporary table and load in data from your current table. You then drop the current table and rename the temporary table so it has the same name as your current table.
If logical replication is not an option for you, Fivetran can fall back on using the XMIN method.
See our FAQ documentation for help in migrating your test_decoding
connector to pgoutput
.
Logical decoding output plugin
pgoutput
is the standard logical decoding plugin for PostgreSQL version 10+. This plugin replicates from your custom publication without needing additional libraries.
NOTE: A publication is a group of change events for a table or group of tables that you choose. To learn more, read PostgreSQL's publication documentation.
pgoutput
allows you to use PostgreSQL's publications feature. This feature only replicates change events from tables in your publication, so you control which changes are replicated to Fivetran.
Declarative partitioning
Newly-added partitions are not automatically included in your syncs unless your schema configuration is set to Allow All.
NOTE: Logical replication in PostgreSQL versions 10-12 does not support adding partitioned tables directly to the publication. If your workflow involves regularly creating new partitions, you must add each new partition to the publication, either manually or automatically through setting up a trigger. Otherwise, your syncs may be rescheduled because of the missing partitions in the publication.
For connectors created before November 8, 2022, we sync incremental data from partitioned tables into partitions. If you'd like it synced into the parent table instead, reach out to Fivetran Support.
For connectors created on or after November 8, 2022, we sync incremental data from partitioned tables into the parent table.
Partitioning using inheritance
Because a child table can inherit from multiple root tables, we treat each table as a separate entity. Therefore, the data physically residing in a parent table is synced into the corresponding parent table in the destination. Similarly, the data physically residing in a child table is synced into the corresponding child table in the destination. You can use the ONLY
keyword in a select statement to query data that physically resides in a specific table, excluding data in its child tables. If you want to see the result of SELECT * FROM base_table
as SELECT * FROM ONLY base_table
instead, set up a trigger to propagate changes from child tables to the base table.
XMIN system column
The hidden PostgreSQL system column xmin
can be used to select only the new or changed rows since the last update. The XMIN method has the following disadvantages:
- Cannot replicate row deletions, so there is no way to tell which rows in the destination are no longer present in the source database. XMIN can't track row deletions because it relies on a hidden system column in PostgreSQL tables that is effectively a
last_modified
column. When a row is deleted, it doesn't appear as being recently modified because it no longer exists. - Requires a full table scan to detect updated rows, which can slow down updates and cause significant processing overhead on your PostgreSQL server.*
- In high-volume transactional databases, freezing and wraparound events may occur with XMIN because of the loss of the state in some circumstances, resulting in automatic re-syncs that may increase your sync time.*
- The
xmin
column is updated on a per-row basis. If any value of a column in a row changes, thexmin
value also changes. Therefore, the connector syncs a row when a column is updated, regardless of whether you selected that column in the connector schema. There is no way for the connector to know to ignore that update. - The value of the
xmin
column has an upper limit of 4,294,967,295. After the database's total number of transactions exceeds this value, it "wraps around" to zero. After this event occurs in PostgreSQL versions 9.5 and later, multiple rows may have the samexmin
value and Fivetran may additionally sync older, unchanged "frozen" rows, resulting in more Monthly Active Rows (MAR) and higher cost. See our solution section for more information on resolving this issue.
*IMPORTANT: You can avoid this limitation by enabling the
pageinspect
andpg_visibility
XMIN extensions. Enabling these extensions can greatly increase the speed of incremental syncs and full imports, as well as handle wraparound scenarios. We highly recommend this solution for Enterprise-level users who use the XMIN incremental sync method. Learn more in the Solution section.
Therefore, we only use the XMIN method if WAL is not an option for you. We recommend using WAL as the primary incremental sync method if you have a high-volume transactional database.
Fivetran Teleport Sync
Fivetran Teleport Sync is a proprietary incremental sync method that can add delete capture with no additional setup other than a read-only SQL connection. Teleport captures updates using the XMIN system column.
Use Fivetran Teleport Sync if enabling logical replication is not an option or if your Postgres version is below 16 and you want to capture deletes from a read replica.
NOTE: Fivetran Teleport Sync supports syncing tables with composite primary keys.
Fivetran Teleport Sync's queries perform the following operations on your database:
- Do an index scan of each synced table's primary keys
- Aggregate a compressed table snapshot in the database's memory
For optimum Fivetran Teleport Sync performance, we recommend that you make the following resources available in your database:
- 1 GB Free RAM
- 1 Free CPU Core
- IOPS (Teleport Sync times decrease linearly with an increase of available IOPS).
NOTE: Teleport captures deletes differently than log-based replication mechanisms. Learn more in the Changes to data between syncs section of our history mode documentation.
Fivetran Teleport Sync can sync the following DDL commands without needing to re-sync the table:
- Adding a new column
- Changing an existing column's data type
- Data type changes that do not affect existing values (For example, INT to BIGINT)
We only recommend the use of the Teleport Sync method if WAL or XMIN is not an option for you. We recommend using WAL as the primary incremental sync method if you have a high-volume transactional database.
Limitations
Fivetran Teleport Sync cannot sync tables without a primary key. It can only sync tables with primary keys of the following types:
- Constrained VARCHAR
- TEXT
- UUID
- BIGINT
- SMALLINT
- INT
- Constrained NUMERIC
- DATE
NOTE: For TEXT type primary keys, we only support syncing rows with IDs that have 500 characters or fewer.
NOTE: We only support syncing constrained VARCHARs when their maximum length is less than 13296 characters.
If you update large portions of a table between syncs, Fivetran Teleport Sync may be unable to determine the changes since the previous sync. When this occurs, the connector will re-sync the entire table to capture the latest data. To avoid these table re-syncs, we recommend that you increase the sync frequency of the connector.
Fast-changing rows may not be marked as deleted in the destination. We consider a row fast changing if more than one of the following operations are applied to it within the connector's sync frequency:
- INSERT
- UPDATE
- DELETE
We do not support syncing more than 400 tables or tables with more than 50 million rows with Fivetran Teleport Sync for PostgreSQL.
Tables with a primary key
We merge changes to tables with primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination.
- An UPDATE in the source table updates the data in the corresponding row in the destination.
- (For logical replication) A DELETE in the source table updates the corresponding row in the destination with
_fivetran_deleted = TRUE
.
NOTE: Fivetran requires SELECT permissions for all primary key columns in every table with a primary key that you want to sync. If you use logical replication as your incremental sync method, all primary key columns must be part of the publication.
Tables without a primary key
Fivetran Teleport Sync does not support tables without a primary key. It can only sync tables with integer or string primary keys.
We handle changes to tables without a primary key in the following ways:
- An INSERT in the source generates a new row in the destination.
- An UPDATE in the source generates a new row in the destination and leaves the old version of that row in the destination untouched. As a result, one record in your source database may have several corresponding rows in your destination.
For logical replication, we can track deletes if the table's replica identity is set to FULL or INDEX. Otherwise, we can only track inserts. Learn more in the Deleted Rows section.
Identify tables with primary keys
To find out which of your tables have (or don't have) primary keys, run this query:
SELECT
table_schema,
table_name,
(table_schema, table_name) IN (
SELECT
_schema.nspname AS table_schema,
_table.relname AS table_name
FROM pg_catalog.pg_constraint c
LEFT JOIN pg_catalog.pg_class _table ON c.conrelid = _table.oid
LEFT JOIN pg_catalog.pg_namespace _schema ON _table.relnamespace = _schema.oid
WHERE c.contype = 'p'
) AS has_primary_key
FROM information_schema.tables
WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ('information_schema', 'looker_scratch') AND
NOT table_schema ~ '^pg_';
The third column in the query results, has_primary_key
, is a binary value that shows whether or not each table has a primary key.
Deleted rows
For logical replication, we can capture deletes for tables without primary keys if the replica identity set to FULL or INDEX.
Fivetran Teleport Sync captures deletes differently than log-based replication mechanisms. Learn more in the Changes to data between syncs section of our history mode documentation.
The XMIN update mechanism does not allow us to recognize deleted rows at all. (For more information, see our XMIN documentation.) Rows that are deleted in the source database are still present in the destination, because there is no way for us to identify them. If you want to recognize deleted rows, use one of these strategies to capture deletions in the source database:
Add a trigger to each table to log the primary keys of deleted rows in a separate table. Once this table has been replicated to the destination, you can use it to generate a view that excludes the deleted rows.
Instead of removing rows, add an 'is_deleted' column to each table and change the business logic to set this column to
TRUE
when you delete a row. To avoid build up of obsolete rows in the source database, you can delete them after a certain period. Make sure you leave enough time for Fivetran replicate the deleted rows to the destination before you remove them (we recommend seven days).
Support for DDL changes
We support different DDL changes based on which incremental sync method your connector uses. The following is a non-exhaustive list of supported DDL changes:
- Yes: The DDL change is reflected in the destination. You do not need to perform a re-sync.
- Partially: The DDL change is partially reflected in the destination.
- No: The DDL change is not reflected in the destination.
DDL Statement | Logical Replication | XMIN | Fivetran Teleport Sync | Notes |
---|---|---|---|---|
CREATE TABLE | Yes | Yes | Yes | All methods: Creates a new table in the destination. |
CREATE SCHEMA | Yes | Yes | Yes | |
ADD COLUMN | Yes | Yes | Yes | All methods: Rows synced before the ADD COLUMN operation have null values in the newly-added destination column. Re-sync the table to update the existing rows with the default value. |
ADD COLUMN ... DEFAULT | Partially | Partially | Partially | All methods: Rows synced before the ADD COLUMN operation have null values in the newly-added destination column. Re-sync the table to update the existing rows with the default value. |
RENAME TO new_table_name | No | No | No | All methods: The subsequent sync imports the table as a new table. |
RENAME TO new_schema_name | No | No | No | All methods: The subsequent sync imports the schema as a new schema. |
RENAME COLUMN | Yes | Yes | Yes | All methods: Adds a new column to the destination. The old column still exists in the destination. We sync new rows inserted after this DDL operation to the new column. If you need data in the new column from before the DDL operation, trigger a table re-sync. |
DROP COLUMN | No | No | No | |
ALTER COLUMN ... TYPE ... | Yes | Yes | Yes | Logical Replication: We support any data types changing to string-based data types. If your data type change is not backward compatible (for example, TEXT to BOOLEAN), drop the replication slot and perform a historical re-sync. XMIN: Certain destinations support data type changes, such as BigQuery and PostgreSQL. Fivetran Teleport Sync: We support all non-primary key data type changes. For primary keys, we only support data type changes that don't affect the value. Unsupported primary key data type changes cause the subsequent sync to re-import the table. |
ALTER COLUMN ... TYPE ... USING | No | Yes | Yes | Fivetran Teleport Sync: We support all non-primary key data type changes. For primary keys, we only support data type changes that don't affect the value. Unsupported primary key data type changes cause the subsequent sync to re-import the table. |
ALTER COLUMN ... SET NOT NULL | No | No | No | |
DROP SCHEMA | No | No | No | |
DROP TABLE | No | No | No | |
TRUNCATE TABLE | Yes | No | Yes | Logical Replication: If you truncate a partition, that change is not reflected in the destination. Truncating one table in a transaction is supported. Truncating a table and its dependent tables using TRUNCATE CASCADE is not supported. If you want to truncate multiple tables, truncate one table per transaction. XMIN: Re-sync the table to capture the truncate change. |
Historical re-sync scenarios
- If you want to migrate to another service provider, we will need to do a full re-sync of your data because the new service provider will not retain the same change tracking data as your original PostgreSQL database.
- If you want to switch update methods to or from Teleport
- If you want to switch update methods from using logical replication to XMIN
Specific conditions for rescheduled syncs
In addition to the general conditions for rescheduled syncs, we reschedule syncs for our PostgreSQL connectors under the following conditions:
- When historical re-sync has taken too long with logical replication update method, we need to reschedule sync so that WAL changes extracted can be consumed and flushed from replication slot.
- When Teleport is not able to capture all changes
- When certain unrecoverable failures occurred