PostgreSQL
PostgreSQL is an open-source relational database. Fivetran's integration platform replicates data from your PostgreSQL source database and loads it into your destination.
NOTE: Fivetran supports PostgreSQL as both a database connector and a destination.
Fivetran offers the following solutions for replicating data from your PostgreSQL source database:
- Fivetran native connector
- HVR - PostgreSQL
Supported services
Fivetran supports the following PostgreSQL database services:
- Generic PostgreSQL
- Amazon Aurora PostgreSQL
- Amazon Aurora Serverless
- Amazon RDS PostgreSQL
- Azure PostgreSQL
- Google Cloud PostgreSQL
- Heroku PostgreSQL
Supported configurations
Fivetran supports the following PostgreSQL configurations:
Supportability Category | Supported Values |
---|---|
Database versions | 9.4 - 16 |
Maximum row size | 400MB per row |
Connector limit per database | 3 |
Transport Layer Security (TLS) | TLS 1.1 - 1.3 |
Server encoding | UTF-8 |
Which PostgreSQL instance types we can connect to depends on whether you use logical replication, XMIN, or Fivetran Teleport Sync as your incremental update mechanism. Read our Updating data documentation for more information.
Instance Types | Logical Replication | XMIN | Fivetran Teleport Sync |
---|---|---|---|
Generic PostgreSQL | |||
Primary instance | check (version 10 or later) | check (version 10 or later) | check |
Standby instance | check (version 10 or later) | check | |
Amazon Aurora PostgreSQL | |||
Primary instance | check (version 10 or later) | check (version 10 or later) | check |
Standby instance | check (version 10 or later) | check | |
Amazon Aurora Serverless | |||
Primary instance | check (version 13 or later) | check (version 13 or later) | check |
Standby instance | check (version 13 or later) | check | |
Amazon RDS PostgreSQL | |||
Primary instance | check (version 10 or later) | check (version 10 or later) | check |
Standby instance | check (version 10 or later) | check | |
Azure PostgreSQL | |||
Primary instance | check (version 11 or later) | check (version 10 or later) | check |
Standby instance | check (version 10 or later) | ||
Google Cloud PostgreSQL | |||
Primary instance | check (version 12 or later) | check (version 12 or later) | check (version 10 or later) |
Standby instance | check (version 10 or later) | check | |
Heroku PostgreSQL | |||
Primary instance | check (without extensions) | check | |
Standby instance | check (without extensions) | check |
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | Logical replication and Fivetran Teleport Sync |
Custom data | check | All tables and fields |
Data blocking | check | Column level, table level, and schema level |
Column hashing | check | |
Re-sync | check | Table level |
History | check | Supports history mode. |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | AWS PrivateLink: Aurora PostgreSQL, Generic PostgreSQL on EC2, PostgreSQL RDS; Azure Private Link: Azure PostgreSQL |
*XMIN does not capture data that exists for less time than the sync interval (for example, a row that is inserted and then deleted between two consecutive syncs).
Setup guide
In your master database, you need to do the following:
- Allow access to your PostgreSQL database via Fivetran's IP
- Create a Fivetran-specific PostgreSQL user with read-level permissions
- (Optional) Allow access to a read-replica of your PostgreSQL database. Using a read-replica can help to avoid unnecessary strain on your master database.
- (Logical replication only) Allow access to a logical replication slot
For specific instructions on how to set up your database, see the guide for your PostgreSQL database type:
- Generic PostgreSQL
- Amazon Aurora PostgreSQL
- Amazon RDS PostgreSQL
- Azure PostgreSQL
- Google Cloud PostgreSQL
- Heroku PostgreSQL
Sync overview
Once Fivetran is connected to your PostgreSQL database or read replica, we pull a full dump of all selected data from your database. Then, using Fivetran Teleport Sync, logical replication, or the XMIN change data capture process, we pull all your new and changed data at regular intervals.
If data in the source changes (for example, you add new tables or change a data type), Fivetran automatically detects and persists these changes into your destination. For every schema in your PostgreSQL source database, we create a schema in your destination that maps directly to its native schema. This ensures that the data in your destination is in a familiar format to work with.
Syncing empty tables and columns
Fivetran syncs empty tables and columns for your PostgreSQL connector. However, we only sync an empty column to your destination after you make another data alteration to its table; adding a new column by itself is not reflected in your destination. For more information, see our Features documentation.
Schema information
Fivetran tries to replicate the exact schema and tables from your PostgreSQL source database to your destination.
We name the source schema with the dataset name you provided in the connector setup form. We replicate the source's tables to the destination schema. We name the destination schema with the connector name appended by the dataset name. For example, if the connector name is `postgres_db` and the dataset name is `fivetran_schema`, the destination schema name is `postgres_db_fivetran_schema`. For more information, see our naming conventions documentation.
NOTE: We do not sync foreign tables, views, or materialized views.
Fivetran-generated columns
Fivetran adds the following columns to every table that is added to your destination:
- `_fivetran_synced` (UTC TIMESTAMP) indicates the time when Fivetran last successfully synced the row.
- `_fivetran_id` (STRING) is the hash of the non-Fivetran values of each row. It's a unique ID that Fivetran uses to identify rows in tables that do not have a primary key.
- `_fivetran_deleted` (BOOLEAN) marks rows that were deleted in the source database.

NOTE: If you use XMIN as your incremental update mechanism, your `_fivetran_deleted` column will have null values because XMIN does not track deleted rows.
We add these columns to give you insight into the state of your data and the progress of your data syncs.
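For example, once these columns are in place, a destination-side query can filter out rows that were deleted in the source. The schema and table names below follow the hypothetical `postgres_db_fivetran_schema` example above, and this only works with an update method that captures deletes:

```sql
-- Hypothetical destination-side query: return only rows that still
-- exist in the source database.
SELECT *
FROM postgres_db_fivetran_schema.some_table
WHERE _fivetran_deleted IS NOT TRUE;
```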
Type transformation and mapping
As we extract your data, we match PostgreSQL data types to types that Fivetran supports. If we don't support a certain data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns of data types that we don't accept or transform.
For primary keys, all supported data types are valid.
The following table illustrates how we transform your PostgreSQL data types into Fivetran supported types:
PostgreSQL Type | Fivetran Type | Fivetran Supported | Notes |
---|---|---|---|
BIGINT/ BIGSERIAL | LONG | True | |
BIT | BOOLEAN | True | |
BOOLEAN | BOOLEAN | True | |
BYTEA | BINARY | True | We only support hexadecimal format. We do not support escape format. |
CHARACTER VARYING | STRING | True | |
CHARACTER | STRING | True | |
CIDR | STRING | True | |
CITEXT | STRING | True | |
DATE | LOCALDATE | True | |
DOUBLE PRECISION | DOUBLE | True | |
GEOGRAPHY | (JSON, DOUBLE, DOUBLE) | True | For details, see the PostGIS geography data types section. For connectors created after 12/14/2020, the JSON data is GeoJSON compliant. |
GEOMETRY | (JSON, DOUBLE, DOUBLE) | True | For details, see the PostGIS geometry data types section. For connectors created after 12/14/2020, the JSON data is GeoJSON compliant. |
HSTORE | JSON | True | |
INTEGER/ SERIAL | INTEGER | True | |
INTERVAL | DOUBLE | True | |
JSON | JSON | True | We sync JSON values as is, preserving all escaping characters. Malformed JSON values are synced to the destination as wrapped JSON in the format {"fivetran_error":"This JSON value was malformed in the source","rawValue":"[ORIGINAL JSON VALUE]"} |
JSONB | JSON | True | We sync JSONB values as is, preserving all escaping characters. Malformed JSONB values are synced to the destination as wrapped JSON in the format {"fivetran_error":"This JSON value was malformed in the source","rawValue":"[ORIGINAL JSON VALUE]"} |
MACADDR | STRING | True | |
MONEY | BIGDECIMAL | True | |
NUMERIC/ DECIMAL | BIGDECIMAL | True | NUMERIC without specified precision or scale is synced as FLOAT. |
POINT | (JSON, DOUBLE, DOUBLE) | True | Treated like PostGIS geometry POINT. For details, see the PostGIS geometry data types section |
REAL | FLOAT | True | |
SMALLINT/ SMALLSERIAL | SHORT | True | |
TEXT | STRING | True | |
TIME WITH TIME ZONE | STRING | True | |
TIME WITHOUT TIME ZONE | STRING | True | |
TIMESTAMP WITH TIME ZONE | INSTANT | True | |
TIMESTAMP WITHOUT TIME ZONE | LOCALDATETIME | True | |
TSRANGE | (JSON, LOCALDATETIME, LOCALDATETIME) | True | For backward compatibility, it has two additional columns with _begin and _end suffixes. We sync bounds information to the destination, where EXCLUSIVE and INCLUSIVE indicate the bounds. |
TSTZRANGE | (JSON, INSTANT, INSTANT) | True | For backward compatibility, it has two additional columns with _begin and _end suffixes. We sync bounds information to the destination, where EXCLUSIVE and INCLUSIVE indicate the bounds. |
UUID | STRING | True | |
BIT VARYING | False | ||
BOX | False | ||
CIRCLE | False | ||
DATERANGE | JSON | True | Unparsable values (such as 10000-01-01 ) are synced to the destination as null values. We sync bounds information to the destination, where EXCLUSIVE and INCLUSIVE indicate the bounds. |
ENUM | True | ||
INET | STRING | True | |
INT4RANGE | JSON | True | We sync bounds information to the destination, where EXCLUSIVE , INCLUSIVE , and OPEN indicate the bounds. |
INT8RANGE | JSON | True | We sync bounds information to the destination, where EXCLUSIVE , INCLUSIVE , and OPEN indicate the bounds. |
LINE SEGMENT | False | ||
LINE | False | ||
LTREE | True | ||
NUMRANGE | JSON | True | We sync bounds information to the destination, where EXCLUSIVE , INCLUSIVE , and OPEN indicate the bounds. |
OID | False | ||
PATH | False | ||
PG_LSN | False | ||
POLYGON | False | ||
REGCLASS | False | ||
REGCONFIG | False | ||
REGDICTIONARY | False | ||
REGNAMESPACE | False | ||
REGOPER | False | ||
REGOPERATOR | False | ||
REGPROC | False | ||
REGPROCEDURE | False | ||
REGROLE | False | ||
REGTYPE | False | ||
TSQUERY | False | ||
TSVECTOR | False | ||
XML | XML | True |
We sync unknown data types to the destination as strings. If we are missing an important type that you need, please reach out to support.
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual data destination pages.
Unparsable values
When we encounter an unparsable value of one of the following data types, we substitute it with a default value. Which default value we use depends on whether the unparsable value is in a primary key column or non-primary key column:
PostgreSQL Type | Primary Key Value | Non-Primary Key Value |
---|---|---|
TIMESTAMP WITH TIME ZONE | 1970-01-01T00:00:00Z | null |
TIMESTAMP WITHOUT TIME ZONE | 1970-01-01T00:00 | null |
TSRANGE * | 1970-01-01T00:00 | null |
TSTZRANGE * | 1970-01-01T00:00:00Z | null |
*If we are unable to parse either the start or end value in your range, we substitute that value with the default value. If we are unable to parse both values, we replace both values with default values.
PostGIS geography data types
The following table lists Fivetran-supported PostGIS geography data types. They are transformed according to the GeoJSON specification and stored in the destination as JSON types. The GeoJSON specification does not support SRIDs, so Fivetran ignores SRID values.
PostGIS Type | Fivetran Type | Notes |
---|---|---|
POINT | (JSON, DOUBLE, DOUBLE) | For backward compatibility, it has two additional columns with _long and _lat suffixes |
LINESTRING | JSON | |
POLYGON | JSON | |
MULTIPOINT | JSON | |
MULTILINESTRING | JSON | |
MULTIPOLYGON | JSON | |
GEOMETRYCOLLECTION | JSON |
PostGIS geometry data types
The following table lists Fivetran-supported PostGIS geometry data types. They are transformed according to the GeoJSON specification and stored in the destination as JSON types.
PostGIS Type | Fivetran Type | Notes |
---|---|---|
POINT | (JSON, DOUBLE, DOUBLE) | For backward compatibility, it has two additional columns with _x and _y suffixes |
LINESTRING | JSON | |
POLYGON | JSON | |
MULTIPOINT | JSON | |
MULTILINESTRING | JSON | |
MULTIPOLYGON | JSON | |
GEOMETRYCOLLECTION | JSON | |
CIRCULARSTRING | JSON | |
COMPOUNDCURVE | JSON | |
POLYHEDRALSURFACE | JSON | |
CURVEPOLYGON | JSON | |
TIN | JSON | |
TRIANGLE | JSON |
Domain data types
Fivetran supports domains, which are user-defined data types. When you create a domain, you choose an underlying data type and have the option of specifying additional constraints. Learn more in PostgreSQL's domains documentation.
We transform and store domain data based on their underlying data types.
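As an illustration, a column of a domain built on TEXT is synced as STRING because TEXT is its underlying type. A minimal sketch, with hypothetical domain and table names:

```sql
-- A domain is an existing type plus optional constraints.
CREATE DOMAIN us_postal_code AS TEXT
  CHECK (VALUE ~ '^\d{5}(-\d{4})?$');

-- A column of this domain type is replicated as its base type (TEXT -> STRING).
CREATE TABLE addresses (
  id  BIGINT PRIMARY KEY,
  zip us_postal_code
);
```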
Excluding source data
If you don’t want to sync all the data from your source database, you can exclude schemas, tables, or columns from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can restrict the Fivetran user's access to certain tables or columns in your source database.
NOTE: You cannot exclude columns which are primary keys or foreign keys.
How to allow only a subset of tables
To grant access only to some tables in a schema, first make sure that the Fivetran user has access to the schema itself:
GRANT USAGE ON SCHEMA "some_schema" TO fivetran;
Next, remove any previously granted permissions to all tables in that schema:
ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" REVOKE SELECT ON TABLES FROM fivetran;
REVOKE SELECT ON ALL TABLES IN SCHEMA "some_schema" FROM fivetran;
Then grant SELECT on each table you wish to include, repeating this command for every table:
GRANT SELECT ON "some_schema"."some_table" TO fivetran;
Any tables created in the future will be excluded from the Fivetran user by default. To include them, run:
ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" GRANT SELECT ON TABLES TO fivetran;
You need to individually grant access to each table you want to include; it is not possible to achieve exclusion by granting Fivetran access to all tables and then revoking access for a subset of tables.
How to allow only a subset of columns
To grant access only to some columns in a table, first remove any previously granted permission to all columns in the table:
REVOKE SELECT ON "some_schema"."some_table" FROM fivetran;
Then grant permission to only specific columns (for example, `some_column` and `other_column`):
GRANT SELECT (xmin, "some_column", "other_column") ON "some_schema"."some_table" TO fivetran;
Any new columns added to that table in the future will be excluded from access by default. To include them, re-run the command above with the new column included.
To grant access to all columns except one, you must individually grant access to all other columns. If you are using XMIN as your change data capture method, you must also grant permissions to the hidden system column `xmin`. We need access to the `xmin` column to perform our incremental updates.
You can automate this process by scripting in your favorite language. Here is an example of a fixed set of SQL commands being executed via a Bash script:
#!/bin/bash
# Fill in the values without quotes or a space after the equals sign
host= # ex: 10.10.135.135
port= # ex: 5432
user= # ex: fivetran
password= # ex: asdf235235asfsdf212
db= # ex: postgres
# List all of your SQL commands wrapped in single quotes, for example:
sql=(
'GRANT USAGE ON SCHEMA "some_schema" TO fivetran;'
'GRANT SELECT ON "some_schema"."some_table" TO fivetran;'
'GRANT SELECT ON "some_schema"."other_table" TO fivetran;'
)
# psql reads the password from the PGPASSWORD environment variable
for i in "${!sql[@]}"
do
PGPASSWORD=$password psql --host=$host --port=$port --username=$user $db -c "${sql[$i]}"
done
Initial sync
When Fivetran connects to a new database, we first copy all rows from every table in every schema for which we have SELECT permissions (except for those that you excluded on your Fivetran dashboard) and add Fivetran-generated columns. We copy the tables in ascending size order (from smallest to largest). We copy rows by performing a SELECT statement on each table. For large tables, we copy a limited number of rows at a time so that we don't have to start the sync over from the beginning if our connection is lost midway.
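Conceptually, the batched copying resembles keyset pagination: each query resumes from the last key of the previous batch. This is only a sketch of the idea, not Fivetran's literal query, and the names are illustrative:

```sql
-- Illustrative only: copy the next batch of rows after the last
-- primary key value captured in the previous batch.
SELECT *
FROM "some_schema"."some_table"
WHERE "id" > 1000000   -- last primary key value seen so far
ORDER BY "id"
LIMIT 10000;
```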
Updating data
Fivetran performs incremental updates of any new or modified data from your source database. We use one of the following mechanisms to perform incremental updates:
Logical Replication
Logical replication is based on logical decoding of the PostgreSQL write-ahead log (WAL). We recommend this method for incremental updates because it
- Minimizes processing overhead on your PostgreSQL server
- Replicates row deletions for tables with primary keys
Fivetran supports the `pgoutput` plugin for logical replication. Your plugin must have its own replication slot. Ensure that you are connected to the correct database when you create your replication slot, or your connector will not be able to find the slot.
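A minimal setup sketch, assuming illustrative publication and slot names, run while connected to the database you want to replicate:

```sql
-- Create a publication (here, for all tables) and a dedicated
-- replication slot that uses the pgoutput plugin.
CREATE PUBLICATION fivetran_pub FOR ALL TABLES;
SELECT pg_create_logical_replication_slot('fivetran_slot', 'pgoutput');
```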
However, there are reasons why you might not want to or be able to use logical replication:
- Logical replication requires version 9.4.15 or later (prior minor versions have bugs).
- Logical replication does not replicate all DDL changes. Learn which DDL changes are supported in the Support for DDL changes table.
- Logical replication does not replicate data from generated columns.
- For PostgreSQL versions 13+, logical replication can replicate partitioned tables. Learn more in PostgreSQL's logical replication restrictions documentation.
- For PostgreSQL versions 10.2+, creating a column with the `ALTER TABLE ... ADD COLUMN ... SET DEFAULT` command does not trigger an update to the existing table that the logical replication mechanism can recognize. Consequently, those changes are not part of our incremental updates. You need to do a full table re-sync to update the existing table's data in the destination. Learn more in PostgreSQL's Modifying Tables documentation.
- If your PostgreSQL database uses PgBouncer for connection pooling, logical replication does not work.
In Generic PostgreSQL
- Only supported on primary instances, so you cannot enable it on a read replica
- May need significant extra storage space (even if log storage is only temporary)
- Requires a server reboot to activate logging
In Amazon RDS PostgreSQL
- Only supported on RDS primary instances, so you cannot enable it on a read replica
- May need significant extra storage space (even if log storage is only temporary)
- Requires a primary instance reboot to activate logging
In Azure PostgreSQL
- Only supported on Azure primary instances, so you cannot enable it on a read replica
- Only supported for Azure PostgreSQL version 10 or later
- May need significant extra storage space (even if log storage is only temporary)
- Requires a server reboot to activate logging
In Heroku PostgreSQL
- Does not support logical replication
In Amazon Aurora PostgreSQL
- Only supported on Aurora primary instances, so you cannot enable it on a read replica
- Only supported for Aurora PostgreSQL version 10.6 or later
In Google Cloud PostgreSQL
- Only supported on Google Cloud primary instances, so you cannot enable it on a read replica
NOTE: Fivetran does not support logical replication if you use the "swap and drop" method of replicating data. In the "swap and drop" method, you create a temporary table and load in data from your current table. You then drop the current table and rename the temporary table so it has the same name as your current table.
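For illustration, the "swap and drop" pattern looks like the following (table names are hypothetical); because the renamed table is a different relation, logical replication cannot follow it:

```sql
-- The "swap and drop" pattern that logical replication cannot track:
CREATE TABLE orders_new (LIKE orders INCLUDING ALL);
INSERT INTO orders_new SELECT * FROM orders;
DROP TABLE orders;
ALTER TABLE orders_new RENAME TO orders;
```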
If logical replication is not an option for you, Fivetran can fall back on using the XMIN method.
See our FAQ documentation for help in migrating your `test_decoding` connector to `pgoutput`.
Heartbeat
To ensure that the `restart_lsn` is always moving despite idle activity in your database, we issue the following Heartbeat command every ten minutes during the logical replication sync:
BEGIN;
DROP AGGREGATE IF EXISTS EPHEMERAL_HEARTBEAT(float4);
CREATE AGGREGATE EPHEMERAL_HEARTBEAT(float4) (SFUNC = float4pl, STYPE = float4);
DROP AGGREGATE EPHEMERAL_HEARTBEAT(float4);
END;
The aggregate is created and dropped within the Heartbeat command, so it does not exist afterwards in your database or in your destination. The Heartbeat query does not modify any data in your database. It prevents the replication slot from growing indefinitely and shortens the WAL retention when there are no changes in your database.
pgoutput plugin
`pgoutput` is the standard logical decoding plugin for PostgreSQL version 10+. This plugin replicates from your custom publication without needing additional libraries.
NOTE: A publication is a group of change events for a table or group of tables that you choose. To learn more, read PostgreSQL's publication documentation.
`pgoutput` allows you to use PostgreSQL's publications feature. This feature only replicates change events from tables in your publication, so you control which changes are replicated to Fivetran.
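For example, you can scope a publication to specific tables and extend it later (the publication and table names below are illustrative):

```sql
-- Replicate change events only for the tables you name.
CREATE PUBLICATION scoped_pub FOR TABLE "some_schema"."some_table";
ALTER PUBLICATION scoped_pub ADD TABLE "some_schema"."other_table";
```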
Partitioned tables
Newly-added partitions are not automatically included in your syncs unless your schema configuration is set to Allow All.
For connectors created before November 8, 2022, we sync incremental data from partitioned tables into partitions. If you'd like it synced into the parent table instead, reach out to Fivetran Support.
For connectors created on or after November 8, 2022, we sync incremental data from partitioned tables into the parent table.
XMIN system column
The hidden PostgreSQL system column `xmin` can be used to select only the new or changed rows since the last update. The XMIN method has the following disadvantages:
- Cannot replicate row deletions, so there is no way to tell which rows in the destination are no longer present in the source database. XMIN can't track row deletions because it relies on a hidden system column in PostgreSQL tables that is effectively a `last_modified` column. When a row is deleted, it doesn't appear as being recently modified because it no longer exists.
- Requires a full table scan to detect updated rows, which can slow down updates and cause significant processing overhead on your PostgreSQL server.*
- In high-volume transactional databases, freezing and wraparound events may occur with XMIN because of the loss of the state in some circumstances, resulting in automatic re-syncs that may increase your sync time.*
- The `xmin` column is updated on a per-row basis. If any value of a column in a row changes, the `xmin` value also changes. Therefore, the connector syncs a row when a column is updated, regardless of whether you selected that column in the connector schema. There is no way for the connector to know to ignore that update.
- The value of the `xmin` column has an upper limit of 4,294,967,295. After the database's total number of transactions exceeds this value, it "wraps around" to zero. After this event occurs in PostgreSQL versions 9.5 and later, multiple rows may have the same `xmin` value and Fivetran may additionally sync older, unchanged "frozen" rows, resulting in more Monthly Active Rows (MAR) and higher cost. See our solution section for more information on resolving this issue.
*IMPORTANT: You can avoid this limitation by enabling the `pageinspect` and `pg_visibility` extensions. Enabling these extensions can greatly increase the speed of incremental syncs and full imports, as well as handle wraparound scenarios. We highly recommend this solution for Enterprise-level users who use the XMIN incremental update method. Learn more in the Solution section.
Therefore, we only use the XMIN method if WAL is not an option for you. We recommend using WAL as the primary incremental update mechanism if you have a high-volume transactional database.
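Conceptually, XMIN-based change detection compares each row's `xmin` against a stored high-water mark. The query below is only a sketch of the idea, with illustrative names and values, not Fivetran's literal query:

```sql
-- Illustrative only: select rows whose transaction ID is newer than
-- the value recorded at the end of the previous sync.
SELECT xmin::text::bigint AS row_version, *
FROM "some_schema"."some_table"
WHERE xmin::text::bigint > 123456;  -- previous sync's high-water mark
```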
Fivetran Teleport Sync
Fivetran Teleport Sync is a proprietary database replication method that can incrementally replicate your database with no additional setup other than a read-only SQL connection.
Use Fivetran Teleport Sync if enabling logical replication is not an option or if you want to capture deletes from a read replica.
NOTE: Fivetran Teleport Sync supports syncing tables with composite primary keys.
Fivetran Teleport Sync's queries perform the following operations on your database:
- Do a full table scan of each synced table
- Perform calculations on all values in each synced table's rows
- Aggregate a compressed table snapshot in the database's memory
For optimum Fivetran Teleport Sync performance, we recommend that you make the following resources available in your database:
- 1 GB Free RAM
- 1 Free CPU Core
- Available IOPS (Teleport Sync times decrease linearly as available IOPS increase)
NOTE: Teleport captures deletes differently than log-based replication mechanisms. Learn more in the Changes to data between syncs section of our history mode documentation.
Fivetran Teleport Sync can sync the following DDL commands without needing to re-sync the table:
- Adding a new column
- Changing an existing column's data type, provided the change does not affect existing values (for example, INT to BIGINT)
We only recommend the use of the Teleport Sync method if WAL or XMIN is not an option for you. We recommend using WAL as the primary incremental update mechanism if you have a high-volume transactional database.
Limitations
Fivetran Teleport Sync cannot sync tables without a primary key. It can only sync tables with primary keys of the following types:
- Constrained VARCHAR
- TEXT
- UUID
- BIGINT
- SMALLINT
- INT
- Constrained NUMERIC
- DATE
NOTE: For TEXT type primary keys, we only support syncing rows with IDs that have 500 characters or fewer.
NOTE: We only support syncing constrained VARCHARs when their maximum length is less than 13312 characters.
Fivetran Teleport Sync may not be able to determine the changes since the previous sync if large portions of the tables are updated between syncs. If too many rows have been updated since the previous sync, the connector will re-sync the whole table to capture the latest data. To avoid these table re-syncs, we recommend that you increase the sync frequency of the connector.
Tables with a primary key
We merge changes to tables with primary keys into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination.
- An UPDATE in the source table updates the data in the corresponding row in the destination.
- (For logical replication) A DELETE in the source table updates the corresponding row in the destination with `_fivetran_deleted = TRUE`.
Tables without a primary key
Fivetran Teleport Sync does not support tables without a primary key; it can only sync tables whose primary keys are of the types listed in its Limitations section.
We handle changes to tables without a primary key in the following ways:
- An INSERT in the source generates a new row in the destination.
- An UPDATE in the source generates a new row in the destination and leaves the old version of that row in the destination untouched. As a result, one record in your source database may have several corresponding rows in your destination.
Fivetran cannot recognize deleted rows in tables without a primary key. For more information, see the Deleted Rows section.
For logical replication, only INSERT operations are replicated for such tables unless the table's `REPLICA IDENTITY` is set.
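If you need logical replication to emit UPDATE and DELETE events for a table without a primary key, you can set its replica identity to FULL (the table name below is illustrative; note that this increases WAL volume):

```sql
ALTER TABLE "some_schema"."keyless_table" REPLICA IDENTITY FULL;
```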
Identify tables with primary keys
To find out which of your tables have (or don't have) primary keys, run this query:
SELECT
table_schema,
table_name,
(table_schema, table_name) IN (
SELECT
_schema.nspname AS table_schema,
_table.relname AS table_name
FROM pg_catalog.pg_constraint c
LEFT JOIN pg_catalog.pg_class _table ON c.conrelid = _table.oid
LEFT JOIN pg_catalog.pg_namespace _schema ON _table.relnamespace = _schema.oid
WHERE c.contype = 'p'
) AS has_primary_key
FROM information_schema.tables
WHERE table_type = 'BASE TABLE' AND table_schema NOT IN ('information_schema', 'looker_scratch') AND
NOT table_schema ~ '^pg_';
The third column in the query results, `has_primary_key`, is a boolean value that shows whether or not each table has a primary key.
Deleted rows
Logical replication does not allow us to recognize deleted rows in tables without a primary key. For more information, see our logical replication documentation.
Fivetran Teleport Sync captures deletes differently than log-based replication mechanisms. Learn more in the Changes to data between syncs section of our history mode documentation.
The XMIN update mechanism does not allow us to recognize deleted rows at all. (For more information, see our XMIN documentation.) Rows that are deleted in the source database are still present in the destination, because there is no way for us to identify them. If you want to recognize deleted rows, use one of these strategies to capture deletions in the source database:
- Add a trigger to each table to log the primary keys of deleted rows in a separate table. Once this table has been replicated to the destination, you can use it to generate a view that excludes the deleted rows.
- Instead of removing rows, add an `is_deleted` column to each table and change the business logic to set this column to `TRUE` when you delete a row. To avoid a build-up of obsolete rows in the source database, you can delete them after a certain period. Make sure you leave enough time for Fivetran to replicate the deleted rows to the destination before you remove them (we recommend seven days).
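The first strategy can be sketched with a delete-logging trigger like the one below; all object names are illustrative, and `EXECUTE FUNCTION` requires PostgreSQL 11+ (use `EXECUTE PROCEDURE` on older versions):

```sql
-- Table that records the primary keys of deleted rows.
CREATE TABLE "some_schema"."deleted_rows_log" (
  deleted_id BIGINT,
  deleted_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Trigger function: log the key of each deleted row.
CREATE OR REPLACE FUNCTION log_deleted_row() RETURNS trigger AS $$
BEGIN
  INSERT INTO "some_schema"."deleted_rows_log" (deleted_id) VALUES (OLD.id);
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER capture_deletes
  BEFORE DELETE ON "some_schema"."some_table"
  FOR EACH ROW EXECUTE FUNCTION log_deleted_row();
```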
For the `pgoutput` update method, we support deleting data with the `TRUNCATE` command. For other update methods, you can do a full table re-sync to capture `TRUNCATE` deletions.
Support for DDL changes
We support different DDL changes based on which incremental update method your connector uses. The following is a non-exhaustive list of supported DDL changes:
- Yes: The DDL change is reflected in the destination. You do not need to perform a re-sync.
- Partially: The DDL change is partially reflected in the destination.
- No: The DDL change is not reflected in the destination.
DDL Statement | Logical Replication | XMIN | Fivetran Teleport Sync |
---|---|---|---|
CREATE TABLE | Yes | Yes | Yes |
CREATE SCHEMA | Yes | Yes | Yes |
ADD COLUMN | Yes1 | Yes1 | Yes |
ADD COLUMN ... DEFAULT | Partially2 | Partially2 | Yes3 |
RENAME TO new_table_name | No4 | No4 | No4 |
RENAME TO new_schema_name | No5 | No5 | No5 |
RENAME COLUMN | Yes1 | Yes1 | Yes3 |
DROP COLUMN | No | No | No |
ALTER COLUMN ... TYPE ... | Yes7 | Yes8 | Yes9 |
ALTER COLUMN ... TYPE ... USING | No | Yes | Yes9 |
ALTER COLUMN ... SET NOT NULL | No | No | No |
DROP SCHEMA | No | No | No |
DROP TABLE | No | No | No |
TRUNCATE TABLE | Yes6 | No | Yes |
1 If there are DML events following the ALTER TABLE operation, the DDL change is reflected in the destination.
2 In the destination, rows synced prior to the ADD COLUMN operation have `null` values in the newly-added column. Re-sync the table if you want to update the existing rows in the destination with the default value.
3 The subsequent sync re-imports the table.
4 The subsequent sync imports the table as a new table.
5 The subsequent sync imports the schema as a new schema.
6 If you truncate a partition, that change is not reflected in the destination.
7 We support changing any data type to a string-based data type. If you make a non-backward-compatible data type change (for example, TEXT to BOOLEAN), you must drop the replication slot and perform a historical re-sync.
8 Certain destinations, such as BigQuery and PostgreSQL, support data type changes.
9 We support data type changes that don't affect the value. However, we do not support VARCHAR to TEXT.
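As an illustration of footnote 2, consider adding a column with a default value. The table and column names below are only placeholders:

```sql
-- Hypothetical example: rows synced to the destination before this
-- statement show NULL for "status" until the table is re-synced;
-- only rows changed afterward pick up the default value.
ALTER TABLE public.orders ADD COLUMN status text DEFAULT 'new';
```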
XMIN wraparound causing excess MAR
The value of the `xmin` column has an upper limit of 4,294,967,295. After the database's total number of transactions exceeds this value, it "wraps around" to zero. After this event occurs in PostgreSQL versions 9.5 and later, multiple rows may have the same `xmin` value, and Fivetran may additionally sync older, unchanged "frozen" rows, resulting in more MAR and higher cost.
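To gauge how close each database is to transaction ID wraparound, you can check the age of its oldest unfrozen transaction ID. This query uses only standard catalog views (no extensions required):

```sql
-- age() reports the number of transactions since datfrozenxid;
-- values approaching ~2 billion indicate imminent wraparound.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```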
Solution
The recommended solution to this problem is to use logical replication or Teleport as the incremental update mechanism.
If neither of those replication methods is an option, you can take the following approach to resolve the excess MAR problem:
- Verify that Fivetran has read access to the `ctid` and `xmax` columns on all tables
- Enable the `pg_visibility` extension and create an associated wrapper function to filter frozen pages
- Enable the `pageinspect` extension and create an associated wrapper function to filter frozen rows
This solution will not work for:
- Non-ordinary tables (views, materialized views, partitioned tables, etc.)
- Heroku PostgreSQL
- Amazon Aurora PostgreSQL (partially works, see note below)
NOTE: Amazon Aurora PostgreSQL supports the `pg_visibility` extension, but not the `pageinspect` extension. When using an Aurora PostgreSQL source database, only frozen pages can be detected, not frozen rows within a page. Any frozen rows contained in unfrozen pages may still be synced, resulting in excess MAR. For more information, see this article.
Verify that Fivetran has read access to the `ctid` and `xmax` columns
Fivetran must be able to access the `ctid` and `xmax` columns in every table that needs to be synced. If the Fivetran user was granted table-level read-only access, we already have access to those columns and you can skip to the next step. If column-level permissions were granted, you likely need to grant access to these columns manually.
If Fivetran does not have access to these columns on a particular table, you will receive a warning explaining which tables are missing the privileges to read these columns. For information on how to verify and grant access, see this article.
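As a quick manual check, you can run a query like the following as the Fivetran user against each table you plan to sync (`public.orders` is only a placeholder). Because `ctid` and `xmax` are system columns, column-level grants do not cover them, so this typically fails with a permission error when only column-level privileges were granted:

```sql
-- Run as the Fivetran user; replace public.orders with your table.
-- Succeeds only if the user has table-level SELECT privilege,
-- which is required to read system columns such as ctid and xmax.
SELECT ctid, xmax FROM public.orders LIMIT 1;
```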
Enable the `pg_visibility` extension
NOTE: Do I need to make the Fivetran user a superuser?
No. We access the `pg_visibility` and `pageinspect` extensions through wrapper functions that run with the elevated privileges of their superuser owner via the `SECURITY DEFINER` keyword. This allows the Fivetran user to make use of these extensions without requiring superuser access.
Run the following command as a superuser:
```sql
CREATE EXTENSION pg_visibility;
```
Confirm the extension is installed by running the following query:
```sql
SELECT * FROM pg_available_extensions WHERE name = 'pg_visibility' AND installed_version IS NOT NULL;
```
Create the `get_all_pages` wrapper function
Fivetran will use this function to leverage the `pg_visibility_map` function from the `pg_visibility` extension to detect frozen pages in a table. Run the following commands as a superuser:
```sql
CREATE SCHEMA fivetran;

CREATE OR REPLACE FUNCTION fivetran.get_all_pages(v_table_name character varying)
RETURNS TABLE (
    pagenumber integer,
    all_visible_yn boolean,
    all_frozen_yn boolean)
LANGUAGE plpgsql
SECURITY DEFINER
AS $function$
BEGIN
    RETURN QUERY
    SELECT blkno::int  AS pagenumber,
           all_visible AS all_visible_yn,
           all_frozen  AS all_frozen_yn
    FROM pg_visibility_map($1::regclass);
END;
$function$;

GRANT USAGE ON SCHEMA fivetran TO FIVETRAN_CONNECTOR_DATABASE_USERNAME_HERE;
GRANT EXECUTE ON FUNCTION fivetran.get_all_pages TO FIVETRAN_CONNECTOR_DATABASE_USERNAME_HERE;
```
NOTE: Ensure that the function is defined with `SECURITY DEFINER` and owned by a superuser, not the Fivetran user.
Confirm the wrapper function was created by running the following query:
```sql
SELECT p.proname AS name
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON p.pronamespace = n.oid
WHERE n.nspname = 'fivetran'
  AND p.proname IN ('get_all_pages')
  AND has_function_privilege(p.oid, 'execute')
  AND has_schema_privilege(n.oid, 'usage');
```
NOTE: Replace `FIVETRAN_CONNECTOR_DATABASE_USERNAME_HERE` with the PostgreSQL user account name that the Fivetran connector uses to access the source database.
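To sanity-check the wrapper, you can call it yourself as the Fivetran user. The table name below is only a placeholder:

```sql
-- Example: list the pages of public.orders that are not yet
-- all-frozen and therefore may still contain recent changes.
SELECT pagenumber
FROM fivetran.get_all_pages('public.orders')
WHERE NOT all_frozen_yn;
```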
NOTE: Skip the remaining steps if you are using Amazon Aurora PostgreSQL. See this related article for more information.
Enable the `pageinspect` extension
Run the following command as a superuser:
```sql
CREATE EXTENSION pageinspect;
```
Confirm the extension is installed by running the following query:
```sql
SELECT * FROM pg_available_extensions WHERE name = 'pageinspect' AND installed_version IS NOT NULL;
```
Create the `get_items_in_page` wrapper function
Fivetran will use this function to leverage the `get_raw_page` and `heap_page_items` functions from the `pageinspect` extension to detect frozen rows in a page. Run the following commands as a superuser:
```sql
CREATE OR REPLACE FUNCTION fivetran.get_items_in_page(
    v_table_name character varying,
    v_page_number integer)
RETURNS TABLE (
    lp smallint,
    lp_off smallint,
    lp_flags smallint,
    lp_len smallint,
    t_xmin xid,
    t_xmax xid,
    t_field3 integer,
    t_ctid tid,
    t_infomask integer,
    t_hoff smallint,
    t_bits text,
    t_oid oid,
    t_data bytea)
LANGUAGE plpgsql
SECURITY DEFINER
AS $function$
BEGIN
    RETURN QUERY
    SELECT h.lp, h.lp_off, h.lp_flags,
           h.lp_len, h.t_xmin, h.t_xmax,
           h.t_field3, h.t_ctid, h.t_infomask,
           h.t_hoff, h.t_bits, h.t_oid, h.t_data
    FROM heap_page_items(get_raw_page($1, $2)) AS h;
END;
$function$;

GRANT EXECUTE ON FUNCTION fivetran.get_items_in_page TO FIVETRAN_CONNECTOR_DATABASE_USERNAME_HERE;
```
NOTE: Ensure that the function is defined with `SECURITY DEFINER` and owned by a superuser, not the Fivetran user.
Confirm the wrapper function was created by running the following query:
```sql
SELECT p.proname AS name
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON p.pronamespace = n.oid
WHERE n.nspname = 'fivetran'
  AND p.proname IN ('get_items_in_page')
  AND has_function_privilege(p.oid, 'execute')
  AND has_schema_privilege(n.oid, 'usage');
```
NOTE: Replace `FIVETRAN_CONNECTOR_DATABASE_USERNAME_HERE` with the PostgreSQL user account name that the Fivetran connector uses to access the source database.
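To sanity-check this wrapper as well, you can call it as the Fivetran user against a single page. The table name below is only a placeholder:

```sql
-- Example: inspect the tuples in page 0 of public.orders,
-- including their xmin/xmax values, via the wrapper function.
SELECT lp, t_xmin, t_xmax
FROM fivetran.get_items_in_page('public.orders', 0);
```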
Historical re-sync scenarios
- If you want to migrate to another service provider, we need to perform a full re-sync of your data because the new service provider does not retain the change tracking data of your original PostgreSQL database.
- If you want to switch update methods to or from Teleport.
- If you want to switch update methods from logical replication to XMIN.
Specific conditions for rescheduled syncs
In addition to the general conditions for rescheduled syncs, we reschedule syncs for our PostgreSQL connectors under the following conditions:
- When a historical re-sync takes too long with the logical replication update method, we reschedule the sync so that the extracted WAL changes can be consumed and flushed from the replication slot.
- When Teleport is not able to capture all changes.
- When certain unrecoverable failures occur.