Data Replication to BigQuery Takes a Long Time
Issue
Data replication from PostgreSQL to BigQuery takes a long time, regardless of the dataset size.
Environment
- Connector: Google Cloud PostgreSQL
- Destination: BigQuery
Resolution
To resolve this issue, exclude unnecessary tables from your connector's schema, or distribute the load by creating additional Google Cloud PostgreSQL connectors, each syncing a different subset of the data.
To adjust the relevant connector's schema configuration, do the following:
- In Fivetran, go to the relevant Google Cloud PostgreSQL connector page.
- Select the Schema tab.
- Select the checkbox next to each table you want to include in your sync, and clear the checkbox next to each table you want to exclude.
- Click Save changes.
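If you choose to distribute the load across several connectors, one way to plan the split is to balance the tables by size, because each connector has to scan every table it syncs. The following is a minimal sketch of that planning step only. The function name `split_tables`, the table names, and the byte counts are hypothetical and are not part of Fivetran. In practice, you could take the sizes from a PostgreSQL query that uses `pg_total_relation_size`.

```python
import heapq


def split_tables(table_sizes, connector_count):
    """Group tables so that each connector scans a similar number of bytes.

    table_sizes: dict mapping table name -> estimated size in bytes.
    connector_count: number of connectors to spread the tables across.
    Returns a list with one list of table names per connector.
    """
    if connector_count < 1:
        raise ValueError("connector_count must be at least 1")

    # Min-heap of (bytes assigned so far, connector index).
    heap = [(0, index) for index in range(connector_count)]
    groups = [[] for _ in range(connector_count)]

    # Greedy heuristic: place the largest tables first, always on the
    # connector that currently has the least data assigned to it.
    ordered = sorted(table_sizes.items(), key=lambda item: (-item[1], item[0]))
    for name, size in ordered:
        assigned, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (assigned + size, index))

    return groups


# Hypothetical example data: sizes in bytes for four tables.
example_sizes = {"orders": 900, "events": 800, "users": 300, "audit_log": 200}
plan = split_tables(example_sizes, 2)
```

Each inner list in `plan` corresponds to the set of tables you would leave selected in the Schema tab of one connector. The greedy approach does not guarantee a perfectly even split, but it is usually close enough for planning.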
Cause
Google Cloud PostgreSQL does not support logical replication of the WAL, so Fivetran uses the XMIN sync strategy instead. With this strategy, Fivetran performs a full table scan during each sync to identify new or updated records.
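To illustrate why a full table scan makes sync time depend on the size of the table and not on the number of changed rows, here is a simplified simulation. It is a sketch under stated assumptions, not Fivetran's implementation: the `Row` class and the `xmin_sync` function are hypothetical, and transaction ID wraparound is ignored. It relies only on the fact that every PostgreSQL row version carries an `xmin` system column holding the ID of the transaction that wrote it, and that this column is not indexed.

```python
from dataclasses import dataclass


@dataclass
class Row:
    xmin: int     # ID of the transaction that wrote this row version
    payload: str  # stand-in for the row's actual columns


def xmin_sync(table, last_synced_xmin):
    """Find rows written after the previous sync by comparing xmin values.

    Returns (changed_rows, rows_scanned, new_cursor).
    Simplification: assumes transaction IDs only increase (no wraparound).
    """
    changed_rows = []
    rows_scanned = 0
    new_cursor = last_synced_xmin

    # There is no index on xmin, so every row has to be visited,
    # even if only a handful of rows changed since the last sync.
    for row in table:
        rows_scanned += 1
        if row.xmin > last_synced_xmin:
            changed_rows.append(row)
            new_cursor = max(new_cursor, row.xmin)

    return changed_rows, rows_scanned, new_cursor


# Hypothetical table of 1,000 rows, of which only 2 changed since the last sync.
table = [Row(xmin=i, payload=f"row-{i}") for i in range(1, 1001)]
changed, scanned, cursor = xmin_sync(table, last_synced_xmin=998)
```

In this example, `changed` holds only 2 rows, but `scanned` equals the full 1,000 rows in the table. This is why reducing the number or size of the tables each connector syncs shortens the sync.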