BigQuery

BigQuery is a fully-managed, serverless data warehouse and analytics platform provided by Google Cloud. It is designed to handle and analyze large datasets using SQL queries. The Fivetran BigQuery connector allows you to sync tables and views from your BigQuery dataset (regular dataset or linked dataset without data egress restriction) to your destination.

Fivetran supports BigQuery as both a database connector and a destination.

If you want to sync multiple BigQuery datasets, we recommend that you create a connection for each dataset.

Features

Feature Name	Supported	Notes
Capture deletes		All tables and fields
History mode
Custom data		All tables, views and fields
Data blocking		Column level and table level
Column hashing
Re-sync		Connector table & view level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking		Private Google Access
Authorization via API

Setup guide

Follow our step-by-step BigQuery setup guide to connect your BigQuery dataset with Fivetran.

Private connectivity to BigQuery

Fivetran's Google network is configured with Private Google Access (PGA), which means traffic between Fivetran and BigQuery is always private when Fivetran runs on Google Cloud (GCP). You can ensure a private connection by configuring your Fivetran destination to run on GCP.

Sync overview

Once Fivetran is connected to your BigQuery dataset, we need to ensure all your selected BigQuery data is loaded into your destination reliably. This means syncing all the data that is initially present in your BigQuery datasets along with the new and modified data being loaded into your datasets after connecting with Fivetran. Read the Initial sync and Updating data sections below to know about the methods we use to reliably sync all your selected data.

To see, which sync mechanism Fivetran selects for your BigQuery dataset, refer to the table below.

Dataset type	Table type	Condition	Initial sync mechanism	Incremental sync mechanism
Regular dataset	Standard table/ Partitioned table (that does not require a partition filter)	Not using Flexible column names	Storage Read API	Time Travel
Regular dataset	Standard table/ Partitioned table (that does not require a partition filter)	Using Flexible column names	SQL import	Fivetran Teleport Sync
Regular dataset	Regular view	N/A	SQL import	Fivetran Teleport Sync
Regular dataset	Materialized view	N/A	SQL import	Fivetran Teleport Sync
Linked dataset	Any supported table type	Without data egress restriction	SQL import	Fivetran Teleport Sync

Initial sync

When Fivetran connects to your BigQuery dataset, we copy all the data from every supported table type that you have selected. We use the following methods to read the data:

Storage Read API

BigQuery Storage Read API provides fast access to BigQuery-managed storage. We use it to read the table data in parallel using multiple streams.

SQL import

We use the SELECT col1, col2,.., colN from <project>.<dataset>.<table> SQL query to read the data of table or view.

Updating data

We perform incremental updates for any new or modified data in your source dataset using either BigQuery Time Travel or Fivetran Teleport Sync.

Time Travel

Time travel is a BigQuery feature that allows us to access historical data. Time travel queries perform the following operations on your table:

Perform a full table scan to identify the modified data (inserts, updates, and deletes).
Create a diff table from the modified data in your dataset or query results dataset.
If the name of your table is standard_table, the corresponding diff table is named standard_table_ft_diff.
Read the diff table using Storage Read API.
Drop the diff table.

Fivetran Teleport sync

Fivetran Teleport Sync is our proprietary incremental sync method that fetches only the data changed since the last sync. For more information, see our Fivetran Teleport Sync documentation.

We use this method to perform the following operations on your dataset:

Do a full table scan of every synced table.
Perform calculations on all values in each synced table's rows.
Aggregate a compressed table snapshot in the dataset's memory.

Fivetran Teleport Sync captures deletes differently than log-based replication. For more information, see our Changes to data between syncs documentation.

Limitations

If you update a table during a sync, Fivetran Teleport Sync will not be able to detect these updates during the ongoing sync. In such cases, your connection will re-sync the entire table to capture the changes. To avoid this, we recommend increasing the sync frequency of the connection. To learn more about Fivetran Teleport Sync limitation, see our Teleport sync general limitations.

Schema information

Fivetran tries to replicate the exact schema, tables and views from your BigQuery source dataset to your destination.

We name the source schema with the dataset name you provided in the connection setup form. We replicate the source's tables and views to the destination schema. We name the destination schema with the connection name appended by the dataset name. For example, if the connection name is bigquery_db and the dataset name is fivetran_schema, the destination schema name is bigquery_db_fivetran_schema.

Type transformations and mapping

As we extract your data, we match BigQuery data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.

The following table illustrates how we transform your BigQuery data types into Fivetran-supported types:

BigQuery Type	Fivetran Type	Supported
ARRAY	STRING	Yes
BIGNUMERIC	BIGDECIMAL	Yes
BOOL	BOOLEAN	Yes
BYTES	BINARY	Yes
DATE	LOCALDATE	Yes
DATETIME	LOCALDATETIME	Yes
FLOAT64	BIGDECIMAL	Yes
GEOGRAPHY	JSON	Yes
INT64	INTEGER	Yes
INTERVAL	STRING	Yes
JSON	STRING	Yes
NUMERIC	BIGDECIMAL	Yes
STRING	STRING	Yes
STRUCT	STRING	Yes
TIME	STRING	Yes
TIMESTAMP	INSTANT	Yes

Handling data changes

Since BigQuery does not support primary keys, we use the Fivetran-generated _fivetran_id column as the primary key. We handle changes to the source in the following manner:

An INSERT in the source table creates a new row in the destination with _fivetran_deleted set to FALSE.
A DELETE in the source table marks the corresponding row in the destination with _fivetran_deleted set to TRUE.
An UPDATE in the source table marks the existing row in the destination with _fivetran_deleted set to TRUE and creates a new row with _fivetran_deleted set to FALSE.

Supported table types

The BigQuery connector supports syncing the following table types from regular datasets and linked datasets without data egress restriction:

Table type	Supported
Standard tables	Yes
Regular views	Yes
Materialized views	Yes
Partitioned tables (that do not require a partition filter)	Yes
Tables exported from Google Analytics	No
Partitioned tables (that require a partition filter)	No
Temporary tables	No
External tables	No

To sync Google Analytics export tables, use our Google Analytics 4 or Google Analytics 360 connector.

Limitations

BigQuery connector doesn't support chaining datasets created by another Fivetran connector. The default behavior is to remove existing Fivetran-generated columns _fivetran_id, _fivetran_synced, and _fivetran_active while preserving the existing _fivetran_deleted column from the source table.