BigQuery
BigQuery is a fully-managed, serverless data warehouse and analytics platform provided by Google Cloud. It is designed to handle and analyze large datasets using SQL queries. The Fivetran BigQuery connector allows you to sync tables and views from your BigQuery dataset (regular dataset or linked dataset without data egress restriction) to your destination.
NOTE: Fivetran supports BigQuery as both a database connector and a destination.
TIP: If you want to sync multiple BigQuery datasets, we recommend that you create a connector for each dataset.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | check | |
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | check | |
Authorization via API |
Setup guide
Follow our step-by-step BigQuery setup guide to connect your BigQuery dataset with Fivetran.
Private connectivity to BigQuery
Fivetran's Google network is configured with Private Google Access (PGA), which means traffic between Fivetran and BigQuery is always private when Fivetran runs on Google Cloud (GCP). You can ensure a private connection by configuring your Fivetran destination to run on GCP.
Sync overview
Once Fivetran is connected to your BigQuery dataset, we need to ensure all your selected BigQuery data is loaded into your destination reliably. This means syncing all the data that is initially present in your BigQuery datasets along with the new and modified data being loaded into your datasets after connecting with Fivetran. Read the Initial sync and Updating data sections below to know about the methods we use to reliably sync all your selected data.
To see, which sync mechanism Fivetran selects for your BigQuery dataset, refer to the table below.
Dataset type | Table type | Condition | Initial sync mechanism | Incremental sync mechanism |
---|---|---|---|---|
Regular dataset | Standard table/ Partitioned table (that does not require a partition filter) | Not using Flexible column names | Storage Read API | Time Travel |
Regular dataset | Standard table/ Partitioned table (that does not require a partition filter) | Using Flexible column names | SQL import | Fivetran Teleport Sync |
Regular dataset | Regular view | N/A | SQL import | Fivetran Teleport Sync |
Regular dataset | Materialized view | N/A | SQL import | Fivetran Teleport Sync |
Linked dataset | Any supported table type | Without data egress restriction | SQL import | Fivetran Teleport Sync |
Initial sync
When Fivetran connects to your BigQuery dataset, we copy all the data from every supported table type that you have selected. We use the following methods to read the data:
Storage Read API
BigQuery Storage Read API provides fast access to BigQuery-managed storage. We use it to read the table data in parallel using multiple streams.
SQL import
We use the SELECT col1, col2,.., colN from <project>.<dataset>.<table>
SQL query to read the data of table or view.
Updating data
We perform incremental updates for any new or modified data in your source dataset using either BigQuery Time Travel or Fivetran Teleport Sync.
Time Travel
Time travel is a BigQuery feature that allows us to access historical data. Time travel queries perform the following operations on your table:
- Perform a full table scan to identify the modified data (inserts, updates, and deletes).
- Create a diff table from the modified data in your dataset or query results dataset.
NOTE: If the name of your table is
standard_table
, the corresponding diff table is namedstandard_table_ft_diff
. - Read the diff table using Storage Read API.
- Drop the diff table.
Fivetran Teleport sync
Fivetran Teleport Sync is our proprietary incremental sync method that fetches only the data changed since the last sync. For more information, see our Fivetran Teleport Sync documentation.
We use this method to perform the following operations on your dataset:
- Do a full table scan of every synced table.
- Perform calculations on all values in each synced table's rows.
- Aggregate a compressed table snapshot in the dataset's memory.
NOTE: Fivetran Teleport Sync captures deletes differently than log-based replication. For more information, see our Changes to data between syncs documentation.
Schema information
Fivetran tries to replicate the exact schema, tables and views from your BigQuery source dataset to your destination.
We name the source schema with the dataset name you provided in the connector setup form. We replicate the source's tables and views to the destination schema. We name the destination schema with the connector name appended by the dataset name. For example, if the connector name is bigquery_db
and the dataset name is fivetran_schema
, the destination schema name is bigquery_db_fivetran_schema
.
Type transformations and mapping
As we extract your data, we match BigQuery data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.
The following table illustrates how we transform your BigQuery data types into Fivetran-supported types:
BigQuery Type | Fivetran Type | Supported |
---|---|---|
ARRAY | STRING | Yes |
BIGNUMERIC | BIGDECIMAL | Yes |
BOOL | BOOLEAN | Yes |
BYTES | BINARY | Yes |
DATE | LOCALDATE | Yes |
DATETIME | LOCALDATETIME | Yes |
FLOAT64 | BIGDECIMAL | Yes |
GEOGRAPHY | JSON | Yes |
INT64 | INTEGER | Yes |
INTERVAL | STRING | Yes |
JSON | STRING | Yes |
NUMERIC | BIGDECIMAL | Yes |
STRING | STRING | Yes |
STRUCT | STRING | Yes |
TIME | STRING | Yes |
TIMESTAMP | INSTANT | Yes |
Handling data changes
Since BigQuery does not support primary keys, we use the Fivetran-generated _fivetran_id
column as the primary key. We handle changes to the source in the following manner:
- An INSERT in the source table creates a new row in the destination with _fivetran_deleted set to FALSE.
- A DELETE in the source table marks the corresponding row in the destination with _fivetran_deleted set to TRUE.
- An UPDATE in the source table marks the existing row in the destination with _fivetran_deleted set to TRUE and creates a new row with _fivetran_deleted set to FALSE.
Supported table types
The BigQuery connector supports syncing the following table types from regular datasets and linked datasets without data egress restriction:
Table type | Supported |
---|---|
Standard tables | Yes |
Regular views | Yes |
Materialized views | Yes |
Partitioned tables (that do not require a partition filter) | Yes |
Tables exported from Google Analytics | No |
Partitioned tables (that require a partition filter) | No |
Temporary tables | No |
External tables | No |
TIP: To sync Google Analytics export tables, use our Google Analytics 4 or Google Analytics 360 connector.
Limitations
BigQuery connector doesn't support chaining datasets created by another Fivetran connector. The default behavior is to remove existing Fivetran-generated columns _fivetran_id
, _fivetran_synced
, and _fivetran_active
while preserving the existing _fivetran_deleted
column from the source table.