BigQuery

BigQuery is Google's serverless, petabyte-scale data warehouse for analytics. With BigQuery, you can run analytics over large amounts of data using standard SQL queries.

Fivetran supports BigQuery as both a database connector and a destination.

Fivetran can sync to BigQuery as frequently as every five minutes.

Supported deployment models

We support the SaaS and Hybrid deployment models for the destination.

You must have an Enterprise or Business Critical plan to use the Hybrid Deployment model.

Setup guide

Follow our step-by-step BigQuery setup guide to connect your BigQuery data warehouse with Fivetran.

Private connectivity to BigQuery

Fivetran's Google network is configured with Private Google Access (PGA), which means traffic between Fivetran and BigQuery is always private when Fivetran runs on Google Cloud (GCP). You can ensure a private connection by configuring your Fivetran destination to run on GCP.

Type transformation mapping

The data types in your BigQuery data warehouse follow Fivetran's standard data type storage.

We use the following data type conversions:

Fivetran Data Type	Destination Data Type	Notes
BOOLEAN	BOOLEAN
SHORT	INTEGER
INT	INTEGER
LONG	INTEGER
BIGDECIMAL	BIGNUMERIC or FLOAT	If a column has no precision or scale defined, we convert its data type to FLOAT. Otherwise, we convert it to BIGNUMERIC.
FLOAT	FLOAT
DOUBLE	FLOAT
LOCALDATE	DATE	For source connections created before March 5, 2020, we convert LOCALDATE to TIMESTAMP.
LOCALDATETIME	DATETIME
INSTANT	TIMESTAMP
STRING	STRING
JSON	STRING
BINARY	BYTES

Fivetran represents nested JSON (including arrays) as STRING columns in BigQuery. This happens with semi-structured data sources (For example, webhooks, MongoDB, and JSON files).

Parse JSON data

Use BigQuery's native JSON functions in Standard SQL to retrieve data stored in JSON-formatted strings.

For example, if you have a table in your BigQuery destination TEST.JSON_PARSING_EXAMPLE (id INT64, json STRING), with the following nested JSON structure:

 {
    "id":1,
     "name": {
     "first_name": "john",
     "last_name": "smith",
     "full_name": "john_smith"
     }, 
     "addresses":[
       {
         "name": "home",
         "city": "oakland",
         "state": "california",
         "country": "usa"
       }, 
       {
         "name": "work",
         "city": "denver",
         "state": "colorado",
         "country": "usa"
       }  
     ]
 }

To parse the data:

Query the data. Execute the following commands:

   select 
        json_value (json, '$.id') as json_id,
        json_value (json, '$.name.first_name') as first_name
   from test.json_parsing_example;

Row	json_id	first_name
1	1	john

Flatten the data. Execute the following commands:

   select 
        json_value(json, '$.id') as json_id,
        json_value(json, '$.name.first_name') as first_name,
        json_value(n, '$.name') as addresses_name,
        json_value(n, '$.city') as addresses_city,
        json_value(n, '$.state') as addresses_state,
        json_value(n, '$.country') as addresses_country
   from test.json_parsing_example,
   unnest (json_query_array (json, '$.addresses')) n;

Row	json_id	first_name	addresses_name	addresses_city	addresses_state	country
1	1	john	home	oakland	california	usa
2	1	john	work	denver	colorado	usa

Roles and permissions

The minimum role required for Fivetran to function is BigQuery User. This role provides permissions to:

run, enumerate, and cancel our own jobs (updates)
read, query, and enumerate all datasets within a project
create new datasets within the project and grant the bigquery.dataOwner role for these new datasets

These permissions are required because Fivetran creates its own datasets and modifies them as changes are made to the source. They don't allow Fivetran to modify datasets it does not own.

If there are some datasets you don't want Fivetran to access, we recommend that you create a separate BigQuery project for them.

For more information, see BigQuery's documentation.

Data load costs

Fivetran uses SQL queries to merge data into the tables in your destination. Because BigQuery bills for query usage, they charge you when Fivetran loads data in your destination. Partitioned tables significantly reduce costs because they allow Fivetran to quickly identify where to insert data in a table without scanning every row.

Column names

Fivetran ignores the case of column names in your destination tables, as BigQuery is case-insensitive. If your database connection is configured to preserve source naming, we sync the column names from the source without any modification. However, BigQuery does not support duplicate column names, even if the case is different.

BigQuery only supports column names that contain letters (a-z, A-Z), numbers (0-9), or underscores (_). We skip the columns that don't comply with this convention.

Hidden datasets

In BigQuery, a hidden dataset is a dataset whose name begins with an underscore (_). You cannot view the hidden datasets on BigQuery's UI. However, you can query the tables and views in hidden datasets the same way you would in any other dataset.

If your source connection's schema name starts with an underscore (for example, _hubspot_test), then the schema or dataset will be a hidden dataset in your BigQuery destination.

For more information about hidden datasets, see BigQuery's documentation.

Transaction concurrency

If a transaction mutates (updates or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. BigQuery cancels the concurrent jobs. For more information, see BigQuery's documentation.

To avoid sync failures, make sure no job runs on the datasets and tables while Fivetran is syncing data into them.

Limitations

Column limitations

BigQuery allows a maximum of 10,000 columns per table.

Partition modification limitations

BigQuery limits the maximum number of partition modifications. When we detect a weak primary key in a table, we update the data in the destination table using the following steps:

Create a copy table that contains all the data of the main table.
Delete the rows that are present in the staging table
Copy with append all the data from the staging table to the copy table.
Copy with truncate all the data from the copy table to the main table.
If the main table is a column partitioned table, the copy with truncate operation truncates the destination table and copies all data from the copy table to the main table. The operation modifies all the partitions in the main table.

If the sync fails with the QuotaExceededException error, we delete the main table and then copy it back from the copy table to reset the partition quota for the day. You can also resolve this error by:

Increasing your BigQuery quota
Reducing your sync frequency

Unstructured files Beta

We support unstructured files only in our SaaS Deployment model.

By default, we store unstructured files in a GCS bucket within the connection's destination schema. You can also specify the folder name to store the unstructured files. The files are stored in the gs://<gcs-bucket-name>/<optional-folder-name>/<schema-name>/<table-name>/ folder path. For more information, see our Unstructured File Replication feature documentation.