BigQuery
BigQuery is Google's serverless, petabyte-scale data warehouse for analytics. With BigQuery, you can run analytics over large amounts of data using standard SQL queries.
NOTE: Fivetran supports BigQuery as both a database connector and a destination.
Fivetran can sync to BigQuery as frequently as every five minutes.
Setup guide
Follow our step-by-step BigQuery setup guide to connect your BigQuery data warehouse with Fivetran.
Private connectivity to BigQuery
Fivetran's Google network is configured with Private Google Access (PGA), which means traffic between Fivetran and BigQuery is always private when Fivetran runs on Google Cloud (GCP). You can ensure a private connection by configuring your Fivetran destination to run on GCP.
Type transformation mapping
The data types in your BigQuery data warehouse follow Fivetran's standard data type storage.
We use the following data type conversions:
Fivetran Data Type | Destination Data Type | Notes |
---|---|---|
BOOLEAN | BOOLEAN | |
SHORT | INTEGER | |
INT | INTEGER | |
LONG | INTEGER | |
BIGDECIMAL | BIGNUMERIC | For source connectors created before August 4, 2021, we convert BIGDECIMAL to NUMERIC. |
FLOAT | FLOAT | |
DOUBLE | FLOAT | |
LOCALDATE | DATE | |
LOCALDATETIME | DATETIME | |
INSTANT | TIMESTAMP | |
STRING | STRING | |
JSON | STRING | |
BINARY | BYTES |
Fivetran represents nested JSON (including arrays) as STRING columns in BigQuery. This happens with semi-structured data sources (for example: webhooks, MongoDB, JSON files).
Parse JSON data
Use BigQuery's native JSON functions in Standard SQL to retrieve data stored in JSON-formatted strings.
For example, if you have a table in your BigQuery destination TEST.JSON_PARSING_EXAMPLE
(id INT64, json STRING), with the following nested JSON structure:
{
"id":1,
"name": {
"first_name": "john",
"last_name": "smith",
"full_name": "john_smith"
},
"addresses":[
{
"name": "home",
"city": "oakland",
"state": "california",
"country": "usa"
},
{
"name": "work",
"city": "denver",
"state": "colorado",
"country": "usa"
}
]
}
To parse the data:
Query the data. Execute the following commands:
select json_value (json, '$.id') as json_id, json_value (json, '$.name.first_name') as first_name from test.json_parsing_example;
Row json_id first_name 1 1 john Flatten the data. Execute the following commands:
select json_value(json, '$.id') as json_id, json_value(json, '$.name.first_name') as first_name, json_value(n, '$.name') as addresses_name, json_value(n, '$.city') as addresses_city, json_value(n, '$.state') as addresses_state, json_value(n, '$.country') as addresses_country from test.json_parsing_example, unnest (json_query_array (json, '$.addresses')) n;
Row json_id first_name addresses_name addresses_city addresses_state country 1 1 john home oakland california usa 2 1 john work denver colorado usa
Roles and permissions
The minimum role required for Fivetran to function is BigQuery User
. This role provides permissions to:
- run, enumerate, and cancel our own jobs (updates)
- read, query, and enumerate all datasets within a project
- create new datasets within the project and grant the
bigquery.dataOwner
role for these new datasets
These permissions are required because Fivetran creates its own datasets and modifies them as changes are made to the source. They don't allow Fivetran to modify datasets it does not own.
If there are some datasets you don't want Fivetran to access, we recommend that you create separate BigQuery project for them.
For more information, see BigQuery's documentation.
Data load costs
Fivetran uses SQL queries to merge data into the tables in your destination. Because BigQuery bills for query usage, they charge you when Fivetran loads data in your destination. Partitioned tables significantly reduce costs because they allow Fivetran to quickly identify where to insert data in a table without scanning every row.
Column names
Fivetran ignores the case of column names in your destination tables as BigQuery is case-insensitive. If your database connector is configured to preserve source naming, we sync the column names from the source without any modification. However, BigQuery does not support duplicate column names, even if the case is different.
BigQuery only supports column names that contain letters (a-z, A-Z), numbers (0-9), or underscores (_). We skip the columns that don't comply with this convention.
Hidden datasets
In BigQuery, a hidden dataset is a dataset whose name begins with an underscore (_
). You cannot view the hidden datasets on BigQuery's UI. However, you can query the tables and views in hidden datasets the same way you would in any other dataset.
If your source connector's schema name starts with an underscore (for example, _hubspot_test
), then the schema or dataset will be a hidden dataset in your BigQuery destination.
For more information about hidden datasets, see BigQuery's documentation.
Transaction concurrency
If a transaction mutates (updates or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. BigQuery cancels the concurrent jobs. For more information, see BigQuery's documentation.
To avoid sync failures, make sure no job runs on the datasets and tables while Fivetran is syncing data into them.
Limitations
Column limitations
BigQuery allows a maximum of 10,000 columns per table.
Partition modification limitations
BigQuery limits the maximum number of partition modifications. When we detect a weak primary key in a table, we update the data in the destination table using the following steps:
Create a copy table that contains all the data of the main table.
Delete the rows that are present in the staging table
Copy with append all the data from the staging table to the copy table.
Copy with truncate all the data from the copy table to the main table.
NOTE: If the main table is a column partitioned table, the copy with truncate operation truncates the destination table and copies all data from the copy table to the main table. The operation modifies all the partitions in the main table.
If the sync fails with the QuotaExceededException
error, we delete the main table and then copy it back from the copy table to reset the partition quota for the day. You can also resolve this error by:
- Increasing your BigQuery quota
- Reducing your sync frequency