BigQuerylink
BigQuery is Google's serverless, petabyte-scale data warehouse for analytics.
Fivetran can sync to BigQuery as frequently as every five minutes.
Setup requirementslink
To connect your BigQuery data warehouse to Fivetran as a destination, you need:
- A BigQuery account or a Google Apps account
Setup instructionslink
You need to grant Fivetran access to your BigQuery cluster so we can create and manage tables for your data, and periodically load data into those tables.
-
Go to your Google Cloud Console and go to your projects list.
-
Find your Project ID, and make a note of it. You will need it to fill in your Fivetran setup form.
-
In another tab, go to your Fivetran destination setup page.
-
Select I already have a destination and select BigQuery.
-
Enter the Project ID you noted in step 2.
-
Take note of the Fivetran service account. You will need to grant it permissions in BigQuery.
-
Go back to your BigQuery tab, and go to the project members list.
-
Select + Add.
-
In the New Members field, enter the Fivetran service account that you copied in step 6. The service account is the entire email address.
-
Click Select a role -> BigQuery -> BigQuery User.
The minimum role required for Fivetran to function is BigQuery User
. This role provides permissions to:
- run, enumerate, and cancel our own jobs (updates)
- enumerate all datasets within a project
- create new datasets within the project and grant the bigquery.dataOwner role for these new datasets
These permissions are required because Fivetran creates its own datasets and modifies them as changes are made to the source. These permissions do not allow Fivetran to modify datasets it does not own. Use a separate project for datasets that you do not want Fivetran to be able to query.
VPC service perimeter configuration (Optional)link
If you use a service perimeter to control access to BigQuery, you must set up a GCP bucket to ingest data from Fivetran.
Note: The bucket must be present in the same location as the dataset location.
Set bucket access control
In the GCP console, set the access control for the bucket to Fine-grained.
IMPORTANT: If you set the access control for the bucket to Uniform bucket-level, the connector setup tests will fail with the
Cannot insert legacy ACL for an object when uniform bucket-level access is enabled
error.
Assign permissions to Fivetran service account
The Fivetran service account in the setup form must be given Storage Object Admin permission for the bucket so that it can read and write the data from the bucket.
-
In the Fivetran BigQuery setup form, enter the name of the bucket you want to use.
-
In your Google Cloud Console, go Storage -> Browser to see the list of buckets in your current project.
-
Select the bucket you want to use.
-
Go to Permissions and then click Add Members.
-
In the Add members modal, enter the Fivetran service account you got from the Fivetran BigQuery setup form.
-
From the Select a role dropdown, select Storage Object Admin.
Set the life cycle of objects in the bucket
You must set a lifecycle rule so that data older than one day is deleted from your bucket.
-
In your Google Cloud Console, go Storage -> Browser to see the list of buckets in your current project.
-
In the list, find the bucket you are using for Fivetran, and in the Lifecycle rules column, select its rules.
-
Click Add rule. A detail view will open.
-
In Select object conditions, select Age and enter 1.
-
Click Continue.
-
In Select action, select Delete.
-
Click Continue and then click Save.
Type transformation mappinglink
The data types in your BigQuery warehouse follow Fivetran's standard data type storage. We use the following data type conversions:
Fivetran Data Type | Destination Data Type |
---|---|
BOOLEAN | BOOLEAN |
SHORT | INTEGER |
INT | INTEGER |
LONG | INTEGER |
FLOAT | FLOAT |
DOUBLE | FLOAT |
BIGDECIMAL | NUMERIC |
LOCALDATE | DATE |
INSTANT | TIMESTAMP |
LOCALDATETIME | DATETIME |
STRING | STRING |
JSON | STRING |
BINARY | BYTES |
Fivetran represents nested JSON (including arrays) as STRING columns in BigQuery. This happens with semi-structured data sources (for example: webhooks, MongoDB, JSON files). The easiest way to query these columns is to define a user-defined function (UDF) to parse the text. Here is an example of this pattern. For different use cases, change the return type to match the data being parsed.
create temporary function structure(json string)
returns array<struct<x int64, y int64>>
language js as """
return JSON.parse(json)
""";
Example using this UDF:
with original_table as (select '[{"x":1,"y":1}, {"x":2,"y":2}]' as json_text),
structured_view as (select structure(json_text) as json_array from original_table)
select * from structured_view
More complicated UDFs can pick specific deeply nested fields and flatten arrays.
Data load costslink
Fivetran uses SQL queries to merge data into the tables in your warehouse. Because BigQuery bills for query usage, they charge you when Fivetran loads data in your warehouse. Partioned tables significantly reduce costs because they allow Fivetran to quickly identify where to insert data in a table without scanning every row.