Staging for BigQuery
While running Burst Integrate or Bulk Refresh, HVR can put data into a temporary directory (‘staging file') before loading data into a target database. For more information about staging files on BigQuery, refer to Google BigQuery documentation.
For best performance, Fivetran HVR performs Burst Integrate and Bulk Refresh into BigQuery using staging files. HVR implements the file staging in BigQuery as follows:
- HVR first writes data into the staging directory on Google Cloud Storage.
- HVR then uses the BigQuery SQL command
insert into
to ingest data into the BigQuery target tables from the staging directory.- To ingest data into BigQuery target, HVR pulls the compressed data from the staging directory (located on Google Cloud Storage) into a target table. A special 'external table' needs to exist for each target table that HVR loads data into. HVR will create these tables in BigQuery with names having the following patterns _ _x or _ _bx.
Configuring Staging
HVR must be configured to stage the data on Google Cloud Storage before loading it into BigQuery. For staging the data on Google Cloud Storage and perform Burst Integrate and Bulk Refresh, the following are required:
A Google Cloud Storage location (bucket) to store temporary data that are to be loaded into BigQuery.
A Google Cloud user (storage account) to access the bucket.
For the permissions required on Google Cloud Storage and Google BigQuery, refer to Google BigQuery documentation.
Define the following location properties (while creating a location or by editing the existing location's source and target properties) for the BigQuery location:
The Location Property equivalent to the UI field is shown inside (bracket) below.
- STORAGE (File_Scheme): Protocol for connecting HVR to Google Cloud Storage.
Select either of the following options:- Google Cloud Storage (HTTPS)
- Google Cloud Storage (HTTP)
- BUCKET (GS_Bucket): Name or IP address of the Google Cloud Storage bucket.
- STAGING DIRECTORY (Staging_Directory): Location where HVR will create the temporary staging files inside the Google Cloud Storage bucket (e.g. /my_staging_dir).
- STORAGE (File_Scheme): Protocol for connecting HVR to Google Cloud Storage.