Google BigQuery as Target
Fivetran HVR supports integrating changes into a Google BigQuery location. This section describes the configuration requirements for integrating changes into a BigQuery location using Integrate and Refresh. For the list of supported BigQuery versions into which HVR can integrate changes, see Integrate changes into location in Capabilities.
The preferred methods for writing data into BigQuery are Burst Integrate and Bulk Refresh, as they provide better performance. However, both methods require staging files to be created in a temporary location. For more information about staging, see section Staging for BigQuery.
Continuous Integrate is not recommended for replication to BigQuery. BigQuery is optimized for batch processing, so the default method, Burst Integrate, is highly efficient, whereas applying changes one by one (Continuous Integrate) is significantly less efficient.
We strongly advise against using Continuous Integrate: besides being inefficient, it may also cause replication issues. Use Burst Integrate for optimal performance and reliability when replicating to BigQuery.
Multi-statement Transactions
By default, HVR applies changes in BigQuery using auto-commit. The limitation of auto-commit is that HVR cannot properly recover if Integrate exits during an integrate cycle, which can result in duplicates in the target. To overcome this limitation, you can use multi-statement transactions instead.
To enable multi-statement transactions, you must define the following environment variable:
| Action | Parameters |
|---|---|
| Environment | Name=HVR_BIGQUERY_ENABLE_SESSIONS Value=1 |
When multi-statement transactions are enabled, replication fails if the channel contains more than 99 tables, since BigQuery does not support more than 100 tables in the same transaction.
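The table limit above can be checked before enabling the environment variable. The following is a minimal sketch, not part of HVR: the function name `sessions_allowed` and the idea of passing the channel's table names as a list are assumptions made for illustration, and only the limit of 99 tables comes from the text above.

```python
# Hypothetical pre-flight check; HVR does not ship this helper.
MAX_TABLES_WITH_SESSIONS = 99  # limit stated in this section


def sessions_allowed(channel_tables, sessions_enabled):
    """Return True if the channel stays within the documented table limit.

    channel_tables: iterable of table names in the channel (assumed input).
    sessions_enabled: True when HVR_BIGQUERY_ENABLE_SESSIONS is set to 1.
    """
    if not sessions_enabled:
        # With auto-commit (the default), the table limit does not apply.
        return True
    # Count distinct table names only.
    return len(set(channel_tables)) <= MAX_TABLES_WITH_SESSIONS


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(120)]
    if not sessions_allowed(tables, sessions_enabled=True):
        print("Channel exceeds the table limit for multi-statement transactions")
```

A channel that fails this check would need to keep using auto-commit or be split into smaller channels.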
Grants for Integrate and Refresh
This section lists the grants/permissions required for integrating changes into Google BigQuery.
The HVR database user must be granted the following three roles:
These three roles are required for granting the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
bigquery.jobs.create
bigquery.datasets.get
bigquery.routines.get
bigquery.routines.list
bigquery.tables.create
bigquery.tables.delete
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.tables.updateData
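One way to audit a user against the permission list above is a simple set comparison. The sketch below assumes you have already obtained the user's granted permissions by some other means (for example, from your IAM tooling). The function name `missing_permissions` is hypothetical, and the code does not call any Google Cloud API.

```python
# Permissions copied from the list in this section.
REQUIRED_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "bigquery.jobs.create",
    "bigquery.datasets.get",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "bigquery.tables.create",
    "bigquery.tables.delete",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.tables.updateData",
})


def missing_permissions(granted):
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Illustrative input: a user holding only the storage permissions.
    granted = [p for p in REQUIRED_PERMISSIONS if p.startswith("storage.")]
    for permission in missing_permissions(granted):
        print("missing:", permission)
```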
Intermediate Directory
This option in the HVR UI allows you to specify a directory path for storing intermediate (temporary) files generated during Compare. These files are created during both "direct file compare" and "online compare" operations.
Using an intermediate directory can enhance performance by ensuring that temporary files are stored in a location optimized for the system's data processing needs.
This setting is particularly relevant for target file locations, as it determines where the intermediate files are placed during the Compare operation. If this option is not enabled, the intermediate files are stored by default in the integratedir/_hvr_intermediate directory, where integratedir is the replication directory (File_Path) defined for the target file location.
This option is equivalent to the location property Intermediate_Directory.
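The fallback rule described above can be sketched as follows. This is an illustration only, not HVR's implementation: the function name `intermediate_directory` and its parameters are hypothetical, and POSIX-style path separators are assumed.

```python
import posixpath


def intermediate_directory(integratedir, configured=None):
    """Resolve where intermediate files would be stored.

    integratedir: the replication directory (File_Path) of the target file location.
    configured: value of Intermediate_Directory, or None when the option is not set.
    """
    if configured:
        # An explicitly configured directory takes precedence.
        return configured
    # Default described in this section: integratedir/_hvr_intermediate
    return posixpath.join(integratedir, "_hvr_intermediate")


if __name__ == "__main__":
    print(intermediate_directory("/data/target"))
    print(intermediate_directory("/data/target", "/fast/tmp"))
```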
Intermediate Directory is Local
This option in the HVR UI specifies that the Intermediate Directory will be created on the local drive of the file location's server.
This setting is crucial for optimizing performance, as it reduces network latency and avoids potential permission issues associated with remote storage. By storing intermediate files locally, HVR can process data more efficiently, taking advantage of the speed and reliability of local storage.
This option is particularly beneficial when the HVR Agent has access to ample local storage, enabling it to handle large data volumes without relying on networked storage solutions.
This option is equivalent to the location property Intermediate_Directory_Is_Local.