How to Manage Usage Costs for Merge and Update Queries
Question
How do I manage my usage costs? Why is Fivetran using large MERGE
queries that impact my data load costs? Queries such as:
MERGE `connector`.`event` AS `#existing` USING `fivetran_name_staging`.`temp_table_name_here`
Environment
Answer
Fivetran uses MERGE
and UPDATE
queries in your destination for data de-duplication.
To reduce your BigQuery usage costs, do the following:
Set your connector's sync frequency to as infrequent as you can while still meeting your business needs. For example, a connector syncing every five minutes will have a much higher destination query cost than one syncing every six hours as the merge and update queries run for every incremental sync.
Optimize (clean) tables in your source. If you are using a database connector, remove old records from the source so that you don't sync the old data if you trigger a historical or table re-sync.
If your connector supports selecting specific tables in the Schema tab, deselect unnecessary tables or endpoints from the sync. Some database connectors allow you to split large tables into individual connectors. You can set a different sync frequency for the larger tables.
Use BigQuery to partition your larger tables so that our
MERGE
queries scan less overall data. For more information about BigQuery partitioning, see the following resources:
Cause
Fivetran uses SQL queries to merge data into the tables in your BigQuery data warehouse. BigQuery bills for query usage and charges you when we load data in your destination.
We use MERGE
and UPDATE
queries for data de-duplication. We pull incremental updates from the source. However, we need to de-duplicate the data by referencing it against the complete data in the destination to ensure data integrity. Our sync process creates temporary tables for this purpose, and BigQuery counts queries to these temporary tables as usage. For more information, see our data load costs documentation.