How to Manage Usage Costs for Merge and Update Queries

Question

How do I manage my usage costs? Why is Fivetran using large MERGE queries that impact my data load costs? Queries such as:

MERGE `connector`.`event` AS `#existing` 
USING `fivetran_name_staging`.`temp_table_name_here`

Environment

BigQuery

Answer

Fivetran uses MERGE and UPDATE queries in your destination for data de-duplication.

To reduce your BigQuery usage costs, do the following:

Set your connector's sync frequency to as infrequent as you can while still meeting your business needs. For example, a connector syncing every five minutes will have a much higher destination query cost than one syncing every six hours as the merge and update queries run for every incremental sync.
Optimize (clean) tables in your source. If you are using a database connector, remove old records from the source so that you don't sync the old data if you trigger a historical or table re-sync.
If your connector supports selecting specific tables in the Schema tab, deselect unnecessary tables or endpoints from the sync. Some database connectors allow you to split large tables into individual connectors. You can set a different sync frequency for the larger tables.
Use BigQuery to partition your larger tables so that our MERGE queries scan less overall data. For more information about BigQuery partitioning, see the following resources:

Cause

Fivetran uses SQL queries to merge data into the tables in your BigQuery data warehouse. BigQuery bills for query usage and charges you when we load data in your destination.

We use MERGE and UPDATE queries for data de-duplication. We pull incremental updates from the source. However, we need to de-duplicate the data by referencing it against the complete data in the destination to ensure data integrity. Our sync process creates temporary tables for this purpose, and BigQuery counts queries to these temporary tables as usage. For more information, see our data load costs documentation.