Best Practices for File Source Configurations
Question
What are the most effective ways to configure a File connector to maximize performance and minimize Monthly Active Rows (MAR) costs under the 2025 pricing model?
Environment
File connections using Merge Mode on the 2025 pricing model.
What's new?
| Area | What changed | Why it matters |
|---|---|---|
| Per-connection pricing and Multiple Table support | Each connection (source to destination) now accrues its own MAR-based discount curve. Multiple Table support allows a single connection to sync several logically related tables, pooling MAR volume onto that curve. | Consolidate related datasets into one connection to optimize pricing. |
| Re-sync detection | We detect rows that are unchanged since the previous sync, even though the modified timestamp is tracked only at the file level. Rows are matched using the configured primary key. | Unchanged rows are not counted towards MAR, which lowers cost without compromising data freshness. |
For more information, see our 2025 Pricing Updates.
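To make the re-sync detection idea concrete, here is a minimal sketch of how unchanged rows could be identified by primary key when a file is re-delivered. It is an illustration only, not Fivetran's implementation: the function `count_changed_rows`, the `order_id` key, and the sample rows are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def count_changed_rows(previous: List[Row], current: List[Row],
                       key_fields: Tuple[str, ...]) -> int:
    """Count rows in `current` that are new or differ from `previous`.

    Rows are matched on the configured primary key (hypothetical helper).
    """
    def key_of(row: Row) -> tuple:
        return tuple(row[field] for field in key_fields)

    previous_by_key = {key_of(row): row for row in previous}
    changed = 0
    for row in current:
        # A missing key means a new row; a different value means an update.
        if previous_by_key.get(key_of(row)) != row:
            changed += 1
    return changed


# Hypothetical data: the same file is delivered again with one update and one new row.
previous_sync = [
    {"order_id": 1, "status": "open"},
    {"order_id": 2, "status": "open"},
]
current_sync = [
    {"order_id": 1, "status": "open"},     # unchanged
    {"order_id": 2, "status": "shipped"},  # updated
    {"order_id": 3, "status": "open"},     # new
]

print(count_changed_rows(previous_sync, current_sync, ("order_id",)))  # 2
```

In this toy model, only the updated and new rows would count as active, even though the whole file has a newer modified timestamp.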
Answer
Prerequisites
- Understand how your source files are generated:
  - Are files appended to, overwritten, or newly created?
  - Are file names unique (for example, with appended timestamps) or reused?
  - Do files contain only incremental changes or a full historical refresh?
- Identify primary keys - Determine whether each dataset mapped to a table includes fields that can uniquely identify each row (i.e., a natural primary key).
- Specify granular paths and patterns - Provide the most specific File Path and optional regex File Pattern to exclude irrelevant data.
- Understand Monthly Active Rows (MAR) and the pricing model - Learn how MAR is calculated, how to track your usage, and how to optimize your syncs. For more information, see our pricing documentation.
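Before saving a File Pattern, it can help to check it against a sample of file names. The sketch below uses Python's `re` module for that check. The pattern and file names are hypothetical, and the connector's regex engine may differ from Python's in some details, so treat this as a quick local check rather than a guarantee of connector behavior.

```python
import re

# Hypothetical pattern: daily order exports such as orders_2025-01-31.csv
FILE_PATTERN = re.compile(r"orders_\d{4}-\d{2}-\d{2}\.csv")

# Hypothetical file names found under the configured File Path
candidate_files = [
    "orders_2025-01-31.csv",
    "orders_2025-01-31.csv.bak",
    "orders_summary.csv",
    "customers_2025-01-31.csv",
]

# fullmatch requires the whole name to match, so backups and unrelated files are excluded.
matched = [name for name in candidate_files if FILE_PATTERN.fullmatch(name)]
print(matched)  # ['orders_2025-01-31.csv']
```

A specific pattern like this keeps backup copies and unrelated exports out of the sync, which avoids loading rows you do not need.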
Recommended file source configurations
In all the following configurations, we recommend grouping logically related datasets into a single connection. This approach combines their MAR into one discount curve, which can improve cost efficiency.
Configuration A - Incremental updates to existing records
- Files may contain new records as well as updates to existing records.
- Files do not necessarily contain a complete snapshot of the dataset.
- Primary key used for file processing and load: Upsert file using custom primary key.
- Why: This is the recommended option when records can change over time and you want to preserve the latest state for each record. It enables Fivetran to identify genuinely new or changed rows.
Configuration B - Full snapshot files
- Each file is a complete snapshot of the current dataset.
- Primary key used for file processing and load: Upsert file using custom primary key.
- Use Upsert file using file name and line number only when the file name is reused and you want rows from the updated file to replace rows loaded from the earlier version of that file.
- Why: Upsert file using custom primary key is the recommended option for full snapshots because it preserves record identity across syncs and enables re-sync detection to identify genuinely new or changed rows.
Configuration C - Append-only incremental data
- Each arriving file or file version contains only new rows.
- Previously delivered rows are not updated in later files.
- Primary key used for file processing and load: Append file using file modified time.
- Why: This option is well suited to append-only data such as logs, events, or audit records because it does not incur the overhead required for upserts.
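The three configurations above can be summarized as a small decision helper. This is a sketch of the guidance in this article, not a Fivetran API: the function `recommend_option` and its parameters are hypothetical names chosen for illustration.

```python
UPSERT_CUSTOM_KEY = "Upsert file using custom primary key"
UPSERT_FILE_NAME_LINE = "Upsert file using file name and line number"
APPEND_MODIFIED_TIME = "Append file using file modified time"


def recommend_option(records_can_change: bool,
                     full_snapshot: bool,
                     replace_reused_file_rows: bool = False) -> str:
    """Map dataset characteristics to the option recommended in this article.

    replace_reused_file_rows: the file name is reused and rows from the
    updated file should replace rows loaded from the earlier version.
    """
    if full_snapshot:
        # Configuration B
        if replace_reused_file_rows:
            return UPSERT_FILE_NAME_LINE
        return UPSERT_CUSTOM_KEY
    if records_can_change:
        # Configuration A
        return UPSERT_CUSTOM_KEY
    # Configuration C: append-only data
    return APPEND_MODIFIED_TIME


print(recommend_option(records_can_change=True, full_snapshot=False))
print(recommend_option(records_can_change=False, full_snapshot=True))
print(recommend_option(records_can_change=False, full_snapshot=False))
```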
Optimization playbook
Consolidate datasets
Sync logically related datasets through a single connection. This concentrates MAR into one discount curve and can improve cost efficiency.
Use Upsert file using custom primary key when records can change
Choose this option when files may contain updates to existing records or when files are full snapshots. This allows the destination to maintain the latest version of each record and helps re-sync detection avoid repeated MAR for unchanged rows.
Use Append file using file modified time for append-only data
Choose this option for logs, events, and other append-only workloads. It adds rows to the destination without the additional compute required for upserts.
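The difference between the two load behaviors can be shown with a toy in-memory destination. This sketch only illustrates upsert and append semantics in general and does not reflect how Fivetran writes to a destination. The helpers `upsert` and `append` and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def upsert(table: Dict[object, Row], rows: List[Row], key: str) -> None:
    """Insert new rows and overwrite existing rows that share the same key."""
    for row in rows:
        table[row[key]] = row


def append(table: List[Row], rows: List[Row]) -> None:
    """Add every incoming row without looking for an existing match."""
    table.extend(rows)


# Hypothetical files: the second file updates id 2 and adds id 3.
first_file = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
second_file = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]

upserted: Dict[object, Row] = {}
upsert(upserted, first_file, "id")
upsert(upserted, second_file, "id")

appended: List[Row] = []
append(appended, first_file)
append(appended, second_file)

print(len(upserted))           # 3 rows, one per id
print(upserted[2]["status"])   # closed
print(len(appended))           # 4 rows, id 2 appears twice
```

With upsert, the destination holds the latest state of each record. With append, every delivered row is kept, which suits data where earlier rows are never revised.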
Considerations
We are deliberately prescriptive about file configuration. Treat the recommended file source configurations in this article as the preferred way to set up your file ingestion processes.
If you don't use the recommended approaches, you may experience increased MAR usage, degraded sync performance, or both.