Best Practices for File Source Configurations
Question
What are some best practices for configuring my file source to get the most out of my File connection and optimize my MAR usage?
Environment
File connectors utilizing Merge Mode.
Answer
Prerequisites
- Understand how your source files are generated:
  - Determine whether new files are continuously created or whether existing files are updated or overwritten.
  - Identify whether new files are uniquely named (for example, with an appended timestamp) or share the same name.
  - Confirm whether you control the file creation process or a third party handles it.
  - Check whether each file includes only the changes since the last creation or refresh, or a full refresh of all historical and new data.
- Understand your source file structure and columns:
  - Review our table and column naming rule set for non-database connectors.
  - Ensure that field names and column headers remain unique after normalization. For example, ColumnHeader, column_header, and column__header may all normalize to the same value in the destination, causing duplicate column warnings.
- Understand supported formats, sync strategies, and configuration options. For more information, see our Files overview documentation.
- Specify the file path carefully. While configuring your connection, provide the most specific value possible in the File Path field to avoid syncing unintended files.
- Manage unwanted files. Remove unnecessary files from your source folders. If you can't remove them, use a regex pattern in the File Pattern field to select only the files you want to sync.
- Understand monthly active rows (MAR) and pricing. Learn how MAR is calculated, how to track your usage, and how to optimize your syncs. For more information, see our pricing documentation.
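To catch header collisions before the first sync, you can check your headers locally. The following Python sketch approximates normalization by splitting camelCase, lowercasing, and collapsing runs of non-alphanumeric characters into a single underscore. These rules are an assumption for illustration, not Fivetran's exact naming rule set, so confirm edge cases against the documented rules.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Approximate destination column-name normalization (assumed rules)."""
    # Insert an underscore at camelCase boundaries: ColumnHeader -> Column_Header
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Lowercase, turn each run of non-alphanumerics into one underscore, trim ends
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def find_collisions(headers):
    """Map each normalized name to the original headers that produce it, if >1."""
    groups = defaultdict(list)
    for header in headers:
        groups[normalize(header)].append(header)
    return {norm: originals for norm, originals in groups.items() if len(originals) > 1}


headers = ["ColumnHeader", "column_header", "column__header", "order_id"]
print(find_collisions(headers))
# {'column_header': ['ColumnHeader', 'column_header', 'column__header']}
```

Any key in the result is a set of headers to rename in the source before syncing.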
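Before entering a regex in the File Pattern field, it can help to test it against a sample of real file names. The sketch below uses Python's re module; the pattern and file names are hypothetical, and because Fivetran's regex flavor and the exact path it matches against may differ from Python's, treat a local match only as a first check.

```python
import re

# Hypothetical pattern: daily order exports named like orders_20240131.csv.
# The trailing $ anchors the match so backups such as *.csv.bak are excluded.
FILE_PATTERN = r"orders_\d{8}\.csv$"


def select_files(paths, pattern=FILE_PATTERN):
    """Return the paths that contain a match for `pattern` (re.search semantics)."""
    regex = re.compile(pattern)
    return [p for p in paths if regex.search(p)]


sample = [
    "exports/orders/orders_20240131.csv",
    "exports/orders/orders_20240131.csv.bak",
    "exports/orders/orders_final_v2.csv",
    "exports/orders/README.txt",
]
print(select_files(sample))  # ['exports/orders/orders_20240131.csv']
```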
NOTE: If you can access the data directly from a source application or database, we recommend syncing it with the connector for that application or database instead. To learn more about the connectors we support, see Fivetran's connectors list. If you still need to use a file connector, follow the recommendations below.
Preferred file source configurations
Fivetran recommends the following two approaches to configure your source:
Preferred configuration 1
Files have unique file names, and each file contains only the net-new changes since the previous file. This configuration avoids repeated MAR because each file is unique, is treated as brand new data, and contains no rows that were already synced.
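If you control file creation, a producer that writes only changed rows to a timestamped file fits this configuration. Below is a minimal Python sketch, assuming each row is a dict with a UTC ISO-8601 updated_at string (a hypothetical schema, so string comparison orders timestamps correctly) and that you persist the returned watermark between runs. The function name and file prefix are illustrative.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def write_delta_file(rows, watermark, out_dir, prefix="orders"):
    """Write rows changed after `watermark` to a uniquely named CSV.

    Returns (path, new_watermark); path is None when nothing changed.
    """
    changed = [r for r in rows if r["updated_at"] > watermark]
    if not changed:
        return None, watermark  # no net-new changes: skip creating an empty file
    # Microsecond UTC timestamp in the name keeps file names unique across runs
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = Path(out_dir) / f"{prefix}_{stamp}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(changed[0].keys()))
        writer.writeheader()
        writer.writerows(changed)
    # Advance the watermark to the newest change that was exported
    return path, max(r["updated_at"] for r in changed)
```

On the next run, pass the returned watermark back in so only rows changed since the last export are written.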
Preferred configuration 2
Files have the same file name, and each file contains only the net-new changes from its previous version. For optimal usage, in the connection setup form, set the Modified file merge option to append_file.
Non-optimal file source configurations
Fivetran doesn't recommend the following non-optimal configurations:
Case 1
Files have the same file name, and each file is a complete refresh of its previous version (inclusive of old and new data). This approach is not optimal because every sync reprocesses all of the old data along with the new data, so sync performance degrades over time as the file grows.
Case 2
Files have the same file name, and each file is a complete refresh of its previous version (inclusive of old and new data). If you set the Modified file merge option to append_file, every sync appends the old rows again alongside the new ones, which increases MAR usage.
Case 3
Files have the same file name, and each file contains only the net-new changes from its previous version. If you set the Modified file merge option to upsert_file, each new version replaces the rows synced from the previous version, so you lose the previous data. Change the option to append_file for best results (Preferred configuration 2).
Case 4
Files have unique file names, and each file is a complete refresh of its previous version (inclusive of old and new data). This approach is not optimal because each file is unique and treated as brand new data, so the old rows it repeats are synced again. The Modified file merge option doesn't matter in this case.
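To compare the cases side by side, the following Python toy model simulates three daily files that each add 100 records. It assumes, as a simplification rather than a description of Fivetran internals, that upsert_file keys rows by file name and line number, that append_file additionally keys them by file version, and that all syncs fall in the same month so distinct keys approximate MAR. The function and file names are hypothetical.

```python
from itertools import product


def simulate(unique_names, full_refresh, merge, days=3, new_rows_per_day=100):
    """Toy model of file naming x file contents x merge option (assumed semantics)."""
    if merge not in ("append_file", "upsert_file"):
        raise ValueError(f"unknown merge option: {merge}")
    table = {}        # destination table: key -> record id
    touched = set()   # distinct keys written during the period (MAR proxy)
    processed = 0     # total rows read from files (sync workload)
    for day in range(1, days + 1):
        name = f"orders_{day}.csv" if unique_names else "orders.csv"
        # A full refresh repeats every record so far; a delta holds only new ones
        first_id = 1 if full_refresh else (day - 1) * new_rows_per_day + 1
        records = list(range(first_id, day * new_rows_per_day + 1))
        for line, record in enumerate(records, start=1):
            key = (name, line) if merge == "upsert_file" else (name, day, line)
            table[key] = record
            touched.add(key)
        processed += len(records)
    return {
        "processed": processed,
        "distinct_keys": len(touched),
        "table_rows": len(table),
        "unique_records": len(set(table.values())),
    }


if __name__ == "__main__":
    for unique_names, full_refresh, merge in product(
            (False, True), (False, True), ("append_file", "upsert_file")):
        stats = simulate(unique_names, full_refresh, merge)
        print(f"unique_names={unique_names!s:5} full_refresh={full_refresh!s:5} "
              f"{merge:11} -> {stats}")
```

Under these assumptions, same-named delta files with append_file process 300 rows and keep all 300 records (Preferred configuration 2). Switching to upsert_file keeps only the last 100 records (Case 3). Same-named full refreshes with append_file touch 600 distinct keys for 300 real records (Case 2), while with upsert_file they touch 300 keys but still process 600 rows, a workload that keeps growing (Case 1). Uniquely named full refreshes touch 600 keys with either option (Case 4).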
Considerations
Unlike most of our best-practice guidance, our guidance on file configuration is prescriptive. Use the preferred file source configurations when you design your file ingestion processes.
If you don't use the preferred configurations, you may experience increased MAR usage, degraded sync performance, or both.
NOTE: Fivetran doesn't detect hard deletes in the source unless you perform a full refresh of the live view.