Best Practices for File Configurations
Question
What are some best practices for configuring my connector and optimizing my MAR usage?
Environment
All file connectors except for Magic Folder connectors.
Answer
Prerequisites
Understand how your files are generated by answering the following questions:
How are your files updated?
- Are brand new files and file names being generated on each update? Or
- Are the file names staying the same with new data being added to these files?
Do the files include:
- Only net-new changes from their last creation/refresh period? Or
- All new and historical data?
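If you already receive files, you can answer the second question by comparing two consecutive versions of the same file. The minimal sketch below is a hypothetical helper, not a Fivetran feature. It compares whole rows as tuples, so it ignores row order and duplicate rows. If every old row reappears, the new version is a full refresh. If no row overlaps, the new version contains only net-new changes.

```python
import csv
import io


def read_rows(fileobj):
    """Read CSV data rows (header skipped) as tuples."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the header row
    return [tuple(row) for row in reader]


def classify_update(old_rows, new_rows):
    """Hypothetical heuristic: compare two versions of a file row by row."""
    old_set, new_set = set(old_rows), set(new_rows)
    if old_set and old_set <= new_set:
        return "full_refresh"  # every old row reappears: old + new data
    if not (old_set & new_set):
        return "net_new"  # no overlap: only new changes
    return "mixed"  # partial overlap: inspect the files manually


old = read_rows(io.StringIO("id,name\n1,alice\n2,bob\n"))
refreshed = read_rows(io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n"))
delta = read_rows(io.StringIO("id,name\n3,carol\n"))
print(classify_update(old, refreshed))  # full_refresh
print(classify_update(old, delta))  # net_new
```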
Ensure you're familiar with the structure of your source files and their columns. In particular, familiarize yourself with our table and column naming rule set for non-database connectors, and ensure that no two column names (or column header names) normalize to the same value.
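To spot header collisions before you sync, you can normalize each header and group the originals by their normalized form. The `normalize` function below is only an assumed, simplified stand-in for the naming rule set: it lowercases the name, collapses runs of non-alphanumeric characters to `_`, and trims underscores. Treat the rule set in the documentation as authoritative and adjust the function to match it.

```python
import re
from collections import defaultdict


def normalize(name):
    # Simplified approximation of column-name normalization (assumption):
    # lowercase, collapse non-alphanumeric runs to "_", trim underscores.
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def find_collisions(headers):
    """Return {normalized_name: [original headers]} for names that clash."""
    groups = defaultdict(list)
    for header in headers:
        groups[normalize(header)].append(header)
    return {norm: originals for norm, originals in groups.items() if len(originals) > 1}


headers = ["Order ID", "order_id", "Customer Name", "Amount (USD)"]
print(find_collisions(headers))  # {'order_id': ['Order ID', 'order_id']}
```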
Read our Files overview documentation to understand the file configurations, sync strategies, and sync options we support.
Ensure you haven't set a specific File Type for your connector if your files share a common extension. For example, if you want to sync CSV files and they all have the .csv extension, set the File Type to infer.
Ensure the File Path you provide is as specific as possible.
Ensure you have removed unwanted files from the source folders.
If you're unable to remove unwanted files from the source folders, use a regex pattern in the File Pattern configuration field to exclude them.
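Before you save a File Pattern, it helps to test the regex against real file names. The sketch uses Python's `re` module as a stand-in. The connector's regex flavor, and whether it matches against the bare file name or the full path, may differ, so confirm the result in the connector itself. Both patterns and all file names are hypothetical. The first pattern includes only the files you want. The second excludes names containing `backup` or `tmp` by using a negative lookahead.

```python
import re

# Hypothetical inclusion pattern: only files named like sales_2024-01-31.csv.
INCLUDE_PATTERN = re.compile(r"^sales_\d{4}-\d{2}-\d{2}\.csv$")

# Hypothetical exclusion-style pattern: any .csv file unless its name
# contains "backup" or "tmp" (negative lookahead at the start).
EXCLUDE_PATTERN = re.compile(r"^(?!.*(?:backup|tmp)).*\.csv$")

candidates = [
    "sales_2024-01-31.csv",
    "sales_2024-01-31.csv.bak",
    "sales_backup.csv",
    "tmp_sales.csv",
    "inventory_2024-01-31.csv",
    "sales_2024-02-01.csv",
]

included = [name for name in candidates if INCLUDE_PATTERN.search(name)]
not_excluded = [name for name in candidates if EXCLUDE_PATTERN.search(name)]
print(included)  # ['sales_2024-01-31.csv', 'sales_2024-02-01.csv']
print(not_excluded)
```

An inclusion pattern anchored with `^` and `$` is usually easier to reason about than a lookahead, because new kinds of unwanted files stay excluded by default.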
Read our pricing documentation to learn how monthly active rows (MAR) are calculated, how to track your MAR, and how to optimize your usage.
NOTE: If you can access the data directly from a source application or database, we recommend syncing the raw data with the connector for that source instead. To learn more about the connectors we support, see Fivetran's connectors list. If you still need to use file connectors, follow the recommendations below.
Ideal File Configurations
Fivetran recommends the following two approaches to configure your source:
Ideal Method 1
Files have unique file names, and each file contains only the net-new changes since the previous file. This configuration is optimal because it produces no repeated MAR: each file is unique, is treated as brand new data, and contains no rows that were already synced.
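If you control the process that writes the files, one way to produce this layout is to keep a watermark and export only the rows changed since the last export into a new, timestamped file. The sketch is hypothetical throughout. The `export_net_new` helper, the `orders_<UTC timestamp>.csv` naming scheme, and the assumption that every row has an `updated_at` datetime are all illustrative, not part of any Fivetran API.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def export_net_new(rows, watermark, out_dir, prefix="orders"):
    """Write rows changed after `watermark` to a new, uniquely named CSV file.

    `rows` is an iterable of dicts with an "updated_at" datetime (assumed
    schema). Returns (new_watermark, path); path is None if nothing changed.
    """
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    if not new_rows:
        return watermark, None
    # A UTC timestamp in the name gives every export a unique file name.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = Path(out_dir) / f"{prefix}_{stamp}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(new_rows[0].keys()))
        writer.writeheader()
        writer.writerows(new_rows)
    return max(r["updated_at"] for r in new_rows), path


day1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
day2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
source = [{"id": 1, "updated_at": day1}, {"id": 2, "updated_at": day2}]

with tempfile.TemporaryDirectory() as out_dir:
    # The previous export covered everything up to day1, so only id 2 is new.
    watermark, written = export_net_new(source, day1, out_dir)
    lines_written = written.read_text().splitlines()
    # Re-running with the advanced watermark writes no file at all.
    again = export_net_new(source, watermark, out_dir)

print(written.name, lines_written)
```

In practice, you would persist the returned watermark between runs so that each export starts where the previous one ended.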
Ideal Method 2
Files have the same file name, and each file contains only the net-new changes from its previous version. For optimal usage, in the connector setup form, set the Modified file merge option to append_file.
Non-optimal File Configurations
Fivetran doesn't recommend the following non-optimal use cases:
Case 1
Files have the same file name, and each file is a complete refresh of its previous version (inclusive of old and new data). This approach is not optimal because every sync re-processes the old data along with the new data, so sync performance degrades over time as the file grows.
Case 2
Files have the same file name, and each file is a complete refresh of its previous version (inclusive of old and new data). If you set the Modified file merge option to append_file, you will incur increased MAR usage because the old rows are appended again on every sync along with the new ones.
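Some rough arithmetic shows how quickly this adds up. The model is simplified and rests on stated assumptions: the file gains the same number of new rows before every sync, and with append_file every row of a modified file is written again as a new row. Rows written is used here only as a proxy for MAR. Your actual MAR depends on how Fivetran counts distinct rows per month.

```python
def rows_written(new_rows_per_sync, syncs, full_refresh):
    """Rows appended across `syncs` syncs under append_file (simplified model)."""
    if full_refresh:
        # Sync k re-sends all k * n rows accumulated so far.
        return sum(k * new_rows_per_sync for k in range(1, syncs + 1))
    # Net-new files: each sync sends only the n new rows.
    return new_rows_per_sync * syncs


print(rows_written(1_000, 30, full_refresh=False))  # 30000
print(rows_written(1_000, 30, full_refresh=True))  # 465000
```

With full refreshes, the total grows quadratically (n·d(d+1)/2 for d syncs). With net-new files, it grows linearly (n·d). For one daily sync over a 30-day month, that is roughly 15 times as many rows.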
Case 3
Files have the same file name, and each file contains only the net-new changes from its previous version. If you set the Modified file merge option to upsert_file, you will lose the previous data. Change it to append_file for best results (Ideal Method 2).
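A toy simulation illustrates why upsert_file loses data here. It assumes that upsert_file keys destination rows by file name and line number (`_file`, `_line`), and that append_file adds the file's modified time (`_modified`) to that key. Treat these keys as an assumption and check them against the connector documentation. The `sync` function and the sample data are hypothetical.

```python
def sync(table, file_name, modified, lines, mode):
    """Apply one version of a file to a destination table (toy model).

    Assumed keys: upsert_file -> (_file, _line);
    append_file -> (_file, _line, _modified).
    """
    for line_no, value in enumerate(lines, start=1):
        if mode == "upsert_file":
            key = (file_name, line_no)
        elif mode == "append_file":
            key = (file_name, line_no, modified)
        else:
            raise ValueError(f"unknown mode: {mode}")
        table[key] = value  # same key -> the row is overwritten
    return table


# Two deliveries of the same file name, each holding only net-new rows.
versions = [("2024-01-01", ["order 1", "order 2"]), ("2024-01-02", ["order 3"])]

results = {}
for mode in ("upsert_file", "append_file"):
    table = {}
    for modified, lines in versions:
        sync(table, "orders.csv", modified, lines, mode)
    results[mode] = sorted(table.values())
    print(mode, results[mode])
# upsert_file ['order 2', 'order 3']  (order 1 overwritten by the new line 1)
# append_file ['order 1', 'order 2', 'order 3']
```

In this toy model, the stale line 2 survives while line 1 is overwritten. Either way, order 1 is gone from the destination table.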
Case 4
Files have unique file names, and each file is a complete refresh of its previous version (inclusive of old and new data). This approach is not optimal because each new file re-sends all of the historical data. The Modified file merge option has no effect here because each file is unique and treated as brand new data.
Considerations
Unlike other best practices, file configuration is a topic where Fivetran prefers to be prescriptive. Follow our Ideal File Configurations for the best ways to configure your file ingestion processes.
If you don't use the ideal approaches, you may experience increased MAR usage, degraded sync performance, or both.
NOTE: Fivetran doesn't detect hard deletes in the source unless you perform a full refresh of the live view.