Best Practices for File Configurations
Use Case
You want to sync data using Fivetran’s File connectors but aren't sure how best to configure your syncs and optimize your monthly active rows (MAR) usage.
NOTE: The following recommendations and best practices do not apply to Fivetran's Magic Folder connectors.
Recommendations
Prerequisites
Make sure you understand how your files are generated and what they contain:
How are your files generated?
- Are brand new files and file names being generated on each update? Or
- Are the file names staying the same with new data being added to these files?
Do the files include:
- Only net-new changes from their last creation/refresh period? Or
- All new and historical data?
Read our Files documentation to understand the supported options (file formats, compression, encodings), sync strategies, and configuration options (folder path, file pattern, file type, and error handling).
IMPORTANT: Following these best practices helps you avoid incurring more MAR than is needed to ingest your data. Other benefits include faster data transfers and lower storage consumption in your destination.
Read our pricing documentation to learn how monthly active rows (MAR) are calculated, how to track your MAR, and how to optimize your usage.
NOTE: If you can access the data directly from a source application or database, we recommend syncing the raw data with the connector for that source instead. To learn more about the connectors we support, check Fivetran's connectors list. If you still need to use file connectors, see the following recommendations.
Ideal File Configurations
Fivetran recommends the following two approaches to configure your source:
Ideal Method 1
Files have unique file names, and each file contains only the net-new changes from its previous version. This configuration avoids repeated MAR because each file is unique and is treated as brand new data.
Ideal Method 2
Files have the same file name, and each file contains only the net-new changes from its previous version. For optimal usage, in the connector setup form, set the Modified file merge option to append_file.
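To illustrate why the merge option matters when a file keeps the same name, here is a minimal sketch of a toy model. It is not Fivetran's implementation: the `apply_file` function, the `orders.csv` file name, and the in-memory table are hypothetical, and the model only assumes that append_file keeps previously synced rows while upsert_file replaces the rows that came from the earlier version of the same file.

```python
def apply_file(table, file_name, rows, merge_option):
    """Toy model of syncing one version of a file into a destination table.

    table is a list of (file_name, row) tuples. This is an illustration only,
    not how Fivetran stores or merges data.
    """
    if merge_option == "append_file":
        # Assumption: rows from earlier versions of the file are kept.
        kept = list(table)
    elif merge_option == "upsert_file":
        # Assumption: rows from the earlier version of this file are replaced.
        kept = [(name, row) for name, row in table if name != file_name]
    else:
        raise ValueError(f"unknown merge option: {merge_option!r}")
    return kept + [(file_name, row) for row in rows]


# Same file name on every update; each version holds only net-new rows.
first_version = ["order-1", "order-2"]
second_version = ["order-3"]

results = {}
for option in ("append_file", "upsert_file"):
    table = []
    table = apply_file(table, "orders.csv", first_version, option)
    table = apply_file(table, "orders.csv", second_version, option)
    results[option] = [row for _, row in table]

print(results["append_file"])  # ['order-1', 'order-2', 'order-3']
print(results["upsert_file"])  # ['order-3']
```

In this model, append_file ends with all three rows, while upsert_file keeps only the rows from the latest version of the file.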
Non-optimal File Configurations
Fivetran doesn't recommend the following non-optimal use cases:
Case 1
Files have the same file name, and each file is a complete refresh of its previous version (inclusive of old and new data). This approach is not optimal because every version of the file repeats the old data along with the new data. You will observe degraded sync performance over time.
Case 2
Files have the same file name, and each file is a complete refresh of its previous version (inclusive of old and new data). If you set the Modified file merge option to append_file, you will incur increased MAR usage.
Case 3
Files have the same file name, and each file contains only the net-new changes from its previous version. If you set the Modified file merge option to upsert_file, you will lose the previous data. Change it to append_file for best results (Ideal Method 2).
Case 4
Files have unique file names, and each file is a complete refresh of its previous version (inclusive of old and new data). This approach is not optimal: because each file is unique and treated as brand new data, the old data it repeats is ingested again. For the same reason, the Modified file merge option doesn't matter.
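The two ideal methods and four non-optimal cases above can be summarized as a small lookup. The following is a minimal sketch: the `FileSetup` class and `classify` function are hypothetical names, not part of any Fivetran API. The text does not name the merge option for Case 1, so mapping upsert_file to Case 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSetup:
    unique_file_names: bool  # True: a new file name is generated on each update
    full_refresh: bool       # True: each file repeats old and new data
    merge_option: str        # "append_file" or "upsert_file"


def classify(setup: FileSetup) -> str:
    """Return the name of the configuration described in this article."""
    if setup.merge_option not in ("append_file", "upsert_file"):
        raise ValueError(f"unknown merge option: {setup.merge_option!r}")

    if setup.unique_file_names:
        # With unique names, the merge option doesn't matter.
        return "Case 4" if setup.full_refresh else "Ideal Method 1"

    if setup.full_refresh:
        # Assumption: Case 1 is the upsert_file variant of a same-name full refresh.
        return "Case 2" if setup.merge_option == "append_file" else "Case 1"

    # Same file name, net-new changes only.
    return "Ideal Method 2" if setup.merge_option == "append_file" else "Case 3"


print(classify(FileSetup(unique_file_names=False, full_refresh=False,
                         merge_option="upsert_file")))  # Case 3
```

A check like this could run in a pipeline that produces the files, flagging any setup that does not resolve to one of the two ideal methods.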
Considerations
Unlike with other best practices, Fivetran prefers to be prescriptive about file configuration. Consider our Ideal File Configurations for the best approaches to configure your file ingestion processes.
If you don't use the ideal approaches, you may experience increased MAR usage, degraded sync performance, or both.
NOTE: Fivetran doesn't detect hard deletes in the source unless you perform a full refresh of the live view.