Fileslink
We offer several options to sync files to your destination. You can:
- Add a Cloud-based Storage connector to sync files from your storage bucket
- Configure a Magic Folder connector to sync files from your cloud folder
- Transfer files using File Transfer Protocols
- Connect to Google Sheets
- Email files
Cloud-based storagelink
Sync files from a cloud-based storage service to your destination. Fivetran supports syncing files from the following cloud-based storage services:
Magic Folderlink
Sync any supported file from the cloud folder of your file sync and storage service, to your destination. Magic Folder connectors offer dual mode sync strategies. You can opt to:
- sync each file as a unique table in your destination (Magic Folder Mode)
- sync multiple files into a single destination table that you specify (Merge Mode)
Fivetran supports syncing files from the following cloud folders:
We sync all files in the cloud folder to your destination, even if someone without a Fivetran account uploaded them. Anyone with write access to the cloud folder can drop files into the folder; Fivetran automatically syncs the files from your cloud folder to your destination. For example, if you set up a Dropbox connector in Magic Folder Mode, anyone who can write files to the Dropbox folder can upload a file, and we will sync the file to your destination.
Magic Folder Modelink
Using Magic Folder Mode, you can sync files as unique tables within a schema in your destination.
You can use your cloud folder's URL to configure the connectors. Fivetran automatically syncs the files from your folder as unique tables to your destination. That is, each file in your source becomes a table in your destination.
Fivetran supports syncing multiple worksheets from your spreadsheet (Microsoft Excel and Google Sheets) files as unique destination tables.
Fivetran detects the changes you make to the existing files in your cloud folder and syncs and updates your destination tables.
Magic Folder Mode doesn't support incremental syncs. We re-import the files during every sync. We use the last modified date of the files to detect changes in the files of your cloud folder. If we find files of the same name but with different extensions in the folder, we process the file we encounter first.
Merge Mode Betalink
Using Merge Mode, you can sync multiple files into a single destination table that you specify in the setup form, instead of unique tables for each file.
We sync all the rows from all files within a folder and nested folders into a single table.
Merge Mode supports incremental syncs.
NOTE: The OneDrive connector doesn't support Merge Mode.
File transfer protocolslink
Sync files from your FTP or SSH server to your destination. Fivetran supports syncing files using:
Sync overviewlink
Cloud-based storage connectorslink
Our integration with storage services is optimized for buckets that receive periodic, automatic data dumps. These could be internal reports generated by your own tools, or APIs that we haven't yet offered full support for.
File storage connections occur at the table level. Each Fivetran connector you create (shown below in the blue connection icon circle) connects to a single storage bucket, and its underlying data is loaded into a single destination table. Any files added to that bucket in the specified folder will be pulled and loaded into the specified destination table.
You can configure multiple storage connectors such that they sync into a single schema. All file storage cloud connections are pull integrations β Fivetran periodically pulls new or changed data from the source bucket.
Naminglink
You can assign custom schema and table names while building your connector. Column names are generated based on the uploaded files. For a CSV, there will be one column for each entry in the first line. For a JSON file, there will be one column for each top level field in the JSON object.
If you change the name of a file in your bucket, all data in that file will be re-imported, which may result in duplicate values within that table.
Primary keyslink
Fivetran adds the below three columns as composite primary keys for a table. _file
and _line
are composite primary keys for all configuration options. The column _modified
is an additional primary key that is added if the Modified File Merge option is set to the append_file
.
COLUMN | DATA TYPE |
---|---|
_file π | TEXT |
_line π | INT |
_modified π | TIMESTAMP |
Excluding source datalink
Inclusion and exclusion of source data in file storage integrations are done via specification of a folder path. The integration will sync all files under a certain folder path, even if they are themselves in nested subfolders of the specified folder. Excluding fields from source files is not currently supported.
Updating datalink
After the initial load of historical data, Fivetran only pulls incremental updates of any new or modified files from the source, adding an extra column to your tables called fivetran_synced (TIMESTAMP) that indicates the start time of the job that synced this row.
We do not recommend manually changing the contents of your files. Doing so will result in the following behaviors:
The integration has file-level granularity. If you modify a file, we update the table in the destination to match the contents of the file. Adding rows to a CSV that has already been synced will change the last_modified date, which will then trigger a re-sync of the contents of that file. The file name and line numbers will be used as primary keys. Whatever you change in your file will be represented in the file in the destination, but you may lose the previous version of the file that corresponded to the old record.
Frequency of updateslink
By default, all storage services sync new and modified files every 15 minutes. Depending on the size of each update or the number of tables, it may take longer. In that case, the integration will update at the next 15-minute interval, i.e.
Update X Start: 9:00 am
Update X Finish: 9:18 am
Update Y Start: 9:30 am
You can change the update interval in the Dashboard. A storage service sync that encounters an error will re-try periodically after the shorter of the sync frequency or 1 hour.
Deleted datalink
We handle deleted data based on last modified dates, and we treat deleted files (and columns) and deleted rows differently.
If you delete a file in your source bucket, it no longer exists, so it no longer has a last modified date. Therefore, the deletion is not reflected in your destination - so the data from the deleted file still remains in your destination.
Deleted columns result in a change in the last modified date, and their values are synced as null in the destination.
Deleted rows have a last modified date, so they are deleted from your destination.
Magic Folder connectorslink
Magic Folder Modelink
If you select the Magic Folder Mode, we use the file name (without the extension) as the destination table name. For example, we sync a file named account_details.csv
as the ACCOUNT_DETAILS
table.
For spreadsheet files, we use a combination of the file name and worksheet name as the destination table name. For example, two worksheets called Sheet1
and Sheet2
in a spreadsheet My Workings
are synced as MY_WORKINGS_SHEET_1
and MY_WORKINGS_SHEET_2
tables. We use the values present in the first row of the worksheet as the column names in the destination table.
For more information about the schema and table naming rule set, see our naming conventions documentation.
After the first successful sync, you can view this mapping information in the Schema tab on your connector dashboard.
Sync strategylink
Magic Folder Mode doesn't support incremental syncs. We use the last modified date of the files to detect changes in the files of your cloud folder.
We do the following:
- If you add a new file to your cloud folder, we sync the file as a new table in your destination.
- If you modify a file in your cloud folder, we sync the file's content again to the destination table. We delete the previous data of the table and don't capture deletes.
- If you delete a file from your cloud folder, we don't delete the table from your destination.
If we find files of the same name but with different extensions in the folder, we process the file we encounter first. For example, if you add a file called abc.csv
in your cloud folder and then add another file called abc.json
, we create the table abc
for the file that we encounter first. After that, we wonβt sync any file with the name abc
with other extensions, because the abc
table will be already present.
Schema viewlink
The Schema tab of a connector in Magic Folder Mode does not include a navigation menu, Collapse all button, or Expand all button.
The Schema tab of a connector in Magic Folder Mode is empty if you haven't uploaded any supported files.
Primary keyslink
Fivetran adds a single column as a primary key for each table.
COLUMN | DATA TYPE |
---|---|
_line π | INT |
Sync limitationslink
We do not sync files inside nested folders in your cloud folder. We do not sync empty folders or unsupported file types.
We don't sync empty worksheets. We don't sync an Excel worksheet if the entire first row of the worksheet is empty. We don't sync a column if its first row is empty.
We don't sync worksheets with pivot tables.
You can find more information on the individual connector pages.
Merge Modelink
If you select the Merge Mode, we use the table name that you specify as the destination table name.
The connector connects to a single cloud folder, and its underlying data is loaded into a single destination table. Any files added to that specific folder will be pulled and loaded into the specified destination table.
You can configure multiple connectors such that they sync into a single schema. Fivetran periodically pulls new or changed data from the cloud folder.
Sync strategylink
Merge Mode supports incremental syncs. For more information, see the Sync overview section of our cloud-based storage connectors.
Sync limitationslink
We don't support syncing spreadsheets.
File transfer connectorslink
File transfer connections occur at the table level. Each connector you create connects to a single file directory, and its underlying data is loaded into a single destination table that you specify. Any files added to that specified directory will be pulled and loaded into the specified destination table.
You can configure multiple connectors such that they sync into a single schema. All file transfer connections are pull integrations -- Fivetran periodically pulls new or changed data from the FTP or SSH server.
For more information, see the Sync overview section of our cloud-based storage connectors.
Email connectorlink
We extract the attachments from the email, parse the attachment data and upload them into your destination. For more information, see our Email connector documentation.
Google Sheets connectorlink
We connect to your Google Sheet, fetch the data from the designated named range, then create a matching table in your destination and load its corresponding initial data. We then continue to check and sync changes to the named range on the update frequency that you specify. For more information, see our Google Sheets connector documentation.
Supported file formatslink
File Format | Specification | Notes |
---|---|---|
Separated Value Files (CSV*, TSV, etc.) | RFC-4180 | |
JSON text files delimited by new lines | RFC-7159 | |
JSON Arrays | RFC-7159 | |
Avro | Apache Avro 1.8.2 | |
Parquet | Apache Parquet | We support apache-parquet-format 2.8.0 and Row Group Size up to 1.5 GB. |
Excel (XLS, XLSX, and XLSM) | Microsoft Excel | Magic Folder connectors. We don't support files in Strict OOXML and Excel 5.0/95 Binary (BIFF5) formats; make sure to save the files in XLSX format. |
Google Sheets | Google Sheets | Google Drive and Google Sheets connectors. |
XML | XML | We support syncing xml files to your destination in Merge Mode. We load your XML data into a column without flattening it. For more information about the file size, see our documentation. |
IMPORTANT: Use the Google Sheets connector to sync data from a specific named range in your file to a single destination table. Use the Google Drive connector to sync data from multiple sheets within a spreadsheet as unique destination tables.
Supported file compressionlink
TAR and ZIP archives
BZip2 and GZip compression
NOTE: The Magic Folder connectors don't support file compression. We don't sync compressed files to your destination.
Supported encodingslink
- UTF-8, UTF-16, and UTF-32, with big or little endian order
- UTF-8 encoding is assumed if no Byte-Order Mark is present at the beginning of a file
Syncing empty tables and columnslink
The cloud-based storage and file transfer connectors don't support the creation of empty columns in your destination. We create a column in the destination table only if the column in your source file is not empty.
The Magic Folder connectors don't support the creation of empty tables and columns in your destination. We create a column in the destination table only if the column in your source file is not empty.
NOTE: For spreadsheet files, we don't create a destination table if the worksheet is empty. We don't create a table if the entire first row of the worksheet is empty. We create a column only if its first row is not empty.
The Google Sheets connector doesn't support the creation of empty tables and columns in your destination. We don't create a destination table if the Google sheet is empty. We create a column in the destination table only if the column in your source file is not empty.
The Email connector doesnβt support the creation of empty columns in your destination.
Configuration optionslink
IMPORTANT: The following configuration options are applicable only for our cloud-based storage and file transfer connectors.
Using these configuration options, you can select subsets of your folders, certain types of files, and more to sync only the files you need in your destination. Setting up multiple connectors targeted at the same source, but with different options, can allow you to slice and dice a source any way you'd like.
Folder path (optional)link
This folder path is used to specify a portion of the bucket in which you'd like Fivetran to look for files. Any files under the specified folder and all of its nested subfolders will be examined for files we can upload. If no prefix is supplied, we'll look through the entire bucket for files to sync.
File pattern (optional)link
The file pattern is a regular expression that we use to decide whether to sync certain files. It applies to everything under the prefix. For instance, suppose under the prefix logs
you had three folders: 2017
, 2016
, and errors
. Using the pattern \d\d\d\d/.*
, you could exclude all the files in the errors
folder, because \d\d\d\d
only applies to the folders whose name consists of four consecutive digits, and because .*
after /
applies to any files in these folders. If you're not sure what regular expression to use, you can leave this field blank, and we'll sync everything under the prefix.
If you're feeling particularly bold, you can learn to write your own regex using Tutorialspoint's Java - Regular Expressions documentation and test it out using the Rubular tool.
TIP: To sync files having file names with a specific string irrespective of the case, use the
?i
modifier in your regex. For example, if you want to match all the files with a 'customer' string in the file name, specify the file pattern as(?i).*customer.*
.
File typelink
The file type is used to let Fivetran know that even files without a file extension ought to be parsed as this file type. For example, if you have an automated CSV output system that saves files without a .csv extension, you can specify the CSV type and we will sync them correctly as CSVs. Selecting "infer" will let Fivetran infer from a file's extension (.csv
, .tsv
, .json
, .avro
, or .log
) what to sync. If you do choose a file type, every file we examine will be interpreted as the file type you select, so make sure everything Fivetran syncs has the same file type!
Fivetran also supports syncing xml files to the destination. Fivetran loads your XML data into the _data
column without flattening it.
JSON Delivery Mode (optional)link
This option is available for JSON or JSONL file types. Fivetran uses this option to choose how to handle your JSON data. Fivetran has two values for this feature, Packed
and Unpacked
. If you select Packed, Fivetran loads all the JSON data into the _data
column without flattening it, or if you select Unpacked, Fivetran flattens one level of columns and infers their data types.
Compressionlink
The compression format is used to let Fivetran know that even files without a compression extension should be decompressed using the selected compression format. For example, if you have an automated CSV output system that GZIPs files to save space, but saves them without a .gzip extension, you can set this field to GZIP. The integration will then decompress every file that it examines using GZIP. If all of your compressed files are correctly marked with a matching compression extension (.bz2
, .gz
, .gzip
, .tar
, or .zip
), you can select "infer".
Error handlinglink
Selecting skip ignores any improperly formatted data within a file, allowing you to sync only valid data. Choosing fail enables you to prevent a file from syncing if any improperly formatted data is detected. With either option, you will receive a notification on your dashboard if errors are encountered.
Advanced optionslink
Use the advanced configuration options for highly specific cases.
Modified file mergelink
When a previously synced file is modified, should the rows in the destination be replaced or should the new rows be appended to the table. upsert_file
option will replace records in the destination, using the filename and line number as the primary key. append_file
option will append records.
NOTE: Select the
upsert_file
option to replicate the file contents to the destination. Select theappend_file
option to track the file modification history.
Archive folder pattern (optional)link
If there are multiple files within archive (TAR or ZIP) folders, you can use the archive folder pattern to filter those as well. For example, the archive folder pattern .*json
will sync from an archive folder only those files that end in a .json file extension.
Null sequence (optional)link
CSVs have no native notion of a null character. However, some CSV generators have created one, using characters such as \N
to represent null. Note: text is un-escaped before the null sequence is matched, so don't use the escape character in your null sequence. Only use this field if you are sure your CSVs have a null sequence.
Delimiter (optional)link
The delimiter is a character used in CSV files to separate one field from the next. If this is left blank, Fivetran will infer the delimiter for each file, and files of many different types of delimiters can be stored in the same folder with no problems. If this is not left blank, then all CSV files in your search path will be parsed with this delimiter. You can also specify a multi-character delimiter in this field. A custom multi-character delimiter (excluding "\t" and "\s") should be mentioned only if the source contains only CSV files.
Note: If the field is left blank, we can only infer single character delimiter, not multi-character delimiter. We can infer the delimiter from
,
,;
,|
,\t
, andUnit Separator
. To correctly parse the CSV files having multi-character delimiter, you must mention the delimiter in this field.
Escape character (optional)link
CSVs have a special rule for escaping quotation marks as opposed to other characters - they require two consecutive double quotes to represent an escaped double quote. However, some CSV generators do not follow this rule and use other characters like backslash for escaping. Only use this field if you are sure your CSVs have a different escape character.
Skip header lines (optional)link
Some CSV generating programs include additional header lines or empty lines at the top of the file. They consist of a few lines that do not match the format of the rest of the rows in the file. These header rows can cause undesired behavior in Fivetran because we attempt to parse them as if they were records in your CSV. By setting this value, you can skip fixed-length headers at the beginning of your CSV files.
NOTE: Only use this field if you are sure your CSVs have more than one header line. We will skip these lines before we map the header to row data for the CSVs.
Skip footer lines (optional)link
Some CSV generating programs include a footer at the bottom of the file. It consists of a few lines that do not match the format of the rest of the rows in the file. These footer rows can cause undesired behavior in Fivetran because we attempt to parse them as if they were records in your CSV. By setting this value, you can skip fixed-length footers at the end of your CSV files.
Headerless files (optional)link
Some CSV generating programs do not include column name headers for the files - they only contain data rows. By setting this value, you request Fivetran to generate generic column names following the convention of column_0
, column_1
, ... column_n
to map the rows.
NOTE: Only use this option if you are sure your CSVs do not have the header line, and the column orderings remain the same. Otherwise, it can cause undesired behavior in the schema. For example, if the column order changes in the source, we won't be able to identify column changes. If you have no control of the CSV generator and still want to turn on the empty file option, we recommend always keeping the error handling option "on" to catch potential errors.
Line Separator (optional)link
This field specifies the custom line separator for CSV files. The line separator is used in files to separate one row from the next. If you leave this field blank, Fivetran uses the new line character \n
as the line separator by default. If you specify a line separator, Fivetran parses all the CSV files in your folder path with this line separator.
PGP Encryption (optional)link
Fivetran provides this option to sync PGP encrypted files. Fivetran needs the PGP Private Key and Passphrase credentials to sync the PGP encrypted files. Fivetran decrypts the files using these credentials and syncs them to your destination. This feature is currently available in Amazon S3, Azure Blob Storage, SFTP, and Wasabi Cloud Storage services only.