Generic S3-Compatible Storage Setup Guide Private Preview
Follow our setup guide to connect your S3-Compatible Storage bucket to Fivetran.
Prerequisites
To connect your S3-Compatible Storage bucket to Fivetran, you need an S3-Compatible Storage bucket containing files with supported file types and encodings.
Setup instructions
Begin Fivetran configuration
- In the connection setup form, enter the Destination schema of your choice.
- Enter the Table group name.
Connect
Enter your S3-Compatible Storage Bucket name.
Enter your S3-Compatible Storage Endpoint URL.
In the Access approach drop-down menu, select one of the following options:
Access Key and Secret: Give Fivetran access to your S3-Compatible Storage bucket using your Access Key ID and Access Key Secret. If you select this option, enter the Access Key ID and Access Key Secret below.
Public Bucket: Give Fivetran access through a public bucket. Your S3-Compatible Storage account does not need special permissions to sync public buckets.
(Optional) Click Run connection test to validate the login credentials and connection to the S3-Compatible Storage bucket.
You can skip this intermediate test and proceed to the next step. However, if you choose to skip, we will perform this test once you have finished your configuration.
Finish Fivetran configuration
Configure files
(Optional) Base folder path - Use the folder path to specify the portion of the file system in which you'd like Fivetran to look for files. We examine the specified folder and all of its nested subfolders for files we can sync. If you don't provide a prefix, we'll look through the entire file system for files to sync.
Files - Click + Add files to specify a destination table and the corresponding file name pattern. You can add multiple table-to-pattern pairs.
Table name - Use names that are unique across all S3-Compatible Storage connections within the same destination schema.
(Optional) File Pattern - Use a regular expression as the file pattern to determine whether to sync specific files. The pattern you specify applies to everything under the prefix (base folder path). If you're unsure what regular expression to use, you can leave this field blank, and we'll sync everything under the prefix.
For example, suppose that under the prefix you have a folder data, which has sub-folders subFolder1, subFolder2, etc. These sub-folders contain JSON files with the format report_03/12/2050.json. Use the following regex patterns to decide whether or not to sync specific files:
- data/.* matches all files in the data folder, including those in subfolders.
- data/.*json matches all JSON files in the data folder, including those in subfolders.
- data/subFolder2/report_.*\.json matches all the JSON files in the subFolder2 folder whose names start with the prefix report_. For example, report_file.json.
- report_\d{2}/\d{2}/\d{4}\.json matches all the JSON files that begin with the prefix report_ followed by a date in the format DD/MM/YYYY or MM/DD/YYYY. For example, report_03/12/2050.json.
We recommend that you test your regex.
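One way to try the example patterns locally is with Python's re module. This is a minimal sketch: the object keys are hypothetical, and it assumes a pattern may match anywhere in the path relative to the prefix (re.search). Fivetran's own matching rules may differ, so Preview Files remains the authoritative check.

```python
import re

# Hypothetical object keys, written relative to the base folder path (prefix).
paths = [
    "data/subFolder1/report_03/12/2050.json",
    "data/subFolder2/report_file.json",
    "data/subFolder2/notes.txt",
]

# The four example patterns from this guide.
patterns = [
    r"data/.*",
    r"data/.*json",
    r"data/subFolder2/report_.*\.json",
    r"report_\d{2}/\d{2}/\d{4}\.json",
]


def matching_paths(pattern, candidates):
    """Return the candidates that the pattern matches somewhere in the path.

    Assumption: search semantics (a match anywhere in the path counts).
    """
    compiled = re.compile(pattern)
    return [path for path in candidates if compiled.search(path)]


results = {pattern: matching_paths(pattern, paths) for pattern in patterns}
for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Replacing the hypothetical keys with a few real object keys from your bucket shows quickly whether a pattern is too broad or too narrow.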
(Optional) Click Preview Files to validate the file pattern.
You can skip this intermediate test and proceed to the next step. However, if you choose to skip, we will perform this test once you have finished your configuration.
Click Save.
Compression - If your files are compressed but do not have extensions indicating the compression method, you can decompress them according to the selected compression algorithm. If all of your compressed files are correctly marked with a matching compression extension (.bz2, .gz, .gzip, .tar, or .zip), you can select infer. If you select uncompressed, we sync the files as they are, without decompressing them. If you choose a compression format, we decompress every file using the format you select. For example, if you have an automated CSV output system that GZIPs files to save space but saves them without a .gzip extension, you can set this field to GZIP. We will then decompress every file that we examine using GZIP.
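The difference between infer and an explicit compression setting can be sketched as follows. This is an illustration only, not Fivetran's implementation: the function name, the setting strings, and the compression labels are hypothetical, while the extensions come from the list above.

```python
import os

# Extensions listed in this guide, mapped to an illustrative compression label.
EXTENSION_TO_COMPRESSION = {
    ".bz2": "bz2",
    ".gz": "gzip",
    ".gzip": "gzip",
    ".tar": "tar",
    ".zip": "zip",
}


def resolve_compression(filename, setting="infer"):
    """Return the compression to apply to a file for a given Compression setting."""
    if setting != "infer":
        # An explicit choice (including "uncompressed") applies to every file.
        return setting
    # With "infer", only the file extension decides.
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_COMPRESSION.get(extension, "uncompressed")


print(resolve_compression("export.csv.gz"))       # extension decides
print(resolve_compression("export.csv", "gzip"))  # explicit setting decides
```

The second call mirrors the GZIP example above: the file has no compression extension, so only an explicit setting causes it to be decompressed.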
(Optional) Archive Folder Pattern - Use a regular expression to filter and sync files from archived folders. We sync the files in compressed archives with filenames matching the specified pattern. If there are multiple files within archive (TAR or ZIP) folders, you can use the archive folder pattern to filter file types. For example, if you specify the archive folder pattern as .*json, we will sync only the files that end in a .json file extension from the archive folder. This pattern only filters the files inside the archived folder. All the files matching the File Pattern will be listed.
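The effect of an archive folder pattern can be previewed locally with Python's zipfile module. This is a minimal sketch with a hypothetical in-memory archive; it assumes the pattern must cover the whole member name (re.fullmatch), which may not be exactly how Fivetran applies it.

```python
import io
import re
import zipfile


def build_sample_archive():
    """Create a small in-memory ZIP archive with hypothetical members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("events.json", '{"id": 1}')
        archive.writestr("events.csv", "id\n1\n")
        archive.writestr("nested/users.json", '{"id": 2}')
    buffer.seek(0)
    return buffer


def select_archive_members(archive_file, archive_pattern):
    """Return the member names inside the archive that match the pattern."""
    compiled = re.compile(archive_pattern)
    with zipfile.ZipFile(archive_file) as archive:
        return [name for name in archive.namelist() if compiled.fullmatch(name)]


selected = select_archive_members(build_sample_archive(), r".*json")
print(selected)
```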
Format
File Type - We process all files as the selected file type. Use the File Pattern field to select the file extensions you want to sync.
If your file type is XML, we load your XML data into the _data column without flattening it.
If your file type is CSV, TSV, or log, then enter the following details:
- (Optional) Delimiter - Specify the delimiter used in your CSV file. If your CSV file uses a custom delimiter, replace the default comma (,) with your specific delimiter. For example, if your file is tab-delimited, enter \t, or if it’s pipe-delimited, enter |. If you leave this field blank, we’ll attempt to detect the delimiter for each file automatically. However, note that automatic detection may not work in all cases. If your files sync with an incorrect number of columns or use a unique delimiter, consider specifying the delimiter. You can store files with different delimiters in the same folder. For more details on how delimiter inference works, see our documentation.
- Quote character - Typically CSVs use double quotes (") to enclose a value. Set the toggle to off if you don’t want to use an enclosing character.
- Non-Standard escape character - Set the toggle to ON if your CSV generator uses non-standard ways of escaping characters like newline, delimiter, etc. Such escaping is not standard in CSVs.
- Null Sequence - Set the toggle to ON if your CSVs use a special value indicating null. Specify the value indicating null only if you are sure your CSVs have a null sequence. Typically, CSVs have no native notion of a null character. However, some CSV generators have created one, using characters such as \N to represent null.
- Skip Header Lines - Use this option to skip over a fixed number of header lines at the beginning of your CSV files. Set the toggle to ON, and then in the Number of skipped header lines field, specify the number of header lines you want to skip.
- Skip Footer Lines - Use this option to skip over a fixed number of footer lines at the end of your CSV files. Set the toggle to ON, and then in the Number of skipped footer lines field, specify the number of footer lines you want to skip.
- Headerless files - Set the toggle to ON if your CSV-generating software doesn't provide a header line. Fivetran can generate generic column names and sync data rows with them.
- Line Separator - Line separators are used in CSV files to separate one row from the next. By default, we use the new line character \n as the line separator. If you use a different line separator for your CSV files, replace \n with your custom line separator.
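The CSV options above correspond roughly to the parsing steps below. This is a minimal sketch using Python's csv module, not Fivetran's implementation: parse_delimited and its parameter names are hypothetical, csv.reader accepts only single-character delimiters, and because the text is split on the line separator first, quoted values that span several lines are not handled.

```python
import csv


def parse_delimited(text, delimiter=",", null_sequence=None,
                    skip_header_lines=0, skip_footer_lines=0,
                    line_separator="\n"):
    """Parse delimited text into rows, applying the options described above."""
    lines = text.split(line_separator)
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing separator
    # Drop the fixed number of header and footer lines before parsing.
    body = lines[skip_header_lines:len(lines) - skip_footer_lines]
    rows = []
    for row in csv.reader(body, delimiter=delimiter, quotechar='"'):
        # Replace the null sequence (for example \N) with a real null.
        rows.append([None if value == null_sequence else value for value in row])
    return rows


# Hypothetical pipe-delimited export with one banner line and one summary line.
sample = "\n".join([
    "exported by tool",
    "id|name|note",
    r"1|Ada|\N",
    '2|Bob|"a|b"',
    "total rows: 2",
]) + "\n"

rows = parse_delimited(sample, delimiter="|", null_sequence=r"\N",
                       skip_header_lines=1, skip_footer_lines=1)
print(rows)
```

In the sample, the quote character keeps the pipe inside "a|b" from being treated as a delimiter, and the banner and summary lines are dropped before parsing.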
If your file type is JSON or JSONL, then select the following:
JSON Delivery Mode - Use this option to choose how Fivetran should handle your JSON data.
- If you select Packed, we load all your JSON data into the _data column without flattening it.
- If you select Unpacked, we flatten one level of columns and infer their data types.
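The two delivery modes can be pictured with a small sketch. It rests on one reading of "flatten one level": top-level keys become columns and nested values stay as JSON text. The function names and the sample record are hypothetical, and Fivetran's actual type inference is not modeled here.

```python
import json


def to_packed_row(record):
    """Packed: the whole record is stored as JSON text in a single _data column."""
    return {"_data": json.dumps(record, sort_keys=True)}


def to_unpacked_row(record):
    """Unpacked: one column per top-level key; nested values remain JSON text."""
    row = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row


record = {"id": 7, "user": {"name": "Ada", "tags": ["a", "b"]}, "active": True}
packed = to_packed_row(record)
unpacked = to_unpacked_row(record)
print(packed)
print(unpacked)
```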
If your file type is XLS/XLSX/XLSM, then enter the following details:
You must select the top-left cell of the spreadsheet that you want to sync. The connection setup form asks you to identify a sample file you would like to sync. We analyze the file and identify eligible data sets. To determine the cell reference correctly, do the following:
- In the Spreadsheet to find data to be synced field, enter the path from the root folder of one of your Excel files.
- Click Analyze sheet.
- In the Cell Reference for Syncs drop-down menu, select the cell reference.
Primary Key used for file process and load - Use this option to let Fivetran know how you'd like to update the files in your destination. When you modify a previously synced file, the option you select determines if we should replace the rows in the destination table or append new rows to the table:
- If you select Upsert file using file name and line number, we will upsert your data using the surrogate primary keys _file and _line. If a file has a unique name, we will sync the data for that file as new data.
- If you select Append file using file modified time, we will upsert your files using the surrogate primary keys _file, _line, and _modified. Choose this option to track the full history of a file or set of files when your files contain a combination of old and new data or data that is updated periodically.
- If you select Upsert file using custom primary key, we keep the most recent version of every record. Choose this option when your files contain a combination of old and new data or data that is updated periodically. You can choose the primary keys you want to use after you save and test.
You can modify the primary keys only if your initial sync fails. If your initial sync is successful, the option to modify the primary keys is not available.
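The difference between the first two options comes down to which columns form the key. The following sketch models a destination table as a dictionary keyed by the surrogate primary key. It is an illustration under simplifying assumptions: sync_file is a hypothetical helper, and it does not model what happens when a newer version of a file has fewer lines.

```python
def sync_file(table, file_name, modified, lines, key_columns):
    """Upsert every line of a file into the table using the given key columns."""
    for line_number, value in enumerate(lines, start=1):
        row = {"_file": file_name, "_line": line_number,
               "_modified": modified, "value": value}
        key = tuple(row[column] for column in key_columns)
        table[key] = row  # same key: replace the row; new key: add a row
    return table


# Upsert file using file name and line number: the re-synced file replaces rows.
upsert_table = {}
sync_file(upsert_table, "report.csv", "2050-01-01", ["a", "b"], ("_file", "_line"))
sync_file(upsert_table, "report.csv", "2050-01-02", ["a2", "b"], ("_file", "_line"))

# Append file using file modified time: each version of the file adds rows.
append_table = {}
sync_file(append_table, "report.csv", "2050-01-01", ["a", "b"],
          ("_file", "_line", "_modified"))
sync_file(append_table, "report.csv", "2050-01-02", ["a2", "b"],
          ("_file", "_line", "_modified"))

print(len(upsert_table), len(append_table))
```

With _modified in the key, both versions of report.csv remain in the table, which is what makes the full file history available.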
Additional options
Error Handling - Use the error handling option to choose how to handle errors in your files. If you know that your files contain some errors, you can choose to skip poorly formatted lines.
If you select skip, we ignore improperly formatted data within a file, allowing you to sync only valid data.
If you select fail, we fail the sync with an error on finding any improperly formatted data.
We recommend that you select fail unless you are sure that you have undesirable, malformed data.
You will receive a notification on your Fivetran dashboard if we encounter errors.
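The skip and fail behaviors can be sketched with a JSONL parser. This is only an illustration of the two modes described above: parse_jsonl and the sample data are hypothetical, and the sketch treats any line that is not valid JSON as improperly formatted.

```python
import json


def parse_jsonl(text, error_handling="fail"):
    """Parse JSONL text, either skipping malformed lines or failing on the first."""
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no data
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as error:
            if error_handling == "fail":
                raise ValueError(
                    f"line {line_number} is malformed: {error.msg}") from error
            # error_handling == "skip": ignore the malformed line and continue
    return rows


# The second line is missing its closing brace.
sample = '{"id": 1}\n{"id": 2\n{"id": 3}\n'
skipped = parse_jsonl(sample, error_handling="skip")
print(skipped)
```

With skip, the record with id 2 is silently absent from the result, which is why fail is the safer default when you expect clean files.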
(Optional) PGP Encryption Options - Use this option to sync PGP encrypted files. Set the toggle to ON and specify the following:
- PGP Private Key - Upload the PGP secret key as an attachment.
- (Optional) Passphrase - Enter the passphrase you used to generate the key.
- (Optional) Signer's Public Key - Upload the signer's public key as an attachment. This key is used for verifying the files.
For PGP decryption processes, we strictly comply with the RFC 4880 standard. We support syncing only base64-encoded files.
(Optional) List Strategy - Select the listing strategy you want to use:
Complete Listing - The default option, where we list all the new and modified files from the bucket.
Time-Based Pattern Listing - You can opt to use this strategy if your files are named based on the date or time they are added to the bucket. If you add new files in lexicographic order to the bucket, in each sync, we try to identify a time-based pattern. We only list and sync the files that are lexicographically greater than the last file synced in the previous sync.
If we are unable to identify a time-based pattern, we use the default option.
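The lexicographic comparison behind Time-Based Pattern Listing can be sketched in a few lines. This is a simplified model, not Fivetran's algorithm: it assumes the time-based pattern has already been identified, and the function name and object keys are hypothetical.

```python
def files_to_sync(all_keys, last_synced_key=None):
    """Return keys that sort lexicographically after the last synced key."""
    candidates = sorted(all_keys)
    if last_synced_key is None:
        return candidates  # first sync: everything is new
    return [key for key in candidates if key > last_synced_key]


keys = [
    "logs/2050-03-10.json",
    "logs/2050-03-11.json",
    "logs/2050-03-12.json",
]
next_batch = files_to_sync(keys, last_synced_key="logs/2050-03-10.json")
print(next_batch)

# A file added later with an earlier name sorts before the last synced key,
# so this strategy would not pick it up.
late_arrival = files_to_sync(keys + ["logs/2050-03-09.json"],
                             last_synced_key="logs/2050-03-12.json")
print(late_arrival)
```

The second call shows why this strategy suits only buckets where new files always sort after existing ones, for example zero-padded dates in YYYY-MM-DD order.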
(Hybrid Deployment only) If your destination is configured for Hybrid Deployment, the Hybrid Deployment Agent associated with your destination is pre-selected in the Select an existing agent drop-down menu. To use a different agent, select the agent of your choice, and then select the same agent for your destination.
Click Save & Test. Fivetran will take it from here and sync your data from your S3-Compatible Storage bucket.
Fivetran tests and validates the S3-Compatible Storage connection. On successful completion of the setup tests, you can sync your S3-Compatible Storage data to your destination.
Setup tests
Fivetran performs the following S3-Compatible Storage connection tests:
The Finding tables test validates whether you have specified at least one table in the Files field to set up the connection.
The Validating Bucket Name test validates the bucket name you specified in the setup form and checks the bucket name to ensure that it does not contain any prefix or folder path characters.
The Connecting to Bucket test validates the connection and checks the accessibility of your S3-Compatible Storage bucket.
The Validating regex file patterns test validates the file pattern regex for each of the tables you specified in the setup form.
The Validating Archive Pattern test validates the archive pattern regex you specified in the setup form. We perform this test only if you specify a regex in the Archive Folder Pattern field.
The Validating EscapeChar test validates the escape character you specified for your CSV files and checks the length of the character which must not be more than one. We perform this test only if you set the Non-standard character escaping? toggle to ON and specify an escape character in the Escape Character field.
The Validating Infer FileType test validates whether infer is added as a value in the file_type parameter for connections created using API. We perform this test only if you have set up your connector using API.
The Finding Matching Files test checks if the connector can successfully retrieve a minimum of one sample file and a maximum of five sample files for each of the tables you specified in the setup form.
The PGP Support test validates whether the connector can successfully retrieve a minimum of one sample file and a maximum of ten sample files from the S3-Compatible Storage bucket and decrypt them using the PGP keys you uploaded. We perform this test only if you set the PGP Encryption Options toggle to ON.
The Multi-Character Delimiter Support test validates the length of the delimiter which must be within 15 characters. We perform this test only if you specify the delimiter for your CSV files in the Delimiter field.
The tests may take a couple of minutes to complete.
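Several of these checks depend only on the values you enter, so you can approximate them before clicking Save & Test. The sketch below is a local pre-check, not Fivetran's test suite: the configuration keys are hypothetical, "folder path characters" is interpreted as a forward slash, and Python's regex syntax may differ from the engine Fivetran uses.

```python
import re

MAX_DELIMITER_LENGTH = 15   # limit stated for the Multi-Character Delimiter Support test
MAX_ESCAPE_CHAR_LENGTH = 1  # limit stated for the Validating EscapeChar test


def precheck(config):
    """Return a list of problems found in a hypothetical connection config."""
    problems = []
    tables = config.get("tables") or {}
    if not tables:
        problems.append("no tables specified")
    if "/" in config.get("bucket", ""):
        problems.append("bucket name contains a folder path")
    for table, pattern in tables.items():
        if not pattern:
            continue  # a blank file pattern is allowed
        try:
            re.compile(pattern)
        except re.error as error:
            problems.append(f"invalid file pattern for {table}: {error}")
    delimiter = config.get("delimiter")
    if delimiter is not None and len(delimiter) > MAX_DELIMITER_LENGTH:
        problems.append("delimiter is longer than 15 characters")
    escape_char = config.get("escape_char")
    if escape_char is not None and len(escape_char) > MAX_ESCAPE_CHAR_LENGTH:
        problems.append("escape character is longer than one character")
    return problems


good = {"bucket": "my-bucket", "tables": {"reports": r"data/.*\.json"}, "delimiter": "|"}
bad = {"bucket": "my-bucket/folder", "tables": {"reports": "report_(.*"},
       "delimiter": "x" * 16, "escape_char": "\\\\"}
print(precheck(good))
print(precheck(bad))
```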
Related articles
Connector Overview
API Connection Configuration