Amazon S3 Setup Guide
Follow our setup guide to connect your Amazon S3 bucket to Fivetran.
Prerequisites
To connect your Amazon S3 bucket to Fivetran, you need:
- An S3 bucket containing files with supported file types and encodings
- For private or encrypted buckets, an AWS account with the ability to grant Fivetran permission and to read from the bucket
Setup instructions
We recommend disabling Access Control Lists (ACLs) on each S3 bucket so that the bucket contents are controlled by the bucket's access control settings and not the original file owner's settings. For more information about disabling ACLs for your bucket, see Amazon S3 documentation.
Begin Fivetran configuration
In the connector setup form, enter the Destination schema and Destination table name of your choice.
Enter your S3 Bucket name.
IMPORTANT: If you are using an access point, enter the Access Point alias if you already have it or create one using our Configure Access Point instructions.
(Optional) In the Access approach drop-down menu, select one of the following options:
IAM Role (most secure): Give Fivetran access by creating an IAM role using our External ID.
Access Key and Secret: Provide Fivetran an access key and secret for your S3 bucket. You may need to use this method if you don’t own the bucket and its access methods are limited.
Public Bucket: Give Fivetran access through a public bucket. Your AWS account does not need special permissions to sync public buckets. Skip to the Finish Fivetran configuration step.
NOTE: You can use the Access Analyzer for S3 to find out if your S3 bucket has public or shared access.
Create IAM policy
IMPORTANT: You must create an IAM policy for both the IAM Role and Access Key and Secret approaches.
NOTE: For encrypted buckets, follow Amazon S3 bucket instructions to modify the AWS KMS key's policy to grant Fivetran permissions to download files from your encrypted bucket.
Open your Amazon IAM console.
Go to Policies, then click Create Policy.
Go to the JSON tab.
Copy the following policy and paste it into the JSON editor. Replace {your-bucket-name} with the name of your S3 bucket. After that, click Next: Tags.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}/*",
        "arn:aws:s3:::{your-bucket-name}"
      ]
    }
  ]
}
(Optional) If you use a customer-managed KMS key, add the following actions to the Action section of the IAM policy to provide read access to the encrypted files.
"Action": [ "kms:Decrypt", "kms:GenerateDataKey" ]
In the Add tags step, you can optionally add custom tags that will be associated with your policy. Click Next: Review.
In the Review policy step, specify the name of your policy, for example "Fivetran-S3-Access", then click Create policy.
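If you keep policies in version control, you can generate the policy document from this step instead of editing it by hand. The following is a minimal sketch in Python. The helper name build_s3_read_policy and the bucket name example-bucket are hypothetical; the sketch only builds the JSON shown above and does not call AWS.

```python
import json


def build_s3_read_policy(bucket_name: str) -> dict:
    """Return the read-only policy from this guide for one bucket (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                ],
                # Object-level ARN first, then the bucket itself, as in the guide.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(build_s3_read_policy("example-bucket"), indent=2))
```

You can paste the printed JSON into the JSON tab of the policy editor.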
(Optional) Access using IAM role
Find External ID
In the connector setup form, find the automatically-generated External ID and make a note of it. You will need it to create an IAM role in AWS.
NOTE: The automatically-generated External ID is tied to your account. If you close and re-open the setup form, the ID will remain the same. You can keep the tab open in the background while you configure your source for convenience.
Create IAM role
Go to Roles, then click Create role.
Select AWS account, then enter Fivetran’s AWS VPC Account ID, 834469178297, in the Account ID field.
Select the Require external ID checkbox and enter the External ID you found above, then click Next.
In the Add permissions step, select the "Fivetran-S3-Access" policy you created, then click Next.
In the Name, review, and create step, specify the role name, for example "Fivetran", then click Create role at the bottom of the page.
Click the Fivetran role you created.
On the Summary page for the role, find the ARN and make a note of it. You will need it to configure Fivetran.
NOTE: If you want to reuse an existing IAM role created for a Fivetran account, edit the trust policy of that role. You can either add another external ID to the existing JSON policy or copy the following policy and paste it in your JSON tab:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::834469178297:user/gcp_donkey"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "external-id-1",
            "external-id-2"
          ]
        }
      }
    }
  ]
}
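When one role serves several connectors, the list of external IDs in the trust policy is the only part that changes. The sketch below, in Python, rebuilds the trust policy above from a list of IDs. The helper name build_trust_policy is hypothetical, and the principal ARN is copied from the policy in this guide.

```python
import json

# Principal ARN copied from the trust policy shown in this guide.
FIVETRAN_PRINCIPAL = "arn:aws:iam::834469178297:user/gcp_donkey"


def build_trust_policy(external_ids: list) -> dict:
    """Build a trust policy that accepts any of the given external IDs (hypothetical helper)."""
    if not external_ids:
        raise ValueError("at least one external ID is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": FIVETRAN_PRINCIPAL},
                "Action": "sts:AssumeRole",
                # StringEquals with a list matches when any one value is equal.
                "Condition": {"StringEquals": {"sts:ExternalId": list(external_ids)}},
            }
        ],
    }


if __name__ == "__main__":
    # The two IDs are the placeholders used in the guide, not real values.
    print(json.dumps(build_trust_policy(["external-id-1", "external-id-2"]), indent=2))
```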
(Optional) Set permissions
You can specify permissions for the Role ARN that you designate for Fivetran. Giving selective permissions to this role will allow Fivetran to only sync what it has permissions to see.
Skip to the Configure AWS PrivateLink step.
(Optional) Access using key and secret
NOTE: You can skip this step if you already have an access key and secret.
Create user
NOTE: You can skip this step if you already have a user with access to the bucket.
Open your Amazon IAM console.
Go to Users, then click Add users.
Enter the User name, then click Next.
Select the Attach policies directly option, then select the "Fivetran-S3-Access" policy you created.
Click Next, then click Create user.
Generate access key and secret
In the Users tab, open the User you created.
Go to the Security credentials tab and navigate to the Access keys section.
Click Create access key.
From the Use Case options, select the Third-party service option and then click Next.
Enter a Description tag value and then click Create access key.
Copy the Access key and Secret access key values. You will need them to configure Fivetran.
(Optional) Configure AWS PrivateLink
IMPORTANT: You must have a Business Critical plan to use AWS PrivateLink.
AWS PrivateLink allows VPCs and AWS-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. PrivateLink is the most secure connection method. Learn more in AWS’ PrivateLink documentation.
Follow our AWS PrivateLink setup guide to configure PrivateLink for your S3 bucket.
NOTE: There are two ways to give Fivetran access to your data: using IAM policies to control access to S3 buckets (recommended) or using access points.
By default, you cannot configure PrivateLink if you want to use the Hybrid Deployment model. However, if you want to configure PrivateLink with Hybrid Deployment, see the Gateway endpoints for Amazon S3 documentation. With a gateway endpoint, you can access Amazon S3 from your VPC, without an internet gateway or NAT device for your VPC.
(Optional) Configure access point
Create access point
Create an access point to provide Fivetran access to your S3 bucket.
Open your S3 console.
On the left navigation pane, click Access Points.
Select the access point.
Go to the Properties tab. Make a note of the Access Point alias. You will need it to configure Fivetran.
To allow access to your bucket through the access point, copy the following into the bucket policy you created. Replace {account-number} with your AWS account number, {role-name} with the role name that you created, {your-bucket-name} with the name of the S3 bucket that you used to configure the access point, {access-point-region} with the AWS region in which you created the access point, and {your-access-point} with the name of the access point you created.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{account-number}:role/{role-name}"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}",
        "arn:aws:s3:::{your-bucket-name}/*"
      ],
      "Condition": {
        "StringLike": {
          "s3:DataAccessPointArn": "arn:aws:s3:{access-point-region}:{account-number}:accesspoint/{your-access-point}"
        }
      }
    }
  ]
}
Create an IAM policy for access point
Create a new access point policy. Copy the following policy and paste it in the JSON tab. Replace {access-point-region} with the region in which you created the access point, {account-number} with your account number, and {your-access-point} with the name of the access point you created.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:{access-point-region}:{account-number}:accesspoint/{your-access-point}",
        "arn:aws:s3:{access-point-region}:{account-number}:accesspoint/{your-access-point}/*"
      ]
    }
  ]
}
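The access point ARN appears several times across the two policies, so a typo in one placeholder is easy to miss. As a rough sketch, the Python below assembles the ARN once and reuses it. The helper names and the sample region, account number, and access point name are hypothetical; the resource layout follows the policy above.

```python
def access_point_arn(region: str, account_number: str, access_point: str) -> str:
    """Assemble the access point ARN in the format used by the policies in this guide."""
    return f"arn:aws:s3:{region}:{account_number}:accesspoint/{access_point}"


def build_access_point_policy(region: str, account_number: str, access_point: str) -> dict:
    """Return the access point policy from this guide (hypothetical helper)."""
    arn = access_point_arn(region, account_number, access_point)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # The access point itself, then everything under it, as in the guide.
                "Resource": [arn, f"{arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Sample values only; replace them with your own region, account, and access point.
    print(access_point_arn("us-east-1", "123456789012", "example-access-point"))
```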
Finish Fivetran configuration
Depending on your Access approach, do the following:
- If you selected IAM Role, enter the Role ARN you created.
- If you selected Access Key and Secret, enter the Access Key ID and Access Key Secret you created.
Choose your configuration options. Using these configuration options, you can select subsets of your folders, specific types of files, and more to sync only the files you need in your destination. In addition, setting up multiple connectors targeted at the same container but with different options allows you to slice and dice a container any way you'd like.
You can use the following configuration options:
Beta Primary Key used for file process and load - Use this option to let Fivetran know how you'd like to update the files in your destination. When you modify a previously synced file, this option determines whether we replace the rows in the destination table or append new rows to the table:
- If you select Upsert file using file name and line number, we will upsert your data using surrogate primary keys _file and _line. Choose this option if your files have unique names and contain net-new data.
- If you select Append file using file modified time, we will upsert your files using surrogate primary keys _file, _line, and _modified. Choose this option if you want to track the full history of a file or set of files, and your files contain a combination of old and new data or data that is updated periodically.
NOTE: If you upload new files or modify the existing files once the sync starts, we exclude these files from the current sync and include them in the subsequent sync.
- If you select Upsert file using custom primary key, we keep the most recent version of every record. Choose this option if your files contain a combination of old and new data or data that is updated periodically. You can choose the primary keys you want to use after you save and test.
NOTE: To ensure data integrity, we recommend that you don't change the primary keys once the initial sync is completed.
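To see why the choice of surrogate key decides between replacing and appending rows, consider the simplified model below, written in Python. It treats the destination table as a dictionary keyed by the primary key columns. This is only an illustration of upsert semantics, not Fivetran's implementation, and the function name load_rows and the sample rows are hypothetical.

```python
def load_rows(table: dict, rows: list, key_columns: tuple) -> None:
    """Upsert rows into a table modeled as a dict keyed by the primary key values."""
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        table[key] = row  # same key: replace the row; new key: add a row


# The same file is synced twice: once as first written, once after modification.
first_sync = [{"_file": "a.csv", "_line": 1, "_modified": "2024-01-01", "value": "old"}]
second_sync = [{"_file": "a.csv", "_line": 1, "_modified": "2024-01-02", "value": "new"}]

upsert_table = {}  # keyed by _file and _line
append_table = {}  # keyed by _file, _line, and _modified
for batch in (first_sync, second_sync):
    load_rows(upsert_table, batch, ("_file", "_line"))
    load_rows(append_table, batch, ("_file", "_line", "_modified"))

print(len(upsert_table), len(append_table))
```

With _file and _line as the key, the modified line overwrites the earlier row. Adding _modified to the key makes each version of the line a distinct row, which is what preserves history.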
(Optional) Folder Path - Use the folder path to specify a portion of the container in which you'd like Fivetran to look for files. We examine files under the specified folder and all of its nested subfolders for files we can sync. If you don't provide a prefix, we'll look through the entire container for files to sync.
(Optional) File Pattern - Use a regular expression as the file pattern to decide whether or not to sync specific files. The pattern applies to everything under the prefix (folder path). If you're unsure what regular expression to use, you can leave this field blank, and we'll sync everything under the prefix.
For example, suppose that under the prefix you have a folder data, which has sub-folders subFolder1, subFolder2, etc. These sub-folders contain JSON files named in the format report_03/12/2050.json. Use the following regex patterns to decide whether or not to sync specific files:
- data/.* matches all the files in the data folder.
- data/subFolder1/.*\.json matches all the JSON files in the subFolder1 folder.
- data/subFolder2/report_.*\.json matches all the JSON files in the subFolder2 folder whose names start with the prefix report_. For example, report_file.json.
- report_\d{2}/\d{2}/\d{4}\.json matches all the JSON files that begin with the prefix report_ followed by a date in the DD/MM/YYYY or MM/DD/YYYY format. For example, report_03/12/2050.json.
TIP: You can learn to write your regex and test it out.
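One way to check a file pattern before saving it is to run it against a few sample paths. The sketch below uses Python's re module with the example patterns from this section. It assumes that a pattern must match the whole path relative to the prefix; that anchoring behavior, and any difference between Python's regex flavor and the one Fivetran uses, are assumptions you should verify with the setup tests. The function name matching_patterns is hypothetical.

```python
import re

# Example patterns taken from this section.
PATTERNS = [
    r"data/.*",
    r"data/subFolder1/.*\.json",
    r"data/subFolder2/report_.*\.json",
    r"report_\d{2}/\d{2}/\d{4}\.json",
]


def matching_patterns(path: str) -> list:
    """Return the example patterns that match the whole path (assumed anchoring)."""
    return [pattern for pattern in PATTERNS if re.fullmatch(pattern, path)]


if __name__ == "__main__":
    for sample in (
        "data/subFolder1/notes.json",
        "data/subFolder2/report_file.json",
        "report_03/12/2050.json",
    ):
        print(sample, matching_patterns(sample))
```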
File Type - Use the file type to choose the parsing strategy for files without file extensions. If you save your files with improper extensions, you can force them to be synced as the selected file type.
If you select infer, we infer the type from a file's extension (.csv, .tsv, .json, .avro, or .log).
NOTE: If you have XML files, don't select infer. We sync XML files only when you select the file type as xml. For more information about the file size, see our documentation.
NOTE: If you have PGP encrypted files, do not select infer.
If you choose a file type, we interpret every file we examine as the file type you select, so make sure everything we sync has the same file type.
For example, if you have an automated CSV output system that saves files without a .csv extension, you can specify the type as csv, and we will sync them correctly as CSVs.
If you select xml, we load your XML data into the _data column without flattening it.
(Optional) JSON Delivery Mode - Available when JSON or JSONL is selected in File Type. Use this option to choose how Fivetran should handle your JSON data.
- If you select Packed, we load all your JSON data into the _data column without flattening it.
- If you select Unpacked, we flatten one level of columns and infer their data types.
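The difference between the two delivery modes can be sketched in a few lines of Python. The functions packed and unpacked are hypothetical, and keeping nested objects as JSON text in Unpacked mode is an assumption made for illustration; the guide only states that one level of columns is flattened.

```python
import json


def packed(record: dict) -> dict:
    """Packed: the whole record goes into a single _data column."""
    return {"_data": json.dumps(record)}


def unpacked(record: dict) -> dict:
    """Unpacked: top-level keys become columns; nested values stay as JSON text (assumption)."""
    row = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            row[key] = json.dumps(value)
        else:
            row[key] = value
    return row


if __name__ == "__main__":
    sample = {"id": 1, "user": {"name": "a"}}
    print(packed(sample))
    print(unpacked(sample))
```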
Compression - Use the compression option to choose the compression strategy to decompress files without compression extensions. If your files are compressed but do not have extensions indicating the compression method, you can decompress them according to the selected compression algorithm.
If all of your compressed files are correctly marked with a matching compression extension (.bz2, .gz, .gzip, .tar, or .zip), you can select infer.
If you select uncompressed, we do not decompress the files and sync the uncompressed files.
If you choose a compression format, we decompress every file using the format you select.
For example, if you have an automated CSV output system that GZIPs files to save space but saves them without a .gzip extension, you can set this field to gzip. We will decompress every file that we examine using GZIP.
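The GZIP example above can be reproduced locally with Python's standard library. This sketch shows inference from the file extension next to a forced compression setting. The function names are hypothetical, only GZIP is implemented to keep the example short, and treating an unknown extension as uncompressed is an assumption rather than documented behavior.

```python
import gzip

# Extensions listed in this section, mapped to a compression name.
EXTENSION_TO_COMPRESSION = {
    ".bz2": "bz2",
    ".gz": "gzip",
    ".gzip": "gzip",
    ".tar": "tar",
    ".zip": "zip",
}


def infer_compression(file_name: str) -> str:
    """Infer the compression from the extension; unknown extensions count as uncompressed."""
    lowered = file_name.lower()
    for extension, compression in EXTENSION_TO_COMPRESSION.items():
        if lowered.endswith(extension):
            return compression
    return "uncompressed"


def read_payload(file_name: str, payload: bytes, compression: str = "infer") -> bytes:
    """Return the decompressed bytes for the chosen compression setting."""
    if compression == "infer":
        compression = infer_compression(file_name)
    if compression == "gzip":
        return gzip.decompress(payload)
    if compression == "uncompressed":
        return payload
    raise NotImplementedError(f"{compression} is not covered by this sketch")


if __name__ == "__main__":
    compressed = gzip.compress(b"id,name\n1,alice\n")
    # The file has no .gzip extension, so the compression is set explicitly.
    print(read_payload("export", compressed, compression="gzip"))
```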
Error Handling - Use the error handling option to choose how to handle errors in your files. If you know that your files contain some errors, you can choose to skip poorly formatted lines.
If you select skip, we ignore improperly formatted data within a file, allowing you to sync only valid data.
If you select fail, we fail the sync with an error on finding any improperly formatted data.
TIP: We recommend that you select fail unless you are sure that you have undesirable, malformed data.
You will receive a notification on your Fivetran dashboard if we encounter errors.
(Optional) To use the advanced configuration options, set the Enable Advanced Options toggle to ON.
You can use the following configuration options for specific use cases:
(Optional) Archive Folder Pattern - Use a regular expression to filter and sync files from archived folders. We sync the files in compressed archives with filenames matching the specified pattern. If there are multiple files within the archive (TAR or ZIP) folders, you can use the archive folder pattern to filter file types.
For example, if you specify the archive folder pattern as .*json, we will sync only the files that end in a .json file extension from the archive folder.
NOTE: This is only used to filter the files within the archived folder.
(Optional) Non-standard character escaping? - Set the toggle to ON if your CSV generator uses non-standard ways of escaping characters. CSV files must adhere to the rules in RFC-4180.
Character Escaping options - Select the approach your CSV-generating software uses.
- Custom Escape Character - Select this option if your file uses custom escape characters to escape quotation marks. Use this field only if you are sure your CSVs have a different escape character. CSVs have a special rule for escaping quotation marks compared to other characters; they require two consecutive double quotes to represent an escaped double quote. However, some CSV generators do not follow this rule and use different characters like backslash \ for escaping.
  - Escape Character - Set the escape character if your CSV generator follows non-standard rules for escaping quotation marks.
- Delimited Only - Select this option if your file doesn't use an escape character to escape quotation marks, and you want your file to be processed with delimiter only.
NOTE:
- We recommend that you use the Delimited Only option only when your data doesn't have any quoted fields, and you are confident that the delimiter character is not a part of the field.
- Since we process the files with only delimiters, the data type of the columns may change from INTEGER, FLOAT, and LONG to STRING if the numbers are within quotes.
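To check which escaping style your CSV generator uses, you can parse a sample line both ways. The sketch below uses Python's csv module to contrast the RFC 4180 rule (two consecutive double quotes) with a backslash escape character. The function name parse_line is hypothetical, and Python's parser stands in for Fivetran's here only as an illustration.

```python
import csv
import io
from typing import Optional


def parse_line(line: str, escape_char: Optional[str] = None) -> list:
    """Parse one CSV line using RFC 4180 quoting or a custom escape character."""
    if escape_char is None:
        # Standard rule: a doubled quote inside a quoted field is one literal quote.
        reader = csv.reader(io.StringIO(line))
    else:
        # Non-standard rule: the escape character precedes a literal quote.
        reader = csv.reader(io.StringIO(line), doublequote=False, escapechar=escape_char)
    return next(reader)


if __name__ == "__main__":
    print(parse_line('1,"say ""hi"""'))
    print(parse_line('1,"say \\"hi\\""', escape_char="\\"))
```

Both calls yield the same two fields, which shows that the two inputs encode the same data under different escaping rules.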
(Optional) Delimiter - Specify the delimiter. The delimiter is a character used in files to separate one field from the next. Fivetran tries to infer the delimiter, but in some cases, this is impossible. If your files sync with the wrong number of columns or use a unique delimiter, consider setting this value. For example, if you have tab-delimited files, you must enter \t, and if you have pipe-delimited files, enter |.
If you leave this field blank, we infer the delimiter for each file. You can store files of many types of delimiters in the same folder with no problems. For more information on the delimiter inference, see our documentation.
If you specify a delimiter, we parse all the CSV, TSV, and TXT files in your folder path with this delimiter.
NOTE: You can also specify a multi-character delimiter in this field. Specify a custom multi-character delimiter (excluding "\t" and "\s") only if the source contains only CSV files; otherwise, it might lead to data integrity issues for other files. A custom multi-character delimiter must not exceed 15 characters.
(Optional) Null Sequence - Specify the value indicating null if your CSVs use a special value indicating null.
Only use this field if you are sure your CSVs have a null sequence. CSVs have no native notion of a null character. However, some CSV generators have created one, using characters such as \N to represent null.
(Optional) Skip Header Lines - Use this option to skip over fixed-length headers at the beginning of your CSV files. Set the toggle to ON, and then in the Number of skipped header lines field, specify the number of header lines you want to skip.
Some CSV-generating programs include additional header lines or empty lines at the top of the file. The header consists of a few lines that do not match the format of the rest of the rows in the file. These header rows can cause undesired behavior because we attempt to parse them as if they were records in your CSV.
(Optional) Skip Footer Lines - Use this option to skip over fixed-length footers at the end of your CSV files. Set the toggle to ON, and then in the Number of skipped footer lines field, specify the number of footer lines you want to skip.
Some CSV-generating programs include a footer at the bottom of the file. The footer consists of a few lines that do not match the format of the rest of the rows in the file. These footer rows can cause undesired behavior because we attempt to parse them as if they were records in your CSV.
(Optional) Headerless Files - Set the toggle to ON if your CSV-generating software doesn't provide a header line for the documents. Fivetran can generate the generic column names and sync data rows with them.
Some CSV-generating programs do not include column name headers for the files; they only contain data rows. When you set the toggle to ON, we generate generic column names following the convention of column_0, column_1, ... column_n to map the rows.
(Optional) Line Separator - Specify the custom line separator for your CSV files. The line separator is used in files to separate one row from the next.
If you leave this field blank, we use the new line character \n as the line separator by default.
If you specify a line separator, we parse all the CSV files in your folder path with this line separator.
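The Delimiter, Null Sequence, Skip Header Lines, Skip Footer Lines, Headerless Files, and Line Separator options described above interact, so it can help to try them on a sample file first. The Python sketch below applies them in one plausible order. The function name parse_csv_text and its parameters are hypothetical, it supports only single-character delimiters, and it is not a description of how Fivetran parses files.

```python
import csv


def parse_csv_text(
    text,
    delimiter=",",
    null_sequence=None,
    skip_header_lines=0,
    skip_footer_lines=0,
    headerless=False,
    line_separator="\n",
):
    """Parse CSV text into a list of dicts using the options described above."""
    lines = text.split(line_separator)
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing line separator
    # Drop the fixed-length header and footer before parsing any rows.
    lines = lines[skip_header_lines : len(lines) - skip_footer_lines]
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return []
    if headerless:
        names = [f"column_{index}" for index in range(len(rows[0]))]
        data = rows
    else:
        names, data = rows[0], rows[1:]
    # Replace the null sequence with None so it is not loaded as a string.
    return [
        {name: (None if value == null_sequence else value) for name, value in zip(names, row)}
        for row in data
    ]


if __name__ == "__main__":
    sample = "Report generated by tool\nid|name\n1|\\N\n2|bob\nTOTAL 2\n"
    print(parse_csv_text(sample, delimiter="|", null_sequence="\\N",
                         skip_header_lines=1, skip_footer_lines=1))
```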
(Optional) Encryption Options - Use this option to sync encrypted files. Set the toggle to ON and specify the following:
- PGP Private Key - Upload the PGP secret key as an attachment.
- (Optional) Passphrase - Enter the passphrase you used to generate the key.
- (Optional) Signer's Public Key - Upload the signer's public key as an attachment. This key is used for verifying the files.
NOTE: For PGP decryption processes, we strictly comply with the RFC4880 standard. We support syncing only base64 encoded files.
(Optional) List Strategy - Select the listing strategy you want to use:
complete_listing - The default option, where we list all the new and modified files from the bucket.
time_based_pattern_listing - You can opt to use this strategy if your files are named based on the date or time they are added to the bucket. If you add new files in lexicographic order to the bucket, in each sync, we try to identify a time-based pattern. We only list and sync the files that are lexicographically greater than the last file synced in the previous sync.
NOTE: If we are unable to identify a time-based pattern, we use the default option.
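The core idea of time_based_pattern_listing, keeping only the keys that sort after the last synced file, can be illustrated as follows. This is a simplified Python sketch with a hypothetical function name and sample keys. It does not attempt the pattern detection that Fivetran performs, and it assumes file names sort in the order they were added.

```python
def keys_to_sync(listed_keys, last_synced_key=None):
    """Return the keys that are lexicographically greater than the last synced key."""
    if last_synced_key is None:
        return sorted(listed_keys)  # first sync: everything is new
    return sorted(key for key in listed_keys if key > last_synced_key)


if __name__ == "__main__":
    keys = [
        "logs/2024-01-01.csv",
        "logs/2024-01-03.csv",
        "logs/2024-01-02.csv",
        "logs/2023-12-31.csv",  # added late, but sorts before the last synced key
    ]
    print(keys_to_sync(keys, last_synced_key="logs/2024-01-01.csv"))
```

The late-arriving 2023-12-31 file is not selected, which is why this strategy suits buckets where new files are always added in lexicographic order.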
(Not applicable to Hybrid Deployment) If you want to connect using AWS PrivateLink, set the Require PrivateLink toggle to ON.
NOTE: By default, we use PrivateLink to connect if your S3 bucket and destination are in the same region. Enabling this option ensures that we always use PrivateLink to connect. If the regions are different, Fivetran won't create the connection.
(Hybrid Deployment only) If your destination is configured for Hybrid Deployment, the Hybrid Deployment Agent associated with your destination is pre-selected in the Select an existing agent drop-down menu. To use a different agent, select the agent of your choice, and then select the same agent for your destination.
Click Save & Test. Fivetran will take it from here and sync your data from your Amazon S3 bucket.
Fivetran tests and validates the Amazon S3 connection. On successful completion of the setup tests, you can sync your Amazon S3 data to your destination.
Setup tests
Fivetran performs the following Amazon S3 connection tests:
The Validating Bucket Name test validates the bucket name you specified in the setup form and checks the bucket name to ensure that it does not contain any prefix or folder path characters.
The Connecting to Bucket test validates the connection and checks the accessibility of your S3 bucket.
The Validating External ID test validates if the external ID you specified in the setup form is correctly assigned and checks whether you have configured a security role using the external ID. We perform this test only if you select IAM Role in the Access approach field.
The Validating File Pattern Regex test validates the file pattern regex you specified in the setup form. We perform this test only if you specify a regex in the File Pattern field.
The Validating Archive Pattern test validates the archive pattern regex you specified in the setup form. We perform this test only if you specify a regex in the Archive Folder Pattern field.
The Validating EscapeChar test validates the escape character you specified for your CSV files and checks the length of the character which must not be more than one. We perform this test only if you set the Non-standard character escaping? toggle to ON and specify an escape character in the Escape Character field.
The Finding Matching Files test checks if the connector can successfully retrieve a minimum of one sample file and a maximum of ten sample files based on the configuration you specified in the setup form.
The PGP Support test validates whether the connector can successfully retrieve a minimum of one sample file and a maximum of ten sample files from the S3 bucket and decrypt them using the PGP keys you uploaded. We perform this test only if you set the PGP Encryption Options toggle to ON.
The Multi-Character Delimiter Support test validates the length of the delimiter which must be within 15 characters. We perform this test only if you specify the delimiter for your CSV files in the Delimiter field.
The PrivateLink test validates whether your S3 bucket is in the same AWS Region as Fivetran. We perform this test only if you set the Require PrivateLink toggle to ON.
NOTE: The tests may take a couple of minutes to complete.
Related articles
Connector Overview
API Connector Configuration