S3 Data Lake Setup Guide
Follow our setup guide to connect your Amazon S3 data lake to Fivetran.
Prerequisites
To connect your Amazon S3 data lake to Fivetran, you need the following:
- An AWS account that does not have multiple resource groups in the same AWS Region.
- An Amazon S3 bucket in one of the supported AWS Regions. For faster uploads and downloads and for optimum load times, we recommend that you create the bucket in the same Region as the data processing location of your destination. For more information about creating an Amazon S3 bucket, see AWS' documentation.
- Access to AWS Glue Data Catalog in the same Region as the S3 bucket.
NOTE: In your AWS account, you can create multiple groups within the same AWS Region. However, it's important to note that all groups in a particular AWS Region share the same AWS Glue database. Therefore, you must avoid having the same schema/table combination across multiple groups within the same Region, as it could lead to conflicts in AWS Glue database tables and potential synchronization failures.
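The conflict described in the note above can be checked ahead of time. The following is a minimal sketch, assuming you keep your own inventory of which group writes which schema/table combination; the `groups` structure and the function name are hypothetical and are not part of any Fivetran or AWS API.

```python
from collections import defaultdict


def find_glue_conflicts(groups):
    """Return schema/table combinations used by more than one group in a Region.

    `groups` is a hypothetical inventory: a list of dicts with the keys
    "name", "region", and "tables" (a list of (schema, table) tuples).
    """
    owners = defaultdict(set)
    for group in groups:
        for schema, table in group["tables"]:
            owners[(group["region"], schema, table)].add(group["name"])
    # Keep only the combinations that are claimed by two or more groups.
    return {key: sorted(names) for key, names in owners.items() if len(names) > 1}


groups = [
    {"name": "marketing", "region": "us-east-1",
     "tables": [("salesforce", "account"), ("ads", "campaign")]},
    {"name": "finance", "region": "us-east-1",
     "tables": [("salesforce", "account")]},
    {"name": "emea", "region": "eu-west-1",
     "tables": [("salesforce", "account")]},
]
conflicts = find_glue_conflicts(groups)
print(conflicts)
```

In this example, only the two groups in `us-east-1` collide, because groups in different Regions do not share an AWS Glue database.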
Setup instructions
Find External ID
In the destination setup form, find the automatically generated External ID and make a note of it. You will need it to create an IAM role for Fivetran.
NOTE: The automatically generated External ID is tied to your account. The ID does not change even if you close and re-open the setup form. For your convenience, you can keep the browser tab open in the background while you configure your destination.
Create IAM policy for S3 bucket
Open your Amazon IAM console.
Go to Policies, and then click Create Policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListBucketOfASpecificPrefix",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "{prefix_path}/*"
          ]
        }
      }
    },
    {
      "Sid": "AllowAllObjectActionsInSpecificPrefix",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObjectTagging",
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:DeleteObjectVersion",
        "s3:PutObjectTagging",
        "s3:DeleteObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}/{prefix_path}/*"
      ]
    }
  ]
}
NOTE: Setting the "s3:prefix" condition to ["*"] or ["{prefix_path}/*"] grants access to all prefixes in the specified bucket or to a specific prefix path in the bucket, respectively.
In the policy, replace {your-bucket-name} with the name of your S3 bucket and {prefix_path} with the prefix path of your S3 bucket.
NOTE: If you do not specify the prefix path, the policy grants us access to the entire S3 bucket instead of limiting our access to the objects in the prefix path of the bucket.
Click Next.
In the Policy name field, enter a name for your policy, and then click Create policy.
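If you prefer to generate the S3 policy document rather than edit the placeholders by hand, the substitution can be sketched as follows. This is an illustrative helper, not a Fivetran tool: the function name `build_s3_policy` and the sample bucket and prefix values are assumptions, and the statement contents are copied from the policy shown above.

```python
import json

# Object-level actions copied from the policy in this guide.
OBJECT_ACTIONS = [
    "s3:DeleteObjectTagging", "s3:ReplicateObject", "s3:PutObject",
    "s3:GetObjectAcl", "s3:GetObject", "s3:DeleteObjectVersion",
    "s3:PutObjectTagging", "s3:DeleteObject", "s3:PutObjectAcl",
]


def build_s3_policy(bucket_name, prefix_path):
    """Fill {your-bucket-name} and {prefix_path} in the S3 policy template."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucketOfASpecificPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix_path}/*"]}},
            },
            {
                "Sid": "AllowAllObjectActionsInSpecificPrefix",
                "Effect": "Allow",
                "Action": OBJECT_ACTIONS,
                "Resource": [f"arn:aws:s3:::{bucket_name}/{prefix_path}/*"],
            },
        ],
    }


# Hypothetical values; replace them with your own bucket and prefix path.
policy = build_s3_policy("example-bucket", "lake/raw")
print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the JSON editor of the IAM console.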
Create IAM policy for AWS Glue Data Catalog
In the Policies page, click Create Policy, and then go to the JSON tab.
Depending on your access requirements, copy one of the following policies and paste it in the JSON editor:
To enable the policy to access all your Glue databases and their tables, copy the following policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SetupFormTest",
      "Effect": "Allow",
      "Action": [
        "glue:DeleteDatabase"
      ],
      "Resource": [
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:database/fivetran*",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:catalog",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:table/fivetran*/*",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:userDefinedFunction/fivetran*/*"
      ]
    },
    {
      "Sid": "AllConnectors",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:UpdateDatabase",
        "glue:CreateTable",
        "glue:GetTables",
        "glue:CreateDatabase",
        "glue:UpdateTable",
        "glue:BatchDeleteTable",
        "glue:DeleteTable",
        "glue:GetTable"
      ],
      "Resource": [
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:*"
      ]
    }
  ]
}
To limit the access of the policy to specific Glue databases, copy the following policy.
NOTE: Whenever you add a new connector for your destination, you must update the policy with the new connector's details under the Sid:AllConnectors identifier.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SetupFormTest",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:UpdateDatabase",
        "glue:DeleteDatabase",
        "glue:CreateTable",
        "glue:GetTables",
        "glue:CreateDatabase",
        "glue:UpdateTable",
        "glue:BatchDeleteTable",
        "glue:DeleteTable",
        "glue:GetTable"
      ],
      "Resource": [
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:database/fivetran*",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:catalog",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:table/fivetran*/*",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:userDefinedFunction/fivetran*/*"
      ]
    },
    {
      "Sid": "AllConnectors",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:UpdateDatabase",
        "glue:CreateTable",
        "glue:CreateDatabase",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:BatchDeleteTable",
        "glue:GetTable",
        "glue:GetTables"
      ],
      "Resource": [
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:database/{schema_name}",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:catalog",
        "arn:aws:glue:{your-catalog-region}:{your-account-id}:table/{schema_name}/*"
      ]
    }
  ]
}
NOTE: We need the DeleteDatabase permission only to perform the setup tests.
In the policy, replace {your-catalog-region} with the Region of your S3 bucket and {your-account-id} with your AWS account ID.
If you copied the policy with limited access to specific databases, replace {schema_name} with your connector's schema name.
Click Next.
In the Policy name field, enter a name for your policy, and then click Create policy.
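For the limited-access policy, the Resource list under the AllConnectors statement has to be kept in step with your connectors. The sketch below shows one way to regenerate that list from a set of schema names. It assumes that each connector contributes one database ARN and one table ARN in the pattern used by the policy above; the function name and the sample Region, account ID, and schema names are hypothetical.

```python
def all_connectors_resources(region, account_id, schema_names):
    """Build the Resource list for the AllConnectors statement.

    Assumes one database ARN and one table ARN per connector schema,
    following the pattern of the limited-access policy in this guide.
    """
    base = f"arn:aws:glue:{region}:{account_id}"
    resources = [f"{base}:catalog"]
    for schema in schema_names:
        resources.append(f"{base}:database/{schema}")
        resources.append(f"{base}:table/{schema}/*")
    return resources


# Hypothetical values for illustration only.
resources = all_connectors_resources(
    "us-east-1", "123456789012", ["salesforce", "zendesk"]
)
for arn in resources:
    print(arn)
```

When you add a connector, append its schema name to the list, regenerate the resources, and update the policy in the IAM console.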
Create IAM role
Go to Roles, and then click Create role.
Select AWS account, and then select Another AWS account.
In the Account ID field, enter Fivetran's account ID, 834469178297.
Select the Require external ID checkbox, and then enter the External ID you found in Step 1.
Click Next.
Click Next.
In the Role name field, enter a name for the role, and then click Create role.
In the Roles page, select the role you created.
Make a note of the ARN. You will need it to configure Fivetran.
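The console steps above produce a role whose trust policy lets the Fivetran account assume the role only when it presents your External ID. As a rough sketch, the equivalent trust policy document can be built like this. The structure follows the standard AWS cross-account trust policy with an `sts:ExternalId` condition; the function name and the sample External ID are placeholders, not values from Fivetran.

```python
import json

# Fivetran's AWS account ID, as given in this guide.
FIVETRAN_ACCOUNT_ID = "834469178297"


def build_trust_policy(external_id):
    """Return a cross-account trust policy that requires the External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{FIVETRAN_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder value; use the External ID from your destination setup form.
trust_policy = build_trust_policy("example-external-id")
print(json.dumps(trust_policy, indent=2))
```

Comparing this output with the Trust relationships tab of the role you created is a quick way to confirm that the account ID and External ID were entered correctly.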
(Optional) Configure AWS PrivateLink
IMPORTANT: You must have a Business Critical plan to use AWS PrivateLink.
AWS PrivateLink allows VPCs and AWS-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. PrivateLink is the most secure connection method. Learn more in AWS’ PrivateLink documentation.
Follow our AWS PrivateLink setup guide to configure PrivateLink for your S3 bucket.
Complete Fivetran configuration
- Log in to your Fivetran account.
- Go to the Destinations page and click Add destination.
- Enter a Destination name of your choice and then click Add.
- Select S3 Data Lake as the destination type.
- In the destination setup form, enter your S3 Bucket name.
- In the Fivetran Role ARN field, enter the ARN you found in Step 4.
- (Optional) Enter the S3 Prefix Path of your bucket.
NOTE: The prefix path must not start or end with a forward slash (/).
- Enter your S3 Bucket Region.
- To always connect using AWS PrivateLink, set the Require PrivateLink toggle to ON.
NOTE: By default, we use PrivateLink to connect if your S3 bucket and Fivetran are in the same AWS Region. Setting this toggle to ON ensures that we always use PrivateLink to connect. If you set the toggle to OFF and your S3 bucket and Fivetran are not in the same AWS Region, Fivetran does not use a PrivateLink connection and skips the PrivateLink setup test.
- Choose your Data processing location.
- Choose your Cloud service provider and its region as described in our Destinations documentation.
NOTE:
- For faster uploads and downloads and for optimum load times, we recommend that you choose AWS as the cloud service provider and the AWS Region in which your S3 bucket is located as the data processing location.
- For S3 data lake destinations, AWS is supported in all pricing plans. For information about the supported AWS Regions, see our destination overview documentation.
- Choose your Time zone.
- (Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
- Click Save & Test.
Fivetran tests and validates the S3 Data Lake connection. On successful completion of the setup tests, you can sync your data using Fivetran connectors to the S3 Data Lake destination.
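Before you click Save & Test, you may want to sanity-check the values you are about to enter. The following sketch checks the prefix path rule from the steps above (no leading or trailing forward slash) and that the role ARN looks like an IAM role ARN. The function name and the regular expression are assumptions made for illustration; Fivetran's own validation may differ.

```python
import re

# Loose pattern for an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def check_form_values(bucket_name, role_arn, prefix_path=""):
    """Return a list of problems found in the destination form values."""
    problems = []
    if not bucket_name:
        problems.append("S3 Bucket name is required")
    if not ROLE_ARN_PATTERN.match(role_arn):
        problems.append("Fivetran Role ARN does not look like an IAM role ARN")
    # The prefix path is optional, but it must not start or end with "/".
    if prefix_path.startswith("/") or prefix_path.endswith("/"):
        problems.append("S3 Prefix Path must not start or end with a forward slash")
    return problems


# Hypothetical values for illustration only.
print(check_form_values("example-bucket",
                        "arn:aws:iam::123456789012:role/fivetran-lake",
                        "lake/raw"))
print(check_form_values("example-bucket",
                        "arn:aws:iam::123456789012:role/fivetran-lake",
                        "/lake/raw/"))
```

An empty list means that none of these basic checks failed; it does not replace the setup tests that Fivetran runs.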
In addition, Fivetran automatically configures a Fivetran Platform Connector to transfer the connector logs and account metadata to a schema in this destination. The Fivetran Platform Connector enables you to monitor your connectors, track your usage, and audit changes. The connector sends all these details at the destination level.
IMPORTANT: If you are an Account Administrator, you can manually add the Fivetran Platform Connector on an account level so that it syncs all the metadata and logs for all the destinations in your account to a single destination. If an account-level Fivetran Platform Connector is already configured in a destination in your Fivetran account, then we don't add destination-level Fivetran Platform Connectors to the new destinations you create.
Setup tests
Fivetran performs the following S3 Data Lake connection tests:
The S3 Read and Write Access test checks the accessibility of your S3 bucket and validates the resources you provided in the IAM policy.
The Glue Access test checks the accessibility of AWS Glue Data Catalog and validates the resources you provided in the IAM policy.
The PrivateLink test checks whether your S3 bucket is in the same AWS Region as Fivetran. We perform this test only if you set the Require PrivateLink toggle to ON.
NOTE: The tests may take a couple of minutes to complete.
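The PrivateLink behavior described for the Require PrivateLink toggle can be summarized as a small decision function. This is only a sketch of the rules as stated in this guide, not Fivetran's implementation; the function name and the sample Region values are hypothetical.

```python
def uses_privatelink(require_privatelink, bucket_region, fivetran_region):
    """Return True when the connection is expected to go through PrivateLink.

    Rules taken from this guide:
    - toggle ON: PrivateLink is always used
    - toggle OFF: PrivateLink is used only if both Regions match
    """
    if require_privatelink:
        return True
    return bucket_region == fivetran_region


# Hypothetical Regions for illustration only.
print(uses_privatelink(False, "us-east-1", "us-east-1"))
print(uses_privatelink(False, "us-east-1", "eu-west-1"))
print(uses_privatelink(True, "us-east-1", "eu-west-1"))
```

When the toggle is ON and the Regions differ, expect the PrivateLink test to flag the mismatch, because that test checks whether your S3 bucket is in the same AWS Region as Fivetran.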
Related articles
Destination Overview
API Destination Configuration