Set Up AWS Data Lake
This tutorial explains how to configure an Amazon S3 data lake as your destination using the Fivetran Managed Data Lake Service. It walks through the required AWS setup, Fivetran configuration, and how to validate data ingestion.
What you will learn
This tutorial covers the following steps:
Configure AWS IAM permissions
- Create IAM policies that grant Fivetran access to your S3 bucket
- Update the policy with your bucket name and prefix path
- Apply least-privilege access principles
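
As a rough illustration of the least-privilege idea above, the sketch below builds an IAM policy document scoped to a single bucket and prefix. The bucket name, the prefix, the helper name `build_s3_policy`, and the list of S3 actions are assumptions made for this example; the exact actions Fivetran requires should be taken from the Fivetran setup guide rather than from this sketch.

```python
import json


def build_s3_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document limited to one bucket and one prefix."""
    prefix = prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # With an empty prefix, object access covers the whole bucket.
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"

    list_statement = {
        "Sid": "ListBucketUnderPrefix",
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": bucket_arn,
    }
    if prefix:
        # Restrict listing to keys under the prefix only.
        list_statement["Condition"] = {
            "StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}
        }

    object_statement = {
        "Sid": "ReadWriteObjectsUnderPrefix",
        "Effect": "Allow",
        # Assumed action set for illustration; replace it with the
        # actions listed in the Fivetran setup guide.
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": object_arn,
    }
    return {
        "Version": "2012-10-17",
        "Statement": [list_statement, object_statement],
    }


# Hypothetical bucket and prefix names.
policy = build_s3_policy("example-data-lake-bucket", "fivetran/")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor. Keeping the bucket-level and object-level statements separate makes it easier to see which permissions apply to listing and which apply to the objects themselves.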
Create an IAM role for Fivetran
- Create a role with a trust relationship for the Fivetran account
- Configure the external ID required for secure access
- Attach the appropriate IAM policies
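
The trust relationship described above can be sketched as a standard cross-account trust policy that allows `sts:AssumeRole` only when the caller supplies the expected external ID. The account ID `123456789012` and the external ID below are placeholders, not Fivetran's real values, and `build_trust_policy` is a hypothetical helper; use the account ID and external ID that Fivetran gives you during setup.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return a trust policy that lets one AWS account assume the role,
    but only when it presents the given external ID."""
    is_account_id = (
        trusted_account_id.isascii()
        and trusted_account_id.isdigit()
        and len(trusted_account_id) == 12
    )
    if not is_account_id:
        raise ValueError("AWS account IDs are 12 digits")
    if not external_id:
        raise ValueError("external_id must not be empty")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID guards against another party reusing
                # the role ARN (the "confused deputy" problem).
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values; not Fivetran's actual account ID or external ID.
trust_policy = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(trust_policy, indent=2))
```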
Create and configure an S3 bucket
- Create a dedicated bucket for your data lake
- Define the storage structure using prefixes
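
Before creating the bucket, it can help to check the name and settle on a consistent prefix format. The sketch below checks a subset of the S3 general-purpose bucket naming rules (length, allowed characters, no consecutive dots, not shaped like an IP address) and normalizes a prefix so it has no leading slash and exactly one trailing slash. The helper names and the example bucket and prefix are hypothetical, and the check is not a complete implementation of the naming rules.

```python
import re

# 3-63 characters, lowercase letters, digits, dots, and hyphens,
# beginning and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name or _IP_RE.fullmatch(name):
        return False
    return True


def normalize_prefix(prefix: str) -> str:
    """Drop empty path segments and end a non-empty prefix with one slash."""
    parts = [part for part in prefix.split("/") if part]
    return "/".join(parts) + "/" if parts else ""


def s3_uri(bucket: str, prefix: str) -> str:
    """Build the s3:// URI for a bucket and prefix."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"s3://{bucket}/{normalize_prefix(prefix)}"


# Hypothetical bucket and prefix names.
print(s3_uri("example-data-lake-bucket", "/fivetran//raw"))
```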
(Optional) Enable AWS Glue integration
- Configure AWS Glue to catalog your data
- Enable downstream services such as Athena or Redshift to query the data
Configure the destination in Fivetran
- Select Amazon S3 as the destination
- Provide the IAM role ARN and bucket details
- Test the connection
Ingest and validate data
- Add a connector to start data ingestion
- Verify that Fivetran writes data to the expected S3 path
- Confirm schema and file structure
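
One way to check that files are landing under the expected path is to list the object keys beneath the prefix and count them by file extension. The sketch below assumes an S3 client with the boto3 interface (for example `boto3.client("s3")`) is passed in; the function names and any bucket or prefix you supply are illustrative, and the file layout Fivetran actually produces is not described here.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under the prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def summarize_by_extension(keys: Iterable[str]) -> Dict[str, int]:
    """Count keys by file extension to give a quick view of what was written."""
    counts: Dict[str, int] = {}
    for key in keys:
        filename = key.rsplit("/", 1)[-1]
        extension = filename.rsplit(".", 1)[-1] if "." in filename else "(none)"
        counts[extension] = counts.get(extension, 0) + 1
    return counts
```

With boto3 installed and AWS credentials configured, `summarize_by_extension(iter_keys(boto3.client("s3"), bucket, prefix))` returns a dictionary of extension counts. An empty result after a completed sync suggests that the prefix or the permissions need a second look.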
Query data using AWS services
- Use AWS Glue to catalog tables
- Query data using Amazon Athena
- Understand how data is stored using Apache Iceberg tables
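
Querying through Athena can also be scripted. The sketch below assumes an Athena client with the boto3 interface (for example `boto3.client("athena")`) is passed in, starts a query, and polls until it finishes. The helper name `run_athena_query` and every database, table, and output location passed to it are hypothetical; the names Fivetran registers in the Glue catalog depend on your connector and configuration.

```python
import time
from typing import Any


def run_athena_query(
    athena_client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 60.0,
) -> str:
    """Start an Athena query, wait for it to finish, and return its ID."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} did not finish in time")
        time.sleep(poll_seconds)
```

A simple first check is a row count, for example `SELECT COUNT(*) AS row_count FROM "example_table"` against a database name taken from the Glue catalog. The returned query ID can then be passed to the client's `get_query_results` call to read the rows.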
Summary
After completing this tutorial, you will have an Amazon S3 data lake that Fivetran writes to as Apache Iceberg tables and that you can query with AWS services such as Amazon Athena.