Set Up Amazon S3 Data Lake

This tutorial explains how to configure an Amazon S3 data lake as your destination using the Fivetran Managed Data Lake Service. It walks through the required AWS setup, the Fivetran destination configuration, and the steps to validate data ingestion.

What you will learn

This tutorial covers the following steps:

Configure AWS IAM permissions

Create IAM policies that grant Fivetran access to your S3 bucket
Update the policy with your bucket name and prefix path
Apply least-privilege access principles
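
The policy described above can be sketched as a JSON document. This is a minimal sketch, assuming the function name `build_s3_access_policy`, the example bucket `my-data-lake-bucket`, and the prefix `fivetran/` (all hypothetical). The specific S3 actions listed are also an assumption; check the Fivetran documentation for the exact permissions the destination requires. The bucket ARN and object ARN pattern follow the standard `arn:aws:s3:::bucket` and `arn:aws:s3:::bucket/prefix/*` forms.

```python
import json


def build_s3_access_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document scoped to a single bucket and prefix.

    The action list is an illustrative assumption; check the Fivetran
    documentation for the exact permissions the destination requires.
    """
    prefix = prefix.strip("/")  # normalize "/fivetran/" -> "fivetran"
    key_pattern = f"{prefix}/*" if prefix else "*"
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket ARN itself.
                "Sid": "ListKeysUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
                # Least privilege: only allow listing within the prefix.
                "Condition": {"StringLike": {"s3:prefix": [prefix, key_pattern]}},
            },
            {
                # Object-level actions: the resource is an object ARN pattern.
                "Sid": "ReadWriteObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/{key_pattern}",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and prefix path; substitute your own.
    print(json.dumps(build_s3_access_policy("my-data-lake-bucket", "fivetran/"), indent=2))
```

Splitting the bucket-level and object-level actions into separate statements matters because `s3:ListBucket` is authorized against the bucket ARN, while object actions are authorized against object ARNs.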

Create an IAM role for Fivetran

Create a role whose trust relationship allows the Fivetran AWS account to assume it
Configure the external ID required for secure access
Attach the IAM policies you created to the role
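
A role trust policy that requires an external ID can be sketched as follows. This is a minimal sketch, assuming the hypothetical function name `build_trust_policy`; the principal ARN and external ID used below are placeholders (`111122223333` is the example account ID AWS uses in its documentation), and the real values come from the Fivetran destination setup. The permissions policy is attached to the role separately.

```python
import json


def build_trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """Build a role trust policy that lets one principal assume the role,
    but only when the caller presents the expected external ID."""
    if not external_id:
        raise ValueError("external_id must be non-empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                # The external ID guards against the "confused deputy" problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; use the principal and external ID Fivetran provides.
    print(json.dumps(build_trust_policy("arn:aws:iam::111122223333:root", "example-external-id"), indent=2))
```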

Create and configure an S3 bucket

Create a dedicated bucket for your data lake
Define the storage structure using prefixes
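
Before creating the bucket, it can help to check the name and settle on the root URI that the prefix defines. The sketch below checks only a subset of the S3 general purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no adjacent dots; not formatted as an IP address). The function names `is_valid_bucket_name` and `data_lake_root` are hypothetical.

```python
import re

# A subset of the S3 general purpose bucket naming rules.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the subset of rules checked here."""
    return (
        _BUCKET_RE.fullmatch(name) is not None
        and ".." not in name  # no adjacent periods
        and _IP_RE.fullmatch(name) is None  # not formatted like an IP address
    )


def data_lake_root(bucket: str, prefix: str) -> str:
    """Return the s3:// URI under which the data lake is stored."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"
```

For example, `data_lake_root("my-data-lake", "/fivetran/")` normalizes the prefix and yields `s3://my-data-lake/fivetran/`.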

(Optional) Enable AWS Glue integration

Configure AWS Glue to catalog your data
Enable downstream services such as Athena or Redshift to query the data
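
If you enable the Glue integration, the role also needs Data Catalog permissions. The statement below is a minimal sketch under stated assumptions: the function name `build_glue_statement` is hypothetical, and the action list and the default wildcard database scope are illustrative, not Fivetran's documented requirements. The ARN forms for the catalog, databases, and tables follow the standard `arn:aws:glue:region:account-id:...` patterns.

```python
import json
import re


def build_glue_statement(region: str, account_id: str, database: str = "*") -> dict:
    """Build an IAM policy statement for managing tables in the Glue Data Catalog.

    The action list and the database scoping are illustrative assumptions.
    """
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Sid": "ManageGlueCatalogTables",
        "Effect": "Allow",
        "Action": [
            "glue:GetDatabase",
            "glue:CreateDatabase",
            "glue:GetTable",
            "glue:GetTables",
            "glue:CreateTable",
            "glue:UpdateTable",
            "glue:DeleteTable",
        ],
        # Glue authorizes table operations against the catalog, database, and table ARNs.
        "Resource": [
            f"{base}:catalog",
            f"{base}:database/{database}",
            f"{base}:table/{database}/*",
        ],
    }


if __name__ == "__main__":
    # Placeholder region and account ID.
    print(json.dumps(build_glue_statement("us-east-1", "111122223333"), indent=2))
```

The returned dictionary is a single statement, so it would be appended to the `Statement` list of a policy attached to the role.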

Configure the destination in Fivetran

Select Amazon S3 as the destination
Provide the IAM role ARN, bucket name, and prefix path
Test the connection
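
A malformed role ARN is a common reason for the connection test to fail. The sketch below checks the standard `arn:partition:iam::account-id:role/path/name` shape locally before you paste the value into Fivetran. The function and type names are hypothetical, and the accepted partitions are limited to `aws`, `aws-cn`, and `aws-us-gov` as a simplifying assumption.

```python
import re
from typing import NamedTuple

_ROLE_ARN_RE = re.compile(
    r"arn:(?P<partition>aws|aws-cn|aws-us-gov):iam::(?P<account>\d{12})"
    r":role/(?P<path_and_name>[\w+=,.@/-]+)"
)


class RoleArn(NamedTuple):
    partition: str
    account_id: str
    path_and_name: str  # includes any role path, e.g. "service/fivetran-role"


def parse_role_arn(arn: str) -> RoleArn:
    """Split an IAM role ARN into its parts, or raise ValueError."""
    m = _ROLE_ARN_RE.fullmatch(arn.strip())
    if m is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return RoleArn(m["partition"], m["account"], m["path_and_name"])
```

For example, `parse_role_arn("arn:aws:iam::111122223333:role/fivetran-s3-role")` returns the account ID `111122223333` and the role name `fivetran-s3-role`.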

Ingest and validate data

Add a connector to start data ingestion
Verify that Fivetran writes data to the expected S3 path
Confirm the schema and file structure
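
One way to verify the S3 path is to list the object keys under your prefix (for example, with the boto3 `list_objects_v2` paginator) and check their layout. The sketch below assumes a hypothetical layout of `<prefix>/<schema>/<table>/<data|metadata>/...`, based on Iceberg's default convention of separate `data` and `metadata` directories under each table location; confirm the actual layout Fivetran writes before relying on it. The function name `summarize_tables` is also hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_tables(
    keys: Iterable[str], prefix: str
) -> Tuple[Dict[Tuple[str, str], Dict[str, int]], List[str]]:
    """Count data and metadata files per (schema, table) under the prefix.

    Keys that do not fit the assumed layout are returned separately.
    """
    root = prefix.strip("/")
    root = f"{root}/" if root else ""
    tables: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"data": 0, "metadata": 0}
    )
    unexpected: List[str] = []
    for key in keys:
        if not key.startswith(root):
            unexpected.append(key)
            continue
        parts = key[len(root):].split("/")
        # Assumed layout: <schema>/<table>/<data|metadata>/<file or partition path>
        if len(parts) < 4 or parts[2] not in ("data", "metadata"):
            unexpected.append(key)
            continue
        tables[(parts[0], parts[1])][parts[2]] += 1
    return dict(tables), unexpected
```

A table with data files but no metadata files, or the reverse, points to an incomplete write or a different layout than expected, and any key in the `unexpected` list sits outside the assumed structure.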

Query data using AWS services

Use AWS Glue to catalog tables
Query data using Amazon Athena
Understand how the data is stored as Apache Iceberg tables
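
Once Glue catalogs the tables, you can query them from Athena. The sketch below builds the keyword arguments for the Athena `StartQueryExecution` API (`QueryString`, `QueryExecutionContext`, `ResultConfiguration`), so the result can be passed as `boto3.client("athena").start_query_execution(**params)`. The function name, the example database and table names, and the results bucket are hypothetical, and restricting identifiers to lowercase letters, digits, and underscores is a conservative assumption that keeps the generated SQL safe to quote.

```python
import re

_IDENT = re.compile(r"[a-z0-9_]+")


def athena_query_params(database: str, table: str, output_location: str, limit: int = 10) -> dict:
    """Build keyword arguments for an Athena start_query_execution call."""
    for name in (database, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        # Athena SELECT statements quote identifiers with double quotes.
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}',
        # AwsDataCatalog is the Athena name for the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database, "Catalog": "AwsDataCatalog"},
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

Keeping the parameter construction separate from the client call makes it easy to inspect the generated query before running it against your account.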

Summary

After completing this tutorial, you will have a working Amazon S3 data lake integrated with Fivetran.