S3 Data Lake
Amazon Simple Storage Service (Amazon S3) provides scalable cloud storage services to build secure data lakes. Fivetran supports data lakes built on Amazon S3 as a destination.
Our S3 Data Lake destination can sync your data from multiple sources to S3 data lakes. We use AWS Glue as the data catalog for your destination tables. AWS Glue is a serverless data integration service that enables other services to quickly query and integrate the data stored in your data lake.
Setup guide
Follow our step-by-step S3 Data Lake setup guide to connect your S3 Data Lake destination with Fivetran.
Type transformation and mapping
The data types in your S3 Data Lake destination follow Fivetran's standard data type storage.
We use the following data type conversions:
Fivetran Data Type | Destination Data Type |
---|---|
BOOLEAN | BOOLEAN |
SHORT | INTEGER |
INT | INTEGER |
LONG | LONG |
BIGDECIMAL | DECIMAL |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
LOCALDATE | DATE |
LOCALDATETIME | TIMESTAMP |
INSTANT | TIMESTAMPTZ |
STRING | STRING |
XML | STRING |
JSON | STRING |
BINARY | BINARY |
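The conversion table above can be expressed as a simple lookup, which may be handy when validating downstream schemas. This is only an illustrative sketch: the dictionary contents are copied from the table, while the names `FIVETRAN_TO_DESTINATION` and `destination_type` are hypothetical and not part of any Fivetran API.

```python
# Mapping copied from the conversion table above.
# The names below are illustrative, not a Fivetran API.
FIVETRAN_TO_DESTINATION = {
    "BOOLEAN": "BOOLEAN",
    "SHORT": "INTEGER",
    "INT": "INTEGER",
    "LONG": "LONG",
    "BIGDECIMAL": "DECIMAL",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "LOCALDATE": "DATE",
    "LOCALDATETIME": "TIMESTAMP",
    "INSTANT": "TIMESTAMPTZ",
    "STRING": "STRING",
    "XML": "STRING",
    "JSON": "STRING",
    "BINARY": "BINARY",
}


def destination_type(fivetran_type: str) -> str:
    """Return the destination type for a Fivetran type name (case-insensitive)."""
    key = fivetran_type.strip().upper()
    if key not in FIVETRAN_TO_DESTINATION:
        raise ValueError(f"Unknown Fivetran data type: {fivetran_type!r}")
    return FIVETRAN_TO_DESTINATION[key]
```

Note that SHORT and INT both map to INTEGER, and XML and JSON are both stored as STRING, so the mapping is not reversible.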
Supported AWS Regions
We can store your data in S3 buckets located in the following AWS Regions:
Name | Code |
---|---|
US East (N. Virginia) | us-east-1 |
US East (Ohio) | us-east-2 |
US West (Oregon) | us-west-2 |
Europe (Frankfurt) | eu-central-1 |
Europe (Ireland) | eu-west-1 |
Asia Pacific (Mumbai) | ap-south-1 |
Asia Pacific (Singapore) | ap-southeast-1 |
Canada (Central) | ca-central-1 |
Europe (London) | eu-west-2 |
Asia Pacific (Sydney) | ap-southeast-2 |
Asia Pacific (Tokyo) | ap-northeast-1 |
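If you provision buckets programmatically, a quick pre-check against the list above can catch an unsupported Region before setup. The following is a minimal sketch; the region codes are taken from the table, and `SUPPORTED_REGIONS` and `is_supported_region` are hypothetical helper names.

```python
# Region codes and names copied from the table above.
# Helper names are illustrative only.
SUPPORTED_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-2": "US West (Oregon)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "ap-south-1": "Asia Pacific (Mumbai)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ca-central-1": "Canada (Central)",
    "eu-west-2": "Europe (London)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
}


def is_supported_region(region_code: str) -> bool:
    """Return True if the bucket's Region code appears in the supported list."""
    return region_code.strip().lower() in SUPPORTED_REGIONS
```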
Data format
Fivetran stores your data in a structured format in the destination. We write your source data to Parquet files in the Fivetran pipeline and use Iceberg tables to store these files in the data lake.
Supported query engines
You can use various query engines to extract data from the Iceberg tables of your destination. See Apache Iceberg's documentation for the query engines you can use.
NOTE: If you are unable to extract your data using the query engine of your choice, contact our support team.
Folder structure
We can sync your data to any destination folder of your choice. If you do not specify a folder, we write the data to the following directory: <root_folder>/<schema_name>/<table_name>
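As a rough illustration of the default layout, the helper below joins the three path components into a single prefix. It assumes the placeholders stand for a root folder, a schema name, and a table name, in that order; `default_table_path` is a hypothetical name, and the sketch does not model any name normalization Fivetran may apply.

```python
def default_table_path(root_folder: str, schema_name: str, table_name: str) -> str:
    """Build the default prefix: root folder, then schema name, then table name.

    Illustrative only: leading and trailing slashes are trimmed from each
    component so that the joined path has no empty segments.
    """
    parts = [part.strip("/") for part in (root_folder, schema_name, table_name)]
    if not all(parts):
        raise ValueError("root folder, schema name, and table name must be non-empty")
    return "/".join(parts)
```

For example, `default_table_path("fivetran/", "salesforce", "account")` yields `fivetran/salesforce/account`.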
Table maintenance operations
We perform the following maintenance operations on the Iceberg tables in your destination:
- Delete old snapshots: On a weekly basis, we delete the table snapshots that are more than 7 days old. We also delete the data files that were referenced only by the deleted snapshots and are not referenced by any active snapshot.
- Delete previous versions of metadata files: In addition to the current version, we retain 3 previous versions of the metadata files and delete all the prior versions.
- Delete orphan files: Orphan files are created by unsuccessful operations within your data pipeline. They remain in your S3 bucket but are no longer referenced in the Iceberg table metadata, so they add to your S3 storage costs. We identify these orphan files and delete them every 2 weeks to keep your data storage efficient.
NOTE:
- We perform the maintenance operations only on Saturdays.
- You may observe a sync delay for your connectors while the destination table maintenance operations are in progress.
- To track the changes made to the Iceberg tables, we create a `sequence_number.txt` file in each table's metadata folder. You must never delete these files from your destination.
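The retention rules described in this section can be sketched as plain selection logic. This is not Fivetran's implementation; it only models the stated thresholds (7 days for snapshots, 3 previous metadata versions) over in-memory inputs. All function names are hypothetical, and the choice to never expire the most recent snapshot is an assumption added here because a table needs a current snapshot to remain readable.

```python
from datetime import datetime, timedelta

SNAPSHOT_MAX_AGE = timedelta(days=7)   # snapshots older than this are deleted
PREVIOUS_METADATA_VERSIONS_KEPT = 3    # kept in addition to the current version


def snapshots_to_expire(snapshot_times: dict, now: datetime) -> list:
    """Return ids of snapshots older than 7 days.

    snapshot_times maps snapshot id -> creation time.
    Assumption: the most recent snapshot is never expired.
    """
    if not snapshot_times:
        return []
    cutoff = now - SNAPSHOT_MAX_AGE
    current = max(snapshot_times, key=snapshot_times.get)
    return sorted(
        sid for sid, created in snapshot_times.items()
        if created < cutoff and sid != current
    )


def metadata_versions_to_delete(versions: list) -> list:
    """Keep the current (highest) version plus the 3 before it; return the rest."""
    keep = PREVIOUS_METADATA_VERSIONS_KEPT + 1
    return sorted(versions)[:-keep]


def orphan_files(files_in_bucket: list, files_referenced: list) -> list:
    """Files present in the bucket but not referenced by table metadata."""
    return sorted(set(files_in_bucket) - set(files_referenced))
```

With six metadata versions numbered 1 through 6, for instance, the sketch keeps versions 3 to 6 and selects 1 and 2 for deletion.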
Performance
Typically, Fivetran-managed S3 Data Lake destinations support a rate of change of 10 MB per second.
However, the real performance of a destination may vary depending on the workload and various other factors. Some of the common factors that impact the performance are:
- Number of data types and constraints
- Number of rows updated and the location of the updates
- Network speed
- Size of data
- Sparsity, cardinality, and width of data
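To get a feel for what the typical rate implies, the following back-of-the-envelope sketch converts a volume of changed data into a lower-bound sync time. It assumes 1 MB means 10^6 bytes and that the typical rate is sustained for the whole sync, which the factors listed above may prevent in practice; `estimated_sync_seconds` is a hypothetical helper.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def estimated_sync_seconds(changed_bytes: int, rate_mb_per_s: float = 10.0) -> float:
    """Idealized time to write changed_bytes at a constant rate of change.

    This is a rough lower bound, not a performance guarantee.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return changed_bytes / (rate_mb_per_s * BYTES_PER_MB)
```

Under these assumptions, 36 GB of changed data (36,000,000,000 bytes) would take about 3,600 seconds, or one hour.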
Troubleshooting data issues
We can troubleshoot issues with the data stored in your destination. To let us do so, you must allow us to access your destination data. Follow our troubleshooting documentation to grant Fivetran access to the data in your destination.
Limitations
Fivetran does not support history mode for S3 Data Lake destinations.