Managed Data Lake
Our Managed Data Lake is the single destination for all data lake integrations, consolidating support for Amazon S3, Azure Data Lake Storage (ADLS), and OneLake. This unified approach simplifies data management, ensuring consistent ingestion, transformation, and query capabilities across all supported storage providers.
Supported storage providers
Fivetran’s Managed Data Lake supports the following storage providers:
Amazon S3
Azure Data Lake Storage (ADLS)
OneLake
Google Cloud Storage (GCS) (Planned Support)
Setup guide
Follow our step-by-step Managed Data Lake setup guide to connect your data lake destination with Fivetran.
Type transformation and mapping
This table outlines how Fivetran's standard data types are converted into the corresponding destination data types for each storage provider.
Fivetran Data Type | Amazon S3 Data Type | ADLS Data Type (Delta Lake) | ADLS Data Type (Iceberg) | OneLake Data Type |
---|---|---|---|---|
BOOLEAN | BOOLEAN | BOOLEAN | BOOLEAN | BOOLEAN |
SHORT | SHORT | SHORT | INTEGER | SHORT |
INT | INTEGER | INTEGER | INTEGER | INTEGER |
LONG | LONG | LONG | LONG | LONG |
BIGDECIMAL | DECIMAL (38, 10) | DECIMAL (38, 10) | DECIMAL (38, 10) | DECIMAL (38, 10) |
FLOAT | FLOAT | FLOAT | FLOAT | FLOAT |
DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE |
LOCALDATE | DATE | DATE | DATE | DATE |
INSTANT | TIMESTAMP | TIMESTAMP | TIMESTAMPTZ | TIMESTAMP |
STRING | STRING | STRING | STRING | STRING |
XML | STRING | STRING | STRING | STRING |
JSON | STRING | STRING | STRING | STRING |
BINARY | BINARY | BINARY | BINARY | BINARY |
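The mapping above can be expressed as a lookup: most destinations share one conversion, and the ADLS (Iceberg) column differs only for SHORT and INSTANT. The following sketch is illustrative only — the table is the authoritative reference, and the function name is ours, not Fivetran's:

```python
# Illustrative transcription of the type-mapping table above.
# COMMON holds the mapping shared by S3, ADLS (Delta Lake), and OneLake;
# ICEBERG_OVERRIDES holds the two rows where ADLS (Iceberg) differs.
COMMON = {
    "BOOLEAN": "BOOLEAN", "SHORT": "SHORT", "INT": "INTEGER",
    "LONG": "LONG", "BIGDECIMAL": "DECIMAL(38, 10)", "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE", "LOCALDATE": "DATE", "INSTANT": "TIMESTAMP",
    "STRING": "STRING", "XML": "STRING", "JSON": "STRING", "BINARY": "BINARY",
}
ICEBERG_OVERRIDES = {"SHORT": "INTEGER", "INSTANT": "TIMESTAMPTZ"}

def destination_type(fivetran_type: str, adls_iceberg: bool = False) -> str:
    """Resolve a Fivetran data type to the destination data type."""
    if adls_iceberg and fivetran_type in ICEBERG_OVERRIDES:
        return ICEBERG_OVERRIDES[fivetran_type]
    return COMMON[fivetran_type]
```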
Supported query engines
This table provides a comprehensive view of the query engines compatible with each data lake destination.
Query Engine | Amazon S3 Data Lake | ADLS Data Lake | OneLake Data Lake |
---|---|---|---|
Amazon Athena | ✓ | | |
Azure Databricks | | ✓ | ✓ |
Azure Synapse Analytics | | ✓ | ✓ |
Dremio | ✓ | ✓ | |
Redshift | ✓ | | |
Snowflake | ✓ | | |
Starburst Galaxy | ✓ | | |
NOTE
- Make sure Unity Catalog is not integrated with your Databricks workspace.
- We support Starburst Galaxy only for Iceberg tables. Also, to use Starburst Galaxy as your query engine, you must integrate it using the AWS Glue metastore. If you are unable to extract your data using the query engine of your choice, contact our support team.
Data formats
Fivetran stores your source data in a structured format within the destination. The data is written to Parquet files during the Fivetran pipeline process and subsequently stored in specific tables in your data lake.
Supported table formats
Depending on your chosen storage provider, Fivetran supports the following table formats:
Amazon S3 Data Lake:
- Delta Lake (Beta)
- Apache Iceberg
Azure Data Lake Storage (ADLS):
- Delta Lake
- Apache Iceberg (Beta)
OneLake:
- Delta Lake
NOTE: During the destination setup, you can select the table format you prefer for your data lake.
Folder structure
Fivetran organizes your data into destination folders based on your configuration:
Amazon S3 Data Lake:
- Default directory:
<root_folder>/<schema_name>/<table_name>
Azure Data Lake Storage (ADLS):
- If a prefix path is specified:
<root>/<prefix_path>/<schema>/<table>
- If no prefix path is specified, the default is:
<root>/fivetran/<schema>/<table>
OneLake:
- Directory structure:
<lakehouse_name>.lakehouse/Tables/<table_name>
or <lakehouse_guid>/Tables/<table_name>
This structured approach ensures organized storage and efficient data retrieval across all supported data lake destinations.
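The folder layouts above can be sketched as a single path-building function. This is illustrative only — Fivetran constructs these paths internally, and the function name, defaults, and placeholder values below are ours:

```python
# Sketch of the destination folder layouts described above.
def destination_path(provider: str, schema: str, table: str,
                     root: str = "root", prefix: str = "",
                     lakehouse: str = "my_lakehouse") -> str:
    """Build the destination directory for a table, per storage provider."""
    if provider == "s3":
        # <root_folder>/<schema_name>/<table_name>
        return f"{root}/{schema}/{table}"
    if provider == "adls":
        # <root>/<prefix_path>/<schema>/<table>, with 'fivetran' as the
        # default when no prefix path is specified
        middle = prefix if prefix else "fivetran"
        return f"{root}/{middle}/{schema}/{table}"
    if provider == "onelake":
        # OneLake paths are keyed by lakehouse, not schema
        return f"{lakehouse}.lakehouse/Tables/{table}"
    raise ValueError(f"unknown provider: {provider}")
```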
Table maintenance operations
To maintain an efficient and optimized data storage environment, Fivetran performs regular maintenance operations on your destination tables. These operations vary based on the table format and are designed to manage storage consumption and enhance query performance.
Maintenance operations for Delta Lake tables
Fivetran performs the following maintenance tasks on Delta Lake tables across all supported destinations:
Delete old snapshots and removed files: We remove table snapshots older than the Snapshot Retention Period specified during your destination setup. However, we always retain the last four checkpoints of a table before deleting its snapshots. Additionally, we delete removed files—those not referenced in the latest table snapshots but present in older snapshots—to optimize storage costs.
Delete orphan files: Orphan files result from unsuccessful operations within your data pipeline and are no longer referenced in the Delta Lake table metadata. These files, if left unmanaged, contribute to unnecessary storage costs. Fivetran identifies orphan files older than seven days and deletes them every two weeks.
NOTE: You may observe a sync delay for your connectors while table maintenance operations are in progress. To minimize disruptions, we schedule these operations only on Saturdays.
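The orphan-file policy above — delete files that are no longer referenced in the table metadata and are older than seven days — can be sketched as a filter. The inputs (a file listing and a set of referenced paths) are hypothetical stand-ins for what the real pipeline tracks:

```python
# Sketch of the orphan-file deletion policy described above.
from datetime import datetime, timedelta

def orphans_to_delete(files: dict, referenced: set, now: datetime,
                      min_age: timedelta = timedelta(days=7)) -> list:
    """files maps path -> last-modified time; referenced holds paths still
    tracked in the table metadata. Returns deletable orphan paths, sorted."""
    return sorted(
        path for path, modified in files.items()
        if path not in referenced and now - modified >= min_age
    )
```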
Maintenance operations for Iceberg tables
For Iceberg tables, Fivetran performs the following maintenance tasks:
Expire snapshots: We remove snapshots and associated metadata files that are older than the configured retention threshold to free up storage space.
Remove orphan files: Similar to Delta Lake tables, we identify and delete orphan files not referenced by any snapshots to maintain storage efficiency.
NOTE: The specific maintenance operations for Iceberg tables may vary based on the destination and configuration. Please refer to the destination-specific documentation for detailed information.
By performing these maintenance operations, Fivetran ensures that your data lake remains optimized for both storage and performance, providing a seamless and efficient data experience.
Column statistics
Fivetran updates column-level statistics to enhance query performance and optimize storage in your data lake destination. The specific statistics maintained depend on the table format and the number of columns in your tables.
Delta Lake tables
For tables using the Delta Lake format:
Primary key columns: Fivetran updates the following statistics:
- Minimum Value
- Maximum Value
NOTE: Statistics are maintained only for primary key columns in Delta Lake tables.
Iceberg tables
For tables using the Iceberg format, the maintenance of column statistics is determined by the number of columns:
Tables with 200 or Fewer Columns:
- Fivetran updates the minimum and maximum values for all columns.
Tables with More Than 200 Columns:
- Fivetran updates the minimum and maximum values only for primary key columns.
NOTE: This approach ensures efficient storage and performance optimization for large tables.
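The 200-column rule for Iceberg tables can be sketched as follows — min/max statistics for every column on narrow tables, but only for primary key columns once a table exceeds the threshold. The function name is ours, for illustration:

```python
# Sketch of the Iceberg column-statistics rule described above.
def stats_columns(columns: list, primary_keys: list, threshold: int = 200) -> list:
    """Return the columns for which min/max statistics are maintained."""
    if len(columns) <= threshold:
        return list(columns)          # narrow table: all columns
    return [c for c in columns if c in primary_keys]  # wide table: PKs only
```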
Implementation details
For Connectors Set Up On or After June 20, 2024:
- The above policies are applied based on the number of columns in the table.
For Connectors Set Up Before June 20, 2024:
- Fivetran updates statistics only for new Parquet files synced into your destination after June 20, 2024.
- Statistics for data synced before this date remain unchanged.
Reserved column names
Certain column names are reserved in specific table formats and may cause conflicts during data ingestion. To prevent these conflicts, Fivetran modifies such column names before writing them to your destination tables.
Iceberg table format
The Iceberg table format reserves the following column names:
_deleted
_file
_partition
_pos
_spec_id
file_path
pos
row
To avoid naming conflicts, Fivetran prefixes these reserved column names with a hash symbol (#) before writing them to the Iceberg tables in your destination.
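The renaming rule above amounts to a simple lookup against the reserved set. A minimal sketch (the function name is ours):

```python
# Sketch of the reserved-name handling above: Iceberg-reserved column
# names are prefixed with '#' before being written to the destination.
ICEBERG_RESERVED = {
    "_deleted", "_file", "_partition", "_pos", "_spec_id",
    "file_path", "pos", "row",
}

def safe_column_name(name: str) -> str:
    """Prefix reserved Iceberg column names; pass others through unchanged."""
    return f"#{name}" if name in ICEBERG_RESERVED else name
```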
Delta Lake table format
The Delta Lake table format does not have specific reserved column names that require modification. Therefore, Fivetran does not alter column names when writing to Delta Lake tables.
OneLake
In OneLake destinations, Fivetran uses the Delta Lake table format. As mentioned above, there are no specific reserved column names in Delta Lake that necessitate modification.
By handling reserved column names appropriately, Fivetran ensures seamless data ingestion and prevents potential conflicts in your data lake destination.
Limitations
When using Fivetran's Managed Data Lake as your destination, please be aware of the following limitations:
General limitations
Decimal Data Type Precision: Fivetran creates DECIMAL columns with a maximum precision and scale of (38, 10).
Table and Schema Naming:
- Names must not start or end with an underscore (_).
- Names must not contain multiple consecutive underscores (__).
Change Data Feed: Fivetran does not support the Change Data Feed feature for Delta Lake tables. Do not enable this feature for tables that Fivetran creates in your destination.
Query Engine Compatibility (only for OneLake): Ensure that Unity Catalog is not integrated with your Databricks workspace when querying data from your OneLake destination.
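The table and schema naming rules above can be sketched as a validator (the function name is ours, for illustration):

```python
# Sketch of the table/schema naming rules described above:
# no leading or trailing underscore, no consecutive underscores.
def valid_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rules."""
    if name.startswith("_") or name.endswith("_"):
        return False
    if "__" in name:
        return False
    return True
```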
Data type limitations
Floating-Point Precision:
- Spark SQL and SparkR queries cannot read the maximum values of DOUBLE and FLOAT data types.
- SparkR queries cannot read the minimum and maximum values of the LONG data type.
Timestamp Precision: Spark SQL and SparkR queries truncate timestamp values to seconds. To retrieve data with millisecond or microsecond precision, use the following clause in your queries:
from_unixtime(unix_timestamp(<col_name>, 'yyyy-MM-dd HH:mm:ss.SSS'), 'yyyy-MM-dd HH:mm:ss.SSS')
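As a Python analog of the precision issue above (illustrative only — the actual truncation happens in Spark SQL/SparkR, not Python): formatting a timestamp with a seconds-only pattern silently drops sub-second detail, while an explicit fractional pattern preserves it.

```python
# Illustrative analog of second-level truncation vs. millisecond formatting.
from datetime import datetime

ts = datetime(2024, 6, 20, 12, 30, 45, 123000)  # 123 ms of sub-second detail
seconds_only = ts.strftime("%Y-%m-%d %H:%M:%S")           # drops milliseconds
with_millis = ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]    # trim micros to millis
```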