OneLake
OneLake is Microsoft Fabric's unified and logical data lake.
Setup guide
Follow our step-by-step OneLake setup guide to connect your OneLake destination with Fivetran.
Type transformation and mapping
The data types in your OneLake destination follow Fivetran's standard data types. We use the following data type conversions:
| Fivetran Data Type | Destination Data Type |
|---|---|
| BOOLEAN | BOOLEAN |
| SHORT | SHORT |
| INT | INTEGER |
| LONG | LONG |
| BIGDECIMAL | DECIMAL(38, 10) |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| LOCALDATE | DATE |
| INSTANT | TIMESTAMP |
| STRING | STRING |
| XML | STRING |
| JSON | STRING |
| BINARY | BINARY |
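A quick way to confirm these mappings is to print the Spark schema of a table Fivetran created in your lakehouse. The following is a minimal sketch, assuming a Microsoft Fabric notebook with the lakehouse attached; `orders` is a placeholder table name, not a value from this guide.

```python
# Minimal sketch: inspect the schema of a Fivetran-created Delta table to see
# the destination data types from the table above. "orders" is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Relative Tables/ paths resolve against the lakehouse attached to the notebook.
df = spark.read.format("delta").load("Tables/orders")
df.printSchema()  # e.g., a BIGDECIMAL source column appears as decimal(38,10)
```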
Supported query engines
You can use the following query engines to query your data from your OneLake destination:
- Azure Synapse Analytics (native application of Microsoft Fabric)
- Databricks
NOTE: Make sure Unity Catalog is not integrated with your Databricks workspace.
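For example, a Databricks notebook can read a Fivetran-created table directly over OneLake's `abfss` endpoint. This is a sketch rather than a definitive recipe: the workspace, lakehouse, and table names are placeholders, and your cluster must be configured with credentials that can access OneLake.

```python
# Sketch: read a OneLake Delta table from Databricks. All names in the URI are
# placeholders; the cluster needs credentials that can reach OneLake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # the notebook's session on Databricks

path = (
    "abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_name>.lakehouse/Tables/<table_name>"
)
df = spark.read.format("delta").load(path)
df.show(5)
```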
Data format
Fivetran stores your data in a structured format in the destination. We write your source data to Parquet files in the Fivetran pipeline and use Delta Lake format to store these files in the data lake.
Folder structure
We write your data to the following directory: `<lakehouse_name>.lakehouse/Tables/<table_name>` or `<lakehouse_guid>/Tables/<table_name>`
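If you want to see this layout yourself, the sketch below lists the per-table folders from a Fabric notebook attached to the lakehouse. It assumes the Fabric runtime's built-in `mssparkutils` utilities, with relative paths resolving against the attached lakehouse.

```python
# Sketch: list the per-table folders Fivetran writes under Tables/.
# Assumes a Microsoft Fabric notebook with the lakehouse attached.
from notebookutils import mssparkutils

for entry in mssparkutils.fs.ls("Tables/"):
    print(entry.name)  # one folder per synced table
```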
Table maintenance operations
We perform the following maintenance operations on the Delta Lake tables in your destination:
- Delete old snapshots: We delete the table snapshots that are older than the Snapshot Retention Period you specify in the destination setup form. However, we always retain the last 4 checkpoints of a table before deleting its snapshots.
- Delete orphan and removed files: Orphan files are created by unsuccessful operations within your data pipeline; they are stored in your destination but are no longer referenced in the Delta Lake table metadata. Removed files are files that are not referenced in the latest table snapshots but were referenced in older snapshots. Both contribute to your OneLake subscription costs. At regular two-week intervals, we identify such files that are older than 7 days and delete them to maintain an efficient data storage environment.
NOTE: You may observe a sync delay for your connectors while the table maintenance operations are in progress. To ensure a seamless experience with minimal sync delays, we perform the table maintenance operations only on Saturdays.
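These maintenance operations are internal to Fivetran, but you can observe which snapshots a table currently has with Delta Lake's `DESCRIBE HISTORY`. A sketch, assuming a Fabric notebook and a placeholder table named `orders`:

```python
# Sketch: inspect a table's snapshot history with Delta Lake's DESCRIBE HISTORY.
# This only observes snapshots; it does not reproduce Fivetran's maintenance jobs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

history = spark.sql("DESCRIBE HISTORY delta.`Tables/orders`")  # "orders" is a placeholder
history.select("version", "timestamp", "operation").show(truncate=False)
```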
Limitations
- Fivetran creates DECIMAL columns with the maximum precision and scale (38, 10).
- Spark SQL and SparkR queries cannot read the maximum values of the DOUBLE and FLOAT data types.
- SparkR queries cannot read the minimum and maximum values of the LONG data type.
- Spark SQL and SparkR queries truncate timestamp values to seconds. To get accurate values, including milliseconds or microseconds, from a table with a TIMESTAMP column, you can use a `from_unixtime(unix_timestamp(<col_name>, 'yyyy-MM-dd HH:mm:ss.SSS'), 'yyyy-MM-dd HH:mm:ss.SSS')` expression in your queries. You can also use PySpark or Spark Scala to get accurate values; see the PySpark sketch after this list.
- Table and schema names must not start or end with an underscore and must not contain multiple consecutive underscores (`__`).
- Fivetran does not support the Change Data Feed feature for Delta Lake tables. Do not enable Change Data Feed for the Delta Lake tables that Fivetran creates in your OneLake destination.
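As a concrete illustration of the PySpark alternative mentioned above, the sketch below reads a TIMESTAMP column at full precision instead of the seconds-truncated values Spark SQL and SparkR return; `orders` and `updated_at` are placeholder names.

```python
# Sketch: read full-precision timestamps with PySpark instead of Spark SQL/SparkR.
# "orders" and "updated_at" are placeholder names.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.read.format("delta").load("Tables/orders")
df.select(
    F.date_format("updated_at", "yyyy-MM-dd HH:mm:ss.SSSSSS").alias("updated_at_precise")
).show(truncate=False)
```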