Azure Data Lake Storage
Azure Data Lake Storage (ADLS) is a cloud-based, scalable data storage solution for big data analytics. ADLS allows you to store and manage massive amounts of data in any format. Fivetran supports data lakes built on ADLS as a destination.
Setup guide
Follow our step-by-step Azure Data Lake Storage setup guide to connect your Azure Data Lake Storage destination with Fivetran.
Type transformation and mapping
The data types in your Azure Data Lake Storage destination follow Fivetran's standard data type storage.
We use the following data type conversions:
| Fivetran Data Type | Destination Data Type |
|---|---|
| BOOLEAN | BOOLEAN |
| SHORT | SHORT |
| INT | INTEGER |
| LONG | LONG |
| BIGDECIMAL | DECIMAL(38, 10) |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| LOCALDATE | DATE |
| INSTANT | TIMESTAMP |
| STRING | STRING |
| XML | STRING |
| JSON | STRING |
| BINARY | BINARY |
NOTE: Fivetran stores hex-encoded BINARY values in your destination. You can use the unhex function, decode(unhex(col_name), 'UTF-8'), in your queries to fetch the decoded BINARY values.
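As an illustration outside of Spark SQL, the same hex round-trip can be reproduced in plain Python (the stored value below is a made-up example, not real Fivetran output):

```python
# Sketch: how a hex-encoded BINARY value decodes back to its original
# text, mirroring decode(unhex(col_name), 'UTF-8') in a Spark SQL query.

def decode_hex_binary(hex_value: str, encoding: str = "utf-8") -> str:
    """Decode a hex-encoded BINARY value back to a string."""
    return bytes.fromhex(hex_value).decode(encoding)

# A BINARY value holding the bytes of "hello" would be stored as "68656c6c6f".
stored = "68656c6c6f"
print(decode_hex_binary(stored))  # hello
```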
Supported query engines
To extract data from your ADLS destination, use the following query engine:
- Azure Synapse Analytics
Data format
Fivetran stores your data in a structured format in the destination. We write your source data to Parquet files in the Fivetran pipeline and use Delta Lake format to store these files in the data lake.
Folder structure
We can sync your data to any destination folder of your choice. If you specify a prefix path, we write your data to the following directory: <root>/<prefix_path>/<schema>/<table>. If you do not specify a prefix path, we create a folder and set the prefix path to fivetran by default. We then write your data to the following directory: <root>/fivetran/<schema>/<table>.
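A minimal sketch of the layout rule above, with root, prefix_path, schema, and table as illustrative parameter names (not Fivetran's actual configuration keys):

```python
def destination_dir(root: str, schema: str, table: str,
                    prefix_path: str = "") -> str:
    """Build the destination directory per the rule above: when no
    prefix path is specified, it defaults to 'fivetran'."""
    prefix = prefix_path if prefix_path else "fivetran"
    return f"{root}/{prefix}/{schema}/{table}"

# No prefix path: the default 'fivetran' folder is used.
print(destination_dir("abfss://lake", "crm", "accounts"))
# abfss://lake/fivetran/crm/accounts

# With an explicit prefix path.
print(destination_dir("abfss://lake", "crm", "accounts", "raw"))
# abfss://lake/raw/crm/accounts
```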
Unity Catalog
You can create external tables in Databricks Unity Catalog for your ADLS data and query your data from these external tables. For more information about integrating Unity Catalog with your ADLS destination, see our Unity Catalog Setup Guide.
NOTE: Databricks uses the table definition to understand the structure of the data. It stores the metadata of the tables in the metastore and allows us to interact with them like regular tables within Databricks by accessing the data in its original location.
Performance
Typically, Fivetran-managed ADLS destinations support a rate of change of 10 MB per second.
However, the actual performance of your destination may vary depending on the workload and various other factors. Common factors that impact performance include:
- Number of data types and constraints
- Number of rows updated and the location of the updates
- Network speed
- Size of data
- Sparsity, cardinality, and width of data
Limitations
Fivetran does not support history mode for ADLS destinations.
Fivetran creates DECIMAL columns with maximum precision and scale (38, 10).
Spark SQL pool queries cannot read the maximum values of DOUBLE and FLOAT data types.
Spark SQL pool queries truncate TIMESTAMP values to seconds. To query any table using a TIMESTAMP column, you can use the from_unixtime(unix_timestamp(<col_name>, 'yyyy-MM-dd HH:mm:ss.SSS'), 'yyyy-MM-dd HH:mm:ss.SSS') expression in your queries to get the accurate values, including milliseconds.
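The underlying issue — formatting a timestamp at second granularity silently discards sub-second digits — can be illustrated in plain Python (the sample value is made up):

```python
from datetime import datetime

# A timestamp carrying 123 ms of sub-second detail.
ts = datetime(2024, 5, 1, 12, 30, 45, 123000)

truncated = ts.strftime("%Y-%m-%d %H:%M:%S")        # second granularity only
precise = ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # keep the milliseconds

print(truncated)  # 2024-05-01 12:30:45
print(precise)    # 2024-05-01 12:30:45.123
```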
Unity Catalog limitations
- To query any table using an XML column, you can use a regular expression.
- To filter TIMESTAMP values, you can use the string(<col_name>) expression in your queries to get the accurate values.