Metadata Catalogs
Managed Data Lake Service lets you access your data lake tables through multiple catalog options. By default, the service includes a pre-configured Fivetran Iceberg REST Catalog that manages metadata for your Iceberg tables and lets you query them without additional setup. In addition to the default catalog, you can integrate the following catalogs with your data lake:
- AWS Glue for Iceberg tables in AWS data lakes
- BigLake metastore for Iceberg tables in GCS data lakes
- Databricks Unity Catalog for Delta Lake tables in AWS, Azure, and GCS data lakes
These additional catalogs expose your Fivetran-managed Delta Lake and Iceberg tables through metastores that may already exist in your environment.
Fivetran manages all tables in your data lake. Do not modify these tables manually or by using any external catalog, as doing so may lead to data integrity issues.
Fivetran Iceberg REST Catalog
Fivetran Iceberg REST Catalog serves as the default catalog for all Iceberg tables in all data lakes. Each Managed Data Lake Service destination you set up has its own dedicated Fivetran Iceberg REST Catalog, which we configure based on your destination details. We use Apache Polaris to implement the Fivetran Iceberg REST Catalog. The catalog is read-only for query engines, and we update it during every sync so that it reflects the latest metadata. If an external catalog becomes inconsistent, Fivetran detects the issue during the next sync and resolves it by publishing the latest version of the tables from the Fivetran Iceberg REST Catalog. The Fivetran Iceberg REST Catalog is always the definitive source of metadata.
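The recovery behavior described above can be pictured with a toy model. The following sketch is not how Fivetran implements it; it assumes, purely for illustration, that each catalog maps a table name to a metadata file pointer. The names `rest_catalog`, `external_catalog`, and `reconcile` are hypothetical.

```python
from __future__ import annotations


def reconcile(rest_catalog: dict[str, str], external_catalog: dict[str, str]) -> list[str]:
    """Copy pointers from the source of truth into the external catalog.

    Returns the sorted names of the tables that had to be republished.
    """
    repaired = []
    for table, pointer in rest_catalog.items():
        # A missing or different pointer means the external catalog is inconsistent.
        if external_catalog.get(table) != pointer:
            external_catalog[table] = pointer
            repaired.append(table)
    return sorted(repaired)


# Hypothetical state: the external catalog still points at an older version of one table.
rest_catalog = {"sales.orders": "metadata/v3.json", "sales.customers": "metadata/v7.json"}
external_catalog = {"sales.orders": "metadata/v2.json", "sales.customers": "metadata/v7.json"}

print(reconcile(rest_catalog, external_catalog))  # ['sales.orders']
```

In this model the REST catalog is never written to by the reconciliation step, which mirrors its role as the definitive source of metadata.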
We use the Iceberg OpenAPI Specification to build the catalog, ensuring compatibility with any Iceberg REST Catalog–compliant query engine for reading data from your data lake. For more information about integrating query engines with Fivetran Iceberg REST Catalog, see our Integration Guide.
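As an illustration of what an Iceberg REST Catalog–compliant client looks like, the sketch below uses PyIceberg. It is a minimal example under stated assumptions: the endpoint URI, the credential format, and the warehouse name are placeholders, and we assume here that the catalog accepts a client ID and secret pair. Take the actual connection values and authentication method from the Integration Guide.

```python
def rest_catalog_properties(uri: str, credential: str, warehouse: str) -> dict:
    """Build the property map PyIceberg uses for a REST catalog."""
    return {
        "type": "rest",
        "uri": uri,
        "credential": credential,
        "warehouse": warehouse,
    }


def list_namespaces(properties: dict) -> list:
    """Connect to the catalog and list its namespaces (read-only operation)."""
    # Imported lazily so the property builder works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("fivetran", **properties)
    return catalog.list_namespaces()


# All values below are placeholders, not real Fivetran endpoints or secrets.
props = rest_catalog_properties(
    uri="https://example.invalid/iceberg",
    credential="<client_id>:<client_secret>",
    warehouse="<catalog_name>",
)
```

Calling `list_namespaces(props)` with real connection details would issue read requests only, which is consistent with the catalog being read-only for query engines.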
The following diagram outlines the high-level architecture of Managed Data Lake Service, including the role of the Fivetran Iceberg REST Catalog:
AWS Glue
Managed Data Lake Service allows you to integrate AWS Glue with your AWS data lake, enabling seamless access to the AWS ecosystem, including EMR, Athena, and Redshift. This is in addition to the pre-configured Fivetran Iceberg REST Catalog we provide for all AWS data lakes. To integrate this catalog with your data lake, you must create an IAM policy for AWS Glue Data Catalog. For more information about integrating the catalog, see our Setup Guide.
BigLake metastore
With Managed Data Lake Service, you can integrate BigLake metastore with your GCS data lake and then query your data using BigQuery. To integrate BigLake metastore with your data lake, see our Setup Guide.
Unity Catalog
Managed Data Lake Service enables you to integrate Databricks Unity Catalog with your AWS, Azure, and GCS data lakes, providing a simple and efficient way to organize, manage, and work with Delta Lake tables from the Databricks ecosystem.
To integrate Unity Catalog with your data lake, you must create an external location in Databricks and specify the storage credentials required to access the data lake. The storage location in Databricks must match the data lake storage used for Delta Lake tables. Once the external location is set up, you can query your data directly from these external tables. For more information about integrating the catalog with your data lake, see our Setup Guide.
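The external location requirement can be sketched as follows. This is an illustration under stated assumptions: the bucket, location, and credential names are hypothetical, and we interpret "match" as the external location URL being equal to, or a parent of, the path that holds your Delta Lake tables. The Setup Guide remains the reference for the exact steps.

```python
def covers(external_location_url: str, data_lake_url: str) -> bool:
    """Return True if data_lake_url equals or sits under external_location_url."""
    base = external_location_url.rstrip("/")
    target = data_lake_url.rstrip("/")
    return target == base or target.startswith(base + "/")


def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Render a Databricks SQL CREATE EXTERNAL LOCATION statement."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)"
    )


# Hypothetical values for illustration only.
location_url = "s3://my-data-lake/fivetran"
tables_url = "s3://my-data-lake/fivetran/delta"

if covers(location_url, tables_url):
    print(create_external_location_sql("fivetran_lake", location_url, "fivetran_cred"))
```

The trailing-slash handling in `covers` prevents a false match between sibling paths such as `s3://my-data-lake/fivetran` and `s3://my-data-lake/fivetran2`.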
When filtering timestamp values from Unity Catalog, make sure you use the string(<col_name>) clause to get accurate values.
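For example, a timestamp filter that follows this advice might be built as shown below. This is a sketch only: the table and column names are hypothetical, and the range predicate is one assumed way of applying string(<col_name>); adapt it to your own query.

```python
def timestamp_filter_sql(table: str, column: str, start: str, end: str) -> str:
    """Build a query that filters on the string form of a timestamp column."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE string({column}) >= '{start}' AND string({column}) < '{end}'"
    )


# Hypothetical table and column names.
query = timestamp_filter_sql(
    table="main.sales.orders",
    column="created_at",
    start="2024-01-01 00:00:00",
    end="2024-02-01 00:00:00",
)
print(query)
```

Because the comparison is done on strings, the bounds should use the same format as the values that string(<col_name>) returns.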