Are Tables Created by Fivetran Connectors Managed or Unmanaged?
Question
In Databricks, are the tables created by Fivetran connectors stored in Databricks, or are they stored as Parquet files in my Azure storage?
Environment
Answer
During the destination setup, you can opt to:
- Use the default Databricks File System (DBFS) location registered with the cluster. In this case, the tables are managed tables: Databricks manages both the metadata and the data files.
- Create Delta tables as external tables. Set the Create Delta tables in an external location toggle to ON and enter the External Location you want to use. In this case, the tables are external (unmanaged) tables, and the data files are stored in your external Azure storage.
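To confirm which option applies to an existing table, you can inspect the output of DESCRIBE TABLE EXTENDED in Databricks, which typically includes Type and Location rows in its detailed table information. The following is a minimal sketch, not a Fivetran utility: the helper name table_kind and the table name in the usage comment are hypothetical, and the sketch assumes each row carries the property name first and its value second.

```python
from typing import Iterable, Optional, Sequence, Tuple


def table_kind(describe_rows: Iterable[Sequence]) -> Tuple[Optional[str], Optional[str]]:
    """Return (type, location) from DESCRIBE TABLE EXTENDED rows.

    Only rows after the "# Detailed Table Information" marker are read,
    so a table column that happens to be named "Type" or "Location"
    is not mistaken for table metadata.
    """
    in_details = False
    found = {}
    for row in describe_rows:
        name, value = row[0], row[1]
        if name.startswith("# Detailed Table Information"):
            in_details = True
            continue
        if in_details and name in ("Type", "Location"):
            found[name] = value
    return found.get("Type"), found.get("Location")


# Hypothetical usage in a Databricks notebook, where `spark` is predefined:
# rows = spark.sql("DESCRIBE TABLE EXTENDED my_schema.my_table").collect()
# kind, location = table_kind(rows)
```

A Type of MANAGED corresponds to the first option, and EXTERNAL corresponds to the second, with Location showing the storage path.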
Limitation with Databricks on AWS destinations
When you configure Databricks to use AWS Glue as the external metastore and create Delta tables as external tables, Fivetran writes the Delta files to the correct external S3 location, but AWS Glue may store an incorrect location URI in the table metadata. This behavior is a known limitation of the AWS Glue and Databricks integration. For more information about the limitation, see the Databricks documentation.
Databricks does not rely on the location URI to locate the data. Instead, it reads the correct S3 path from the table’s SerDe (Serializer/Deserializer) parameters, so queries in Databricks continue to work as expected.
This limitation does not cause any data loss or corruption. However, external tools that rely only on the AWS Glue location URI and do not read the SerDe parameters may display an incorrect S3 path.
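If you maintain an external tool that reads table metadata from AWS Glue, one possible workaround is to prefer the path recorded in the SerDe parameters and fall back to the location URI only when it is absent. The sketch below works on the Table dictionary returned by the AWS Glue GetTable API. It assumes the SerDe parameter key is path, which is how Spark commonly records data source tables but is not confirmed by this article, and the helper name effective_table_path is hypothetical.

```python
from typing import Any, Dict, Optional


def effective_table_path(glue_table: Dict[str, Any]) -> Optional[str]:
    """Pick the storage path for a Glue table definition.

    Prefers the SerDe parameter "path" (an assumed key) over the
    StorageDescriptor location URI, which may be incorrect for
    external Delta tables created through Databricks.
    """
    storage = glue_table.get("StorageDescriptor") or {}
    serde_params = (storage.get("SerdeInfo") or {}).get("Parameters") or {}
    return serde_params.get("path") or storage.get("Location")


# Hypothetical usage with boto3 (database and table names are placeholders):
# import boto3
# table = boto3.client("glue").get_table(DatabaseName="my_db", Name="my_table")["Table"]
# print(effective_table_path(table))
```

Verify the actual parameter key against your own Glue table metadata before relying on this approach.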
To avoid this limitation and improve data governance, Databricks recommends upgrading to Unity Catalog instead of using AWS Glue as an external metastore.