Databricks (Beta)
The Databricks Lakehouse Platform combines the key features of data lakes and data warehouses. The platform is built on open source and open standards.
The Fivetran Databricks connector allows you to sync tables from your Databricks catalog to your destination. We sync all the schemas (databases) and tables from within your Databricks catalog.
NOTE: Fivetran supports Databricks as both a database connector and a destination.
TIP: If you want to sync multiple Databricks catalogs, we recommend that you create a connector for each catalog.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | ✓ | |
History mode | ✓ | |
Custom data | ✓ | |
Data blocking | ✓ | |
Column hashing | ✓ | |
Re-sync | ✓ | |
API configurable | ✓ | API configuration |
Priority-first sync | | |
Fivetran data models | | |
Private networking | | |
Authorization via API | | |
Setup guide
Follow our step-by-step Databricks setup guide to connect Databricks to Fivetran.
Sync overview
Once Fivetran is connected to your Databricks account, we pull a full dump of all selected data from your catalog. We then use Databricks' change data feed to pull your new and changed data at regular intervals. If data in your source catalog changes (for example, you add a new table), Fivetran automatically detects and persists these changes into your destination.
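Change data feed is controlled by the `delta.enableChangeDataFeed` table property on each Delta table. The sketch below is a minimal, illustrative example of enabling it with PySpark; the table name `main.sales.orders` is a placeholder, and this is not a required Fivetran setup step.

```python
# Minimal sketch: enabling the change data feed on a Delta table with PySpark.
# The table name main.sales.orders is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE main.sales.orders
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")
```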
Limitations
The Databricks connector only supports tables that are stored in Delta format. Learn more in Databricks' Delta Tables documentation.
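If you are unsure whether a table is stored in Delta format, the following sketch shows one way to check from a Databricks notebook; the table name is a placeholder.

```python
# Minimal sketch: checking a table's storage format with DESCRIBE DETAIL.
# The table name main.sales.orders is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

detail = spark.sql("DESCRIBE DETAIL main.sales.orders")
print(detail.select("format").first()[0])  # prints "delta" for Delta tables
```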
Schema information
Fivetran tries to replicate the exact schema and tables from your Databricks catalog to your destination.
We name each destination schema by appending the source schema (database) name to the connector name you provided in the connector setup form. For example, if the connector name is `databricks`, the schema (database) name is `schema`, and the table name is `table`, the destination table name is `databricks_schema.table`.
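As an illustration of this naming convention, the short sketch below builds the destination table name from placeholder connector, schema, and table names:

```python
# Toy illustration of the destination naming convention described above.
connector_name = "databricks"   # connector name from the setup form
source_schema = "schema"        # source schema (database) name
source_table = "table"          # source table name

destination_table = f"{connector_name}_{source_schema}.{source_table}"
print(destination_table)  # databricks_schema.table
```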
Type transformations and mapping
As we extract your data, we match Databricks data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system automatically skips columns with data types that we don't accept or transform.
The following table illustrates how we transform your Databricks data types into Fivetran-supported types:
Databricks Type | Fivetran Type | Supported |
---|---|---|
ARRAY | STRING | Yes |
BIGINT | LONG | Yes |
BINARY | BINARY | Yes |
BOOLEAN | BOOLEAN | Yes |
DATE | LOCALDATE | Yes |
DOUBLE | DOUBLE | Yes |
DECIMAL | BIGDECIMAL | Yes |
FLOAT | BIGDECIMAL | Yes |
INT | INTEGER | Yes |
INTERVAL | N/A | No |
MAP | JSON | Yes |
SMALLINT | SHORT | Yes |
STRING | STRING | Yes |
STRUCT | JSON | Yes |
VARCHAR | STRING | Yes |
VOID | N/A | No |
TIMESTAMP | INSTANT | Yes |
TIMESTAMP_NTZ | INSTANT | Yes |
TINYINT | SHORT | Yes |
Fivetran does not support the VOID and INTERVAL data types because Databricks Delta Lake does not support them.
Updating data
Fivetran performs incremental updates of any new or modified data from your source catalog. We read each table's change data feed and use it to update the corresponding data in your destination.
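For readers who want to see what a change data feed read looks like in Databricks itself, here is a minimal PySpark sketch; the table name and starting version are placeholders, and this illustrates the Delta feature rather than Fivetran's internal implementation.

```python
# Minimal sketch: reading row-level changes from a Delta table's change data feed.
# Table name and starting version are placeholders; change data feed must be enabled.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")  # read changes instead of the current table state
    .option("startingVersion", 5)      # earliest table version to read changes from
    .table("main.sales.orders")
)

# Each row includes _change_type (insert, delete, update_preimage, update_postimage)
# plus _commit_version and _commit_timestamp metadata columns.
changes.show()
```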
Table updates
Databricks does not enforce primary keys on tables, so we designate the Fivetran-generated `_fivetran_id` column as the primary key.
We merge the following changes into the corresponding tables in your destination:
- An INSERT in the source table generates a new row in the destination with `_fivetran_deleted = FALSE`.
- A DELETE in the source table updates the corresponding row in the destination with `_fivetran_deleted = TRUE`.
- An UPDATE in the source table updates the existing row in the destination with `_fivetran_deleted = TRUE` and generates a new row in the destination with `_fivetran_deleted = FALSE`.
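Because deletes and updates are soft-deleted rather than removed, a common pattern is to filter on `_fivetran_deleted` when querying the destination. The sketch below assumes a destination that is queryable through Spark SQL and uses placeholder schema and table names; with other destinations, the equivalent SQL filter applies.

```python
# Minimal sketch: selecting only the live (non-deleted) rows from a synced table.
# Schema and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

live_rows = spark.sql("""
    SELECT *
    FROM databricks_schema.orders
    WHERE _fivetran_deleted = FALSE
""")
live_rows.show()
```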