Databricks Setup Guide Beta
Follow our setup guide to replicate your Databricks catalog to your destination using Fivetran.
Prerequisites
To connect Databricks to Fivetran, you need the following:
- A Databricks account.
- At least one SQL warehouse or a compute cluster to sync data from your catalog.
Setup instructions
Create personal access token
IMPORTANT: If you already have a personal access token, skip to the Finish Fivetran configuration step.
Fivetran uses a personal access token to connect to Databricks. To create one, follow Databricks' token management guide.
NOTE: If we find a table that doesn't have Change Data Feed enabled, we try to enable it. Make sure the user associated with the personal access token has MODIFY permission on the table. To enable Change Data Feed for a table yourself, run the following command:
ALTER TABLE catalog_name.schema_name.table_name SET TBLPROPERTIES (delta.enableChangeDataFeed=true)
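If you need to enable Change Data Feed on several tables, the statement in the note above can be generated per table. The following is a minimal sketch, not part of Fivetran or Databricks tooling: the helper name enable_cdf_statement and the names main, sales, and orders are hypothetical placeholders. It only builds the SQL text; running it against your workspace is up to you.

```python
import re

# Hypothetical helper, for illustration only.
# Accepts plain identifiers (letters, digits, underscores) and nothing else.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def enable_cdf_statement(catalog: str, schema: str, table: str) -> str:
    """Return the ALTER TABLE statement that enables Change Data Feed."""
    for part in (catalog, schema, table):
        # Reject anything that is not a simple identifier so that
        # arbitrary text cannot end up inside the SQL statement.
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"not a simple identifier: {part!r}")
    return (
        f"ALTER TABLE {catalog}.{schema}.{table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed=true)"
    )


# Placeholder names; replace them with your own catalog, schema, and table.
print(enable_cdf_statement("main", "sales", "orders"))
```

The identifier check is deliberately strict. Names that need backtick quoting are rejected by this sketch rather than escaped.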
Connect SQL warehouse
IMPORTANT:
- If you want to connect to an existing SQL warehouse, skip to the Finish Fivetran configuration step.
- If you want to use a compute cluster, skip to the Connect all-purpose compute step.
In the Databricks console, go to SQL > SQL warehouses > Create SQL warehouse.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
Connect all-purpose compute
Go to your compute cluster or warehouse.
Click Configuration > Advanced Options > JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
Finish Fivetran configuration
- In the connector setup form, enter your chosen destination schema name.
- Enter the Server Hostname that you noted in the Connect SQL warehouse or Connect all-purpose compute step.
- Enter the Port number that you noted in the Connect SQL warehouse or Connect all-purpose compute step. The default value is 443.
- Enter the HTTP Path that you noted in the Connect SQL warehouse or Connect all-purpose compute step.
- Enter your personal access token.
- (Optional) Enter the catalog you want to sync data from. If you leave this field empty, we use the default hive_metastore catalog.
- Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can sync your data using your new Databricks connector.
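Before filling in the form, it can help to gather the values in one place and check that nothing is missing. The sketch below is an illustration only and does not reflect how Fivetran stores or validates the form. The class and field names are hypothetical, and the hostname, HTTP path, and token are placeholders. Only the default port 443 and the default hive_metastore catalog come from this guide.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_PORT = 443                  # default port stated in this guide
DEFAULT_CATALOG = "hive_metastore"  # used when the catalog field is left empty


@dataclass
class DatabricksFormValues:
    """Hypothetical container mirroring the setup form fields."""
    destination_schema: str
    server_hostname: str
    http_path: str
    personal_access_token: str
    port: int = DEFAULT_PORT
    catalog: str = ""  # optional field

    def effective_catalog(self) -> str:
        # An empty or whitespace-only catalog falls back to the default.
        return self.catalog.strip() or DEFAULT_CATALOG

    def missing_fields(self) -> List[str]:
        # Names of the required fields that are still empty.
        required = {
            "destination_schema": self.destination_schema,
            "server_hostname": self.server_hostname,
            "http_path": self.http_path,
            "personal_access_token": self.personal_access_token,
        }
        return [name for name, value in required.items() if not value.strip()]


# Placeholder values; use the ones from your Connection details tab.
values = DatabricksFormValues(
    destination_schema="databricks_source",
    server_hostname="example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/placeholder",
    personal_access_token="",
)
print(values.port, values.effective_catalog(), values.missing_fields())
```

An empty list from missing_fields() means every required field has a value. It does not mean the values are correct; the setup tests verify that.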
Setup tests
Fivetran performs the following Databricks connection tests:
- The Databricks Connection test checks that we can reach your Databricks SQL warehouse or compute cluster and validates the credentials you provided in the setup form.
- The Permission test checks that we can connect to the database and get the details of tables and columns.
NOTE: The tests may take a couple of minutes to finish running.