Databricks Setup Guide
Follow our setup guide to connect Databricks to Fivetran.
IMPORTANT: If you used Databricks Partner Connect to set up your Fivetran account, you don't need to follow the setup guide instructions because you already have a connection to Databricks. We strongly recommend using Databricks Partner Connect to set up your destination. Learn how in Databricks' Fivetran Partner Connect documentation. To connect your Databricks workspace to Fivetran using Partner Connect, make sure you meet Databricks' requirements.
Supported cloud platforms
You can set up your Databricks destination on the following cloud platforms:
Prerequisites
To connect Databricks to Fivetran, you need the following:
- a Databricks account
- a Fivetran account with permission to add destinations
- Unity Catalog enabled on your Databricks Workspace. Unity Catalog is a unified governance solution for all data and AI assets including files, tables, machine learning models and dashboards in your lakehouse on any cloud. We strongly recommend using Fivetran with Unity Catalog as it simplifies access control and sharing of tables created by Fivetran. Legacy deployments can continue to use Databricks without Unity Catalog.
- SQL warehouses. SQL warehouses are optimized for data ingestion and analytics workloads, start and shut down rapidly and are automatically upgraded with the latest enhancements by Databricks. Legacy deployments can continue to use Databricks clusters with Databricks Runtime v7.0+.
Databricks on AWS - Setup instructions
Learn how to set up your Databricks on AWS destination.
Choose a catalog
If you don't use Unity Catalog, skip ahead to the Connect SQL warehouse step. Fivetran will create schemas in the default catalog, hive_metastore.
If you use Unity Catalog, you need to decide which catalog to use with Fivetran. For example, you could create a catalog called fivetran and organize tables from different connectors in it in separate schemas, like fivetran.salesforce or fivetran.mixpanel. If you need to set up Unity Catalog, follow Databricks' Get started using Unity Catalog guide.
Log in to your Databricks workspace.
Click Data in the Databricks console.
Choose a catalog in the Data Explorer.
Connect SQL warehouse
In the Databricks console, go to SQL > SQL warehouses > Create SQL warehouse. If you want to select an existing SQL warehouse, skip to step 5 in this section.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
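As a rough sketch, the three values can be kept together and sanity-checked before you enter them in the Fivetran setup form. The `WarehouseConnection` class and its checks are hypothetical helpers for illustration, not part of Databricks or Fivetran.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WarehouseConnection:
    """Values copied from the Connection details tab (hypothetical helper)."""

    server_hostname: str
    port: int
    http_path: str

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the values look plausible."""
        found = []
        if not self.server_hostname.strip() or "://" in self.server_hostname:
            found.append("Server Hostname should be a bare host name without a scheme")
        if not 1 <= self.port <= 65535:
            found.append("Port must be between 1 and 65535")
        if not self.http_path.strip():
            found.append("HTTP Path must not be empty")
        return found
```

Calling `problems()` on the copied values before opening the setup form catches simple copy-and-paste mistakes, such as pasting a full URL into the Server Hostname field.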
Configure authentication type
Fivetran supports the following authentication types to connect to Databricks:
Databricks personal access token authentication: Fivetran supports this authentication type for:
- destinations that are connected to Fivetran using AWS PrivateLink or Azure Private Link
- destinations that were set up before April 24, 2024 and are not connected to Fivetran using AWS PrivateLink or Azure Private Link
OAuth machine-to-machine (M2M) authentication: Fivetran supports this authentication type for all destinations that are not connected to Fivetran using AWS PrivateLink or Azure Private Link.
By default, destinations set up before April 24, 2024 use Databricks personal access token authentication. However, if such a destination is not connected through AWS PrivateLink or Azure Private Link, you can change the authentication type to OAuth machine-to-machine (M2M) authentication.
NOTE: You cannot revert the change once you change the authentication type for a destination set up before April 24, 2024.
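The rules above can be summarized as a small decision function. This is only a sketch of how this guide describes the options; the function and constant names are hypothetical.

```python
from datetime import date

# Cut-over date stated in this guide.
OAUTH_CUTOVER = date(2024, 4, 24)

PAT = "Databricks personal access token"
OAUTH_M2M = "OAuth machine-to-machine (M2M)"


def supported_auth_types(setup_date: date, uses_private_link: bool) -> list:
    """Return the authentication types this guide lists for a destination."""
    if uses_private_link:
        # AWS PrivateLink / Azure Private Link destinations use personal access tokens.
        return [PAT]
    if setup_date < OAUTH_CUTOVER:
        # Older destinations default to PAT and may switch (irreversibly) to OAuth M2M.
        return [PAT, OAUTH_M2M]
    # Newer destinations without a private link use OAuth M2M.
    return [OAUTH_M2M]
```

For example, `supported_auth_types(date(2023, 6, 1), False)` returns both types, with the default personal access token listed first.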
Configure Databricks personal access token authentication
To use the Databricks personal access token authentication type, create a personal access token by following the instructions in Databricks' personal access token authentication documentation.
IMPORTANT: Depending on whether or not you use Unity Catalog, ensure that the user or service principal you want to use to create your access token has the following privileges:
If you use Unity Catalog, the user or service principal must have the following privileges on the catalog:
- CREATE SCHEMA
- CREATE TABLE
- MODIFY
- SELECT
- USE CATALOG
- USE SCHEMA
If you do not use Unity Catalog, the user or service principal must have the following privileges on the schema:
- SELECT
- MODIFY
- READ_METADATA
- USAGE
- CREATE
When you grant a privilege on the catalog, it is automatically granted to all current and future schemas in the catalog. Similarly, the privileges that you grant on a schema are inherited by all current and future tables in the schema.
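One way to apply the privileges listed above is a single GRANT statement per principal. The sketch below only builds the statement text; the principal name and catalog name are hypothetical placeholders, and the `GRANT ... ON CATALOG | SCHEMA ... TO` form should be checked against Databricks' SQL reference before you run it.

```python
UNITY_CATALOG_PRIVILEGES = [
    "CREATE SCHEMA", "CREATE TABLE", "MODIFY", "SELECT", "USE CATALOG", "USE SCHEMA",
]
LEGACY_SCHEMA_PRIVILEGES = ["SELECT", "MODIFY", "READ_METADATA", "USAGE", "CREATE"]


def grant_statement(principal: str, securable: str, use_unity_catalog: bool) -> str:
    """Build one GRANT statement for the privileges listed in this guide."""
    if use_unity_catalog:
        # Unity Catalog: privileges are granted on the catalog.
        privileges, kind = UNITY_CATALOG_PRIVILEGES, "CATALOG"
    else:
        # Without Unity Catalog: privileges are granted on the schema.
        privileges, kind = LEGACY_SCHEMA_PRIVILEGES, "SCHEMA"
    return f"GRANT {', '.join(privileges)} ON {kind} {securable} TO `{principal}`;"


# Hypothetical principal and catalog names, for illustration only.
print(grant_statement("fivetran-service-principal", "fivetran", use_unity_catalog=True))
```

Because catalog-level grants are inherited by current and future schemas, one statement on the catalog is enough in the Unity Catalog case.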
Configure OAuth machine-to-machine (M2M) authentication
To use the OAuth machine-to-machine (M2M) authentication type, create your OAuth Client ID and Secret by following the instructions in Databricks' OAuth machine-to-machine (M2M) authentication documentation.
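Once you have the Client ID and Secret, an optional way to confirm they work is to request a token yourself. The sketch below only builds the request and does not send it. It assumes the workspace-level token endpoint `/oidc/v1/token` and the `all-apis` scope; confirm both in Databricks' OAuth M2M documentation. The function name and the host value are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumed workspace-level token endpoint; verify before use.
    url = f"https://{workspace_host}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # HTTP Basic authentication carries the client ID and secret.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keep the secret out of source control and shell history when you try this.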
Configure external storage for Hybrid Deployment
IMPORTANT: Skip to the next step if you want to use Fivetran's cloud environment to sync your data. Perform this step only if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use the Hybrid Deployment architecture.
Fivetran supports the following external storages:
- Amazon S3 bucket (recommended)
- Azure Blob storage container
Configure Amazon S3 bucket
Create Amazon S3 bucket
Create an S3 bucket by following the instructions in AWS's documentation.
Create IAM policy for S3 bucket
Log in to the Amazon IAM console.
Go to Policies, and then click Create policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObjectTagging",
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:DeleteObjectVersion",
        "s3:ListBucket",
        "s3:PutObjectTagging",
        "s3:DeleteObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}/*",
        "arn:aws:s3:::{your-bucket-name}"
      ]
    }
  ]
}
In the policy, replace {your-bucket-name} with the name of your S3 bucket.
Click Next.
Enter a Policy name.
Click Create policy.
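If you prefer to generate the policy text rather than edit the placeholder by hand, a minimal sketch is shown here. It reproduces the policy from this step with the bucket name filled in; the function name and the sample bucket name are hypothetical.

```python
import json

# Actions copied from the policy in this step.
S3_ACTIONS = [
    "s3:DeleteObjectTagging", "s3:ReplicateObject", "s3:PutObject",
    "s3:GetObjectAcl", "s3:GetObject", "s3:DeleteObjectVersion",
    "s3:ListBucket", "s3:PutObjectTagging", "s3:DeleteObject", "s3:PutObjectAcl",
]


def bucket_policy(bucket_name: str) -> str:
    """Return the IAM policy as JSON with the bucket name substituted."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket name, for illustration only.
print(bucket_policy("my-fivetran-staging-bucket"))
```

The output can be pasted directly into the JSON editor in the IAM console.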
Create AWS user for Fivetran
In the Amazon IAM console, go to Users, and then click Create user.
Enter a User name, and then click Next.
Select Attach policies directly.
Select the checkbox next to the policy you created in the Create IAM policy for S3 bucket step, and then click Next.
In the Review and create page, click Create user.
In the Users page, select the user you created.
Click Create access key.
Select Application running outside AWS, and then click Next.
Click Create access key.
Click Download .csv file to download the Access key ID and Secret access key to your local drive. You will need them to configure Fivetran.
Configure Azure Blob storage container
Create Azure storage account
Create an Azure storage account by following the instructions in Azure Blob Storage's documentation. While creating the account, make sure you do the following:
In the Advanced tab, select the Require secure transfer for REST API operations and Enable storage account key access checkboxes.
In the Permitted scope for copy operations drop-down menu, select From any storage account.
In the Networking tab, select one of the following Network access options:
- If your Databricks destination is not hosted on Azure or if your storage container and destination are in different regions, select Enable public access from all networks.
- If your Databricks destination is hosted on Azure and if it is in the same region as your storage container, select Enable public access from selected virtual networks and IP addresses.
IMPORTANT: Ensure the virtual network or subnet where your Databricks workspace or cluster resides is included in the allowed list for public access on the Azure storage account.
In the Encryption tab, choose Microsoft-managed keys (MMK) as the Encryption type.
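The Network access choice above depends on two facts about your deployment. A minimal sketch of that decision, using a hypothetical function name, looks like this:

```python
def network_access_option(destination_on_azure: bool, same_region: bool) -> str:
    """Pick the Network access option this guide describes for the storage account."""
    if destination_on_azure and same_region:
        # Same cloud and same region: restrict access to selected networks.
        return "Enable public access from selected virtual networks and IP addresses"
    # Destination not on Azure, or in a different region than the storage container.
    return "Enable public access from all networks"
```

If the function returns the selected-networks option, remember to add the virtual network or subnet of your Databricks workspace or cluster to the allowed list.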
Find storage account name and access key
Log in to the Azure portal.
Go to your storage account.
On the navigation menu, click Access keys under Security + networking.
Make a note of the Storage account name and Key. You will need them to configure Fivetran.
IMPORTANT: As a security best practice, do not save your access key and account name anywhere in plain text that is accessible to others.
(Optional) Connect using AWS PrivateLink (Beta)
IMPORTANT: Do not perform this step if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use AWS PrivateLink.
You can connect Fivetran to your Databricks destination using AWS PrivateLink. AWS PrivateLink allows VPCs and AWS-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. PrivateLink is the most secure connection method. Learn more in AWS PrivateLink's documentation.
How it works:
Fivetran accesses the data plane in your AWS account using the control plane network in Databricks' account.
You set up a back-end AWS PrivateLink connection between your AWS account and Databricks' AWS account (shown as (Workspace 1/2) Link-1 in the diagram above).
Fivetran creates and maintains a front-end AWS PrivateLink connection between Fivetran's AWS account and Databricks' AWS account (shown as Regional - Link-2 in the diagram above).
Prerequisites
To set up AWS PrivateLink, you need:
- A Fivetran instance configured to run in AWS
- A Databricks destination in one of our supported regions
- All of Databricks' requirements
Configure AWS PrivateLink
Follow Databricks' Enable AWS PrivateLink documentation to enable private connectivity for your workspaces. Your workspaces must have the following:
- A registered back-end VPC endpoint for secure cluster connectivity relay
- A registered back-end VPC endpoint for REST APIs
- A private access settings (PAS) object with access to Fivetran's VPC endpoints
- If the Private Access Level on the PAS object is set to Account, a Fivetran VPC endpoint (for the applicable AWS region) that's registered once per account
- If the Private Access Level on the PAS object is set to Endpoint, a Fivetran VPC endpoint (for the applicable AWS region) that's registered using the allowed_vpc_endpoint_ids property
Register Fivetran endpoint details
Register the Fivetran endpoint for the applicable AWS region with your Databricks workspaces. We cannot access your workspaces until you do so.
AWS Region | VPC endpoint |
---|---|
ap-south-1 Asia Pacific (Mumbai) | vpce-089f13c9231c2b729 |
ap-southeast-2 Asia Pacific (Sydney) | vpce-0e5f79a1613d0cf05 |
ca-central-1 Canada (Central) | vpce-09f0049f9a92177f1 |
eu-central-1 Europe (Frankfurt) | vpce-049699737170c880d |
eu-west-1 Europe (Ireland) | vpce-0b32cb6c08f6fe0df |
eu-west-2 Europe (London) | vpce-03fde3e4804f537eb |
us-east-1 US East (N. Virginia) | vpce-0ff9bd04153060180 |
us-east-2 US East (Ohio) | vpce-05153aa99bf7a4575 |
us-west-2 US West (Oregon) | vpce-0884ff0f23dcbf0dc |
IMPORTANT: Regardless of your Fivetran subscription plan, if you have enabled back-end AWS PrivateLink connection between your AWS account and Databricks' AWS account (shown as (Workspace 1/2) Link-1 in the diagram above), you must register the Fivetran endpoint (for the applicable AWS region) to avoid connection failures.
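If you script the endpoint registration, a lookup keyed by region avoids copying the wrong ID. The sketch below copies the values from the table above; the dictionary and function names are hypothetical.

```python
# Region-to-endpoint mapping copied from the table in this guide.
FIVETRAN_VPC_ENDPOINTS = {
    "ap-south-1": "vpce-089f13c9231c2b729",
    "ap-southeast-2": "vpce-0e5f79a1613d0cf05",
    "ca-central-1": "vpce-09f0049f9a92177f1",
    "eu-central-1": "vpce-049699737170c880d",
    "eu-west-1": "vpce-0b32cb6c08f6fe0df",
    "eu-west-2": "vpce-03fde3e4804f537eb",
    "us-east-1": "vpce-0ff9bd04153060180",
    "us-east-2": "vpce-05153aa99bf7a4575",
    "us-west-2": "vpce-0884ff0f23dcbf0dc",
}


def fivetran_endpoint(region: str) -> str:
    """Return the Fivetran VPC endpoint ID listed for an AWS region."""
    try:
        return FIVETRAN_VPC_ENDPOINTS[region]
    except KeyError:
        raise ValueError(f"No Fivetran VPC endpoint is listed for region {region!r}") from None
```

Raising an error for an unlisted region makes it obvious when a workspace sits in a region the table does not cover.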
Complete Fivetran configuration
Log in to your Fivetran account.
Go to the Destinations page and click Add destination.
Enter a Destination name of your choice and then click Add.
Select Databricks as the destination type.
(Optional for Business Critical accounts) To use Hybrid Deployment, set the Enable local data processing toggle to ON, and then in the Select an existing local processing agent drop-down menu, select your local processing agent. If you want to configure and install a new agent, follow our installation instructions. Enter the following details of the S3 bucket you created in Step 4:
- S3 bucket name
- S3 bucket region
- AWS access key ID
- AWS secret access key
(Unity Catalog only) Enter the Catalog name. This is the name of the catalog in Unity Catalog, such as fivetran or main.
Enter the Server Hostname.
NOTE: If we auto-detect your Databricks Deployment Cloud, the Databricks Deployment Cloud field won't be visible in the setup form.
Enter the Port number.
Enter the HTTP Path.
Specify the Authentication Type for your destination.
- If you are setting up a new destination on or after April 24, 2024, enter the OAuth 2.0 Client ID and OAuth 2.0 Secret you created in Step 3.
- If you are editing the connection details for an existing destination that was set up before April 24, 2024, select the Authentication Type of your choice. If you selected PERSONAL ACCESS TOKEN, enter the PERSONAL ACCESS TOKEN you created in Step 3. If you selected OAUTH2, enter the OAuth2 Client ID and OAuth2 Secret you created in Step 3.
IMPORTANT: We recommend that you select OAUTH 2.0 in this drop-down menu. However, if you set OAUTH 2.0 as the authentication type for your destination, you cannot revert it later.
(Optional) Choose the Databricks Deployment Cloud based on the cloud platform you are using with Databricks.
(Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can:
- Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path.
- Use the default Databricks File System location registered with the cluster. Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path.
Choose your Connection Method:
- Connect directly
- Connect via SSH
- Connect via PrivateLink
NOTE: The Connection Method options do not appear if you set the Enable local data processing toggle to ON. The Connect via PrivateLink option is only available for Business Critical accounts.
Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and AWS region as described in our Destinations documentation.
Choose your Time zone.
(Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can sync your data using Fivetran connectors to the Databricks destination.
In addition, Fivetran automatically configures a Fivetran Platform Connector to transfer the connector logs and account metadata to a schema in this destination. The Fivetran Platform Connector enables you to monitor your connectors, track your usage, and audit changes. The connector sends all these details at the destination level.
IMPORTANT: If you are an Account Administrator, you can manually add the Fivetran Platform Connector on an account level so that it syncs all the metadata and logs for all the destinations in your account to a single destination. If an account-level Fivetran Platform Connector is already configured in a destination in your Fivetran account, then we don't add destination-level Fivetran Platform Connectors to the new destinations you create.
(Optional) Storing data in external locations
With Unity Catalog
You can customize where Fivetran stores Delta tables. If you use Unity Catalog, Fivetran creates managed tables in Databricks. Managed tables are stored in the root storage location that you configure when you create a metastore.
You can also instruct Fivetran to store tables in an external location managed by Unity Catalog. To do so, follow these steps:
- Follow Databricks' Manage storage credential guide to add a storage credential to Unity Catalog.
- Follow Databricks' Manage external location guide to add an external location to Unity Catalog.
- Enable Create Delta tables in an external location and specify the external location (for example, s3://mybucket/myprefix) as the value.
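To see where a table would land for a given external location, the `{externallocation}/{schema}/{table}` pattern from the setup form can be sketched as follows. The function name and the schema and table names are hypothetical.

```python
def delta_table_path(schema: str, table: str, external_location: str = "") -> str:
    """Return the path pattern this guide describes for an external Delta table."""
    # With no external location, the pattern reduces to /{schema}/{table}.
    base = external_location.rstrip("/")
    return f"{base}/{schema}/{table}"


# Hypothetical schema and table names with the example location from this guide.
print(delta_table_path("salesforce", "account", "s3://mybucket/myprefix"))
```

Trailing slashes on the external location are stripped so the result never contains a doubled separator.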
Without Unity Catalog
If you do not use Unity Catalog, you can still store Delta tables in a specific S3 bucket. First, you must create an AWS instance profile and associate it with Databricks compute.
See Databricks' Secure access to S3 buckets using instance profiles documentation. Perform the first four steps mentioned in the guide to create an instance profile.
In the Databricks console, click Settings > SQL Admin Console.
In the Settings window, select SQL Warehouse Settings.
Select the Instance profile you created above.
(Optional) Connect Databricks cluster
TIP: We strongly recommend using Databricks SQL warehouses with Fivetran. To learn more, skip to the Connect SQL warehouse step.
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
Log in to your Databricks workspace.
In the Databricks console, go to Data Engineering > Cluster > Create Cluster.
Enter a Cluster name of your choice.
Set the Databricks Runtime Version to the latest LTS release. At minimum, you must choose v7.3+.
Select the Cluster mode.
Set the Databricks Runtime Version to 7.3 or later. (10.4 LTS Recommended)
(Optional) If you are using the Unity Catalog feature, in the Advanced Options window, in the Security mode drop-down menu, select either Single user or User isolation.
(Optional) If you are using an external data storage and have disabled the Unity Catalog feature, select the Instance profile you created in the Create instance profile step.
In the Advanced Options section, select Spark.
If you have set the Databricks Runtime Version to below 9.1, paste the following code in the Spark config field:
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl.disable.cache true
spark.hadoop.fs.s3.impl.disable.cache true
spark.hadoop.fs.s3a.impl.disable.cache true
spark.hadoop.fs.s3.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
Click Create Cluster.
In the Advanced Options window, select JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
For further instructions, skip to the Add external location and storage credentials step.
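The cluster steps above mention two runtime thresholds: 7.3 as the minimum version, and 9.1 as the version below which the Spark config is needed. A minimal sketch that checks a version label against both, using hypothetical function names:

```python
def parse_runtime(version: str) -> tuple:
    """Turn a label such as '10.4 LTS' into a comparable (major, minor) tuple."""
    number = version.split()[0]  # drop a trailing label like "LTS"
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def cluster_runtime_checks(version: str) -> dict:
    """Apply the two runtime thresholds mentioned in the cluster steps."""
    parsed = parse_runtime(version)
    return {
        "meets_minimum": parsed >= (7, 3),         # 7.3 is the stated minimum
        "needs_s3_spark_config": parsed < (9, 1),  # below 9.1 needs the Spark config
    }
```

Comparing tuples rather than strings keeps the ordering correct for versions such as 10.4, which would sort before 7.3 as text.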
Databricks on Azure - Setup instructions
Learn how to set up your Databricks on Azure destination.
(Optional) Configure Unity Catalog
IMPORTANT: Perform the following steps if you want to use the Unity Catalog feature or an external data storage.
Create a storage account
To create an Azure Blob Storage account, follow Microsoft's Create a storage account guide.
To create an Azure Data Lake Storage Gen2 storage account, follow Microsoft's Create a storage account to use with Azure Data Lake Storage Gen2 guide.
Create a container
To create a container in Azure Blob storage, follow Microsoft's Create a container guide.
To create a container in ADLS Gen2 storage, follow Microsoft's Create a container guide.
Configure a managed identity for Unity Catalog
IMPORTANT: Perform this step if you have enabled the Unity Catalog feature and want to access the metastore using a managed identity.
You can configure Unity Catalog (Preview) to use an Azure managed identity to access storage containers.
To configure a managed identity for Unity Catalog, follow Databricks' Configure a managed identity for Unity Catalog guide.
Create a metastore and attach workspace
IMPORTANT: Perform this step if you have enabled the Unity Catalog feature.
To create a metastore and attach workspace, follow Microsoft's Create the metastore guide.
(Optional) Connect using Azure Private Link (Beta)
IMPORTANT: Do not perform this step if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use Azure Private Link.
You can connect Fivetran to your Databricks destination using Azure Private Link. Azure Private Link allows VNet and Azure-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. Learn more in Microsoft's Azure Private Link documentation.
Prerequisites
To set up Azure Private Link, you need:
- A Fivetran instance configured to run in Azure
- A Databricks destination in one of our supported regions
- An Azure Databricks workspace that is in your own virtual network (VNet injection). Learn more in Azure's Create an Azure Databricks workspace in your own Virtual Network documentation.
IMPORTANT: Fivetran cannot connect Private Link to an Azure Databricks workspace that was created using the default deployment.
Configure Azure Private Link
Create or select an Azure Databricks workspace that was deployed using the VNet injection deployment method.
Send your ID and Workspace URL to your Fivetran account manager. Fivetran sets up the Azure Private Link connection on our side.
Once your account manager confirms our setup was successful, approve our endpoint connection request. Setup is now complete.
Connect Databricks cluster
TIP: If you want to set up a SQL warehouse, skip to the Connect SQL warehouse step.
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
Log in to your Databricks workspace.
In the Databricks console, go to Data Science & Engineering > Create > Cluster.
Enter a Cluster name of your choice.
Select the Cluster mode.
NOTE: For more information about cluster modes, see Databricks' documentation.
Set the Databricks Runtime Version to 7.3 or later. (10.4 LTS Recommended)
(Optional) If you are using the Unity Catalog feature, in the Advanced Options window, in the Security mode drop-down menu, select either Single user or User isolation.
Click Create Cluster.
In the Advanced Options window, select JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
For further instructions, skip to the Setup external location step.
Connect SQL warehouse
To connect to a SQL warehouse, do the following:
Log in to your Databricks workspace.
In the Databricks console, go to SQL > Create > SQL Warehouse.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
NOTE: Fivetran recommends starting with the 2X-Small cluster size and scaling up as your workload demands.
Choose your warehouse type:
- Serverless
- Pro
- Classic
NOTE: The Serverless option appears only if serverless is enabled in your account. For more information about warehouse types, see Databricks' documentation.
(Optional) If you are using the Unity Catalog feature, in the Advanced options section, enable the Unity Catalog toggle and set the Channel to Preview.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
Configure external storage for Hybrid Deployment
IMPORTANT: Skip to the next step if you want to use Fivetran's cloud environment to sync your data. Perform this step only if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use the Hybrid Deployment architecture.
Fivetran supports the following external storages:
- Amazon S3 bucket
- Azure Blob storage container (recommended)
Configure Amazon S3 bucket
Create Amazon S3 bucket
Create an S3 bucket by following the instructions in AWS's documentation.
Create IAM policy for S3 bucket
Log in to the Amazon IAM console.
Go to Policies, and then click Create policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObjectTagging",
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:DeleteObjectVersion",
        "s3:ListBucket",
        "s3:PutObjectTagging",
        "s3:DeleteObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}/*",
        "arn:aws:s3:::{your-bucket-name}"
      ]
    }
  ]
}
In the policy, replace {your-bucket-name} with the name of your S3 bucket.
Click Next.
Enter a Policy name.
Click Create policy.
Create AWS user for Fivetran
In the Amazon IAM console, go to Users, and then click Create user.
Enter a User name, and then click Next.
Select Attach policies directly.
Select the checkbox next to the policy you created in the Create IAM policy for S3 bucket step, and then click Next.
In the Review and create page, click Create user.
In the Users page, select the user you created.
Click Create access key.
Select Application running outside AWS, and then click Next.
Click Create access key.
Click Download .csv file to download the Access key ID and Secret access key to your local drive. You will need them to configure Fivetran.
Configure Azure Blob storage container
Create Azure storage account
Create an Azure storage account by following the instructions in Azure Blob Storage's documentation. While creating the account, make sure you do the following:
In the Advanced tab, select the Require secure transfer for REST API operations and Enable storage account key access checkboxes.
In the Permitted scope for copy operations drop-down menu, select From any storage account.
In the Networking tab, select one of the following Network access options:
- If your Databricks destination is not hosted on Azure or if your storage container and destination are in different regions, select Enable public access from all networks.
- If your Databricks destination is hosted on Azure and if it is in the same region as your storage container, select Enable public access from selected virtual networks and IP addresses.
IMPORTANT: Ensure the virtual network or subnet where your Databricks workspace or cluster resides is included in the allowed list for public access on the Azure storage account.
In the Encryption tab, choose Microsoft-managed keys (MMK) as the Encryption type.
Find storage account name and access key
Log in to the Azure portal.
Go to your storage account.
On the navigation menu, click Access keys under Security + networking.
Make a note of the Storage account name and Key. You will need them to configure Fivetran.
IMPORTANT: As a security best practice, do not save your access key and account name anywhere in plain text that is accessible to others.
(Optional) Setup external location
If you are using an external data storage, do the following:
With Unity Catalog
- To add a storage credential, follow Microsoft's Manage storage credential guide.
- To add an external location, follow Microsoft's Manage external location guide.
Fivetran uses the external location and storage credentials to write data on your cloud tenant.
Without Unity Catalog
Follow any one of the following Databricks guides to give us permission to write data to Azure Data Lake Storage Gen2 or Azure Blob Storage:
- Access Azure Data Lake Storage Gen2 or Blob Storage using OAuth 2.0 with an Azure service principal
- Access Azure Data Lake Storage Gen2 or Blob Storage using a SAS token
- Access Azure Data Lake Storage Gen2 or Blob Storage using the account key
Instead of executing the Python code provided in the above links, you can also assign the Spark configs to the standard Databricks cluster you created in the Connect Databricks cluster step, or the SQL warehouse you created in the Connect SQL warehouse step.
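As an illustration of what such a Spark config line might look like for the account-key option, the sketch below builds one entry. The property name `fs.azure.account.key.<storage-account>.dfs.core.windows.net` and the `{{secrets/<scope>/<key>}}` reference syntax are assumptions recalled from Databricks' documentation, and the function name and sample values are hypothetical. Confirm the exact property name in the guide you follow before using it.

```python
def account_key_spark_config(storage_account: str, secret_scope: str, secret_key: str) -> str:
    """Return one Spark config line that supplies a storage account key.

    The property name and the secret reference syntax are assumptions;
    verify both against Databricks' documentation.
    """
    prop = f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"
    # Reference a secret instead of pasting the key in plain text.
    return f"{prop} {{{{secrets/{secret_scope}/{secret_key}}}}}"


# Hypothetical storage account, secret scope, and secret name.
print(account_key_spark_config("mystorageaccount", "fivetran", "storage-key"))
```

Referencing a secret rather than the raw key follows the security note in this guide about not storing access keys in plain text.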
Configure authentication type
Fivetran supports the following authentication types to connect to Databricks:
Databricks personal access token authentication: Fivetran supports this authentication type for:
- destinations that are connected to Fivetran using AWS PrivateLink or Azure Private Link
- destinations that were set up before April 24, 2024 and are not connected to Fivetran using AWS PrivateLink or Azure Private Link
OAuth machine-to-machine (M2M) authentication: Fivetran supports this authentication type for all destinations that are not connected to Fivetran using AWS PrivateLink or Azure Private Link.
By default, destinations set up before April 24, 2024 use Databricks personal access token authentication. However, if such a destination is not connected through AWS PrivateLink or Azure Private Link, you can change the authentication type to OAuth machine-to-machine (M2M) authentication.
NOTE: You cannot revert the change once you change the authentication type for a destination set up before April 24, 2024.
Configure Databricks personal access token authentication
To use the Databricks personal access token authentication type, create a personal access token by following the instructions in Databricks' personal access token authentication documentation.
IMPORTANT: Depending on whether or not you use Unity Catalog, ensure that the user or service principal you want to use to create your access token has the following privileges:
If you use Unity Catalog, the user or service principal must have the following privileges on the catalog:
- CREATE SCHEMA
- CREATE TABLE
- MODIFY
- SELECT
- USE CATALOG
- USE SCHEMA
If you do not use Unity Catalog, the user or service principal must have the following privileges on the schema:
- SELECT
- MODIFY
- READ_METADATA
- USAGE
- CREATE
When you grant a privilege on the catalog, it is automatically granted to all current and future schemas in the catalog. Similarly, the privileges that you grant on a schema are inherited by all current and future tables in the schema.
Configure OAuth machine-to-machine (M2M) authentication
To use the OAuth machine-to-machine (M2M) authentication type, create your OAuth Client ID and Secret by following the instructions in Databricks' OAuth machine-to-machine (M2M) authentication documentation.
Complete Fivetran configuration
Log in to your Fivetran account.
Go to the Destinations page and click Add destination.
Enter a Destination name of your choice and then click Add.
Select Databricks as the destination type.
(Optional for Business Critical accounts) To use Hybrid Deployment, set the Enable local data processing toggle to ON, and then in the Select an existing local processing agent drop-down menu, select your local processing agent. If you want to configure and install a new agent, follow our installation instructions. Enter the following details of the S3 bucket you created in Step 5:
- S3 bucket name
- S3 bucket region
- AWS access key ID
- AWS secret access key
(Optional) Enter the Catalog name.
Enter the Server Hostname.
NOTE: If we auto-detect your Databricks Deployment Cloud, the Databricks Deployment Cloud field won't be visible in the setup form.
Enter the Port number.
Enter the HTTP Path.
Specify the Authentication Type for your destination.
- If you are setting up a new destination on or after April 24, 2024, enter the OAuth 2.0 Client ID and OAuth 2.0 Secret you created in Step 7.
- If you are editing the connection details for an existing destination that was set up before April 24, 2024, select the Authentication Type of your choice. If you selected PERSONAL ACCESS TOKEN, enter the PERSONAL ACCESS TOKEN you created in Step 7. If you selected OAUTH2, enter the OAuth2 Client ID and OAuth2 Secret you created in Step 7.
IMPORTANT: We recommend that you select OAUTH 2.0 in this drop-down menu. However, if you set OAUTH 2.0 as the authentication type for your destination, you cannot revert it later.
(Optional) Choose the Databricks Deployment Cloud based on your infrastructure.
(Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can choose either of the following options:
- Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path.
- Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path. Depending on the Unity Catalog settings:
  - If Unity Catalog is disabled, we will use the default Databricks File System location registered with the cluster.
  - If Unity Catalog is enabled, we will use the root storage location in the Azure Data Lake Storage Gen2 container provided while creating a metastore.
Choose your Connection Method:
- Connect directly
- Connect via an SSH
- Connect via Private Link
NOTE: The Connection Method options do not appear if you set the Enable local data processing toggle to ON. The Connect via Private Link option is only available for Business Critical accounts.
Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and cloud region as described in our Destinations documentation.
Choose your Time zone.
(Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can sync your data using Fivetran connectors to the Databricks destination.
In addition, Fivetran automatically configures a Fivetran Platform Connector to transfer the connector logs and account metadata to a schema in this destination. The Fivetran Platform Connector enables you to monitor your connectors, track your usage, and audit changes. The connector sends all these details at the destination level.
IMPORTANT: If you are an Account Administrator, you can manually add the Fivetran Platform Connector on an account level so that it syncs all the metadata and logs for all the destinations in your account to a single destination. If an account-level Fivetran Platform Connector is already configured in a destination in your Fivetran account, then we don't add destination-level Fivetran Platform Connectors to the new destinations you create.
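The path layout used by the Create Delta tables in an external location option can be illustrated with a short Python sketch. It only mirrors the {externallocation}/{schema}/{table} and /{schema}/{table} patterns described in the steps above; the function name and the sample location, schema, and table names are hypothetical, and this is not Fivetran's actual implementation.

```python
def delta_table_path(schema: str, table: str, external_location: str = "") -> str:
    """Return the storage path of an external Delta table.

    With an external location the result is {externallocation}/{schema}/{table};
    without one it is /{schema}/{table}, relative to the default storage root.
    """
    # Drop trailing slashes so that joining never produces "//".
    prefix = external_location.rstrip("/")
    return f"{prefix}/{schema}/{table}"


# Hypothetical values, for illustration only.
print(delta_table_path("salesforce", "account", "abfss://data@myaccount.dfs.core.windows.net/fivetran/"))
print(delta_table_path("salesforce", "account"))
```

The second call shows the case where no external location is specified, so the path starts at the root storage location chosen by your Unity Catalog settings.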
Databricks on GCP - Setup instructions
Learn how to set up your Databricks on GCP destination.
Choose a catalog
IMPORTANT: If you don't use Unity Catalog, skip to the Connect SQL warehouse step. Fivetran will create schemas in the default catalog, hive_metastore.
If you use Unity Catalog, you need to decide which catalog to use with Fivetran. For example, you could create a catalog called fivetran and organize tables from different connectors in it in separate schemas, like fivetran.salesforce or fivetran.mixpanel. If you need to set up Unity Catalog, follow Databricks' Get started using Unity Catalog guide.
Log in to your Databricks workspace.
Click Data in the Databricks console.
Choose a catalog in the Data Explorer.
Connect Databricks cluster
TIP: If you want to set up a SQL warehouse, skip to the Connect SQL warehouse step.
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
Log in to your Databricks workspace.
In the Databricks console, go to Data Science & Engineering > Create > Cluster.
Enter a Cluster name of your choice.
Select the Cluster mode.
NOTE: For more information about cluster modes, see Databricks' documentation.
Set the Databricks Runtime Version to 7.3 or later (10.4 LTS recommended).
(Optional) If you are using the Unity Catalog feature, in the Advanced Options window, in the Security mode drop-down menu, select either Single user or User isolation.
Click Create Cluster.
In the Advanced Options window, select JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
For further instructions, skip to the Setup external location step.
Connect SQL warehouse
To connect to a SQL warehouse, do the following:
Log in to your Databricks workspace.
In the Databricks console, go to SQL > Create > SQL Warehouse.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
NOTE: Fivetran recommends starting with the 2X-Small cluster size and scaling up as your workload demands.
Choose your warehouse type:
- Serverless
- Pro
- Classic
NOTE: The Serverless option appears only if serverless is enabled in your account. For more information about warehouse types, see Databricks' documentation.
(Optional) If you are using the Unity Catalog feature, in the Advanced options section, enable the Unity Catalog toggle and set the Channel to Preview.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
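Before you paste these three values into the Fivetran setup form, it can help to sanity-check them. The following Python sketch is one way to do that. The class name, the validation rules, and the sample values are assumptions made for illustration; they are not checks that Fivetran or Databricks performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionDetails:
    """The three values noted from the Connection details tab."""

    server_hostname: str
    port: int
    http_path: str

    def validate(self) -> None:
        # Assumption: the hostname field takes a bare host, not a full URL.
        if "://" in self.server_hostname or "/" in self.server_hostname:
            raise ValueError("Server Hostname must not include a scheme or a path")
        if not 1 <= self.port <= 65535:
            raise ValueError("Port must be between 1 and 65535")
        if not self.http_path.strip():
            raise ValueError("HTTP Path must not be empty")


# Hypothetical values, for illustration only.
details = ConnectionDetails("example.gcp.databricks.com", 443, "/sql/1.0/warehouses/abc123")
details.validate()
print("Connection details look well-formed:", details)
```

A common mistake this catches is copying the full workspace URL, including https://, into the Server Hostname field.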
Configure external storage for Hybrid Deployment
IMPORTANT: Skip to the next step if you want to use Fivetran's cloud environment to sync your data. Perform this step only if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use the Hybrid Deployment architecture.
Fivetran supports the following external storage options:
- Amazon S3 bucket
- Azure Blob storage container
Configure Amazon S3 bucket
Create Amazon S3 bucket
Create an S3 bucket by following the instructions in AWS's documentation.
Create IAM policy for S3 bucket
Log in to the Amazon IAM console.
Go to Policies, and then click Create policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObjectTagging",
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:DeleteObjectVersion",
        "s3:ListBucket",
        "s3:PutObjectTagging",
        "s3:DeleteObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}/*",
        "arn:aws:s3:::{your-bucket-name}"
      ]
    }
  ]
}
In the policy, replace {your-bucket-name} with the name of your S3 bucket.
Click Next.
Enter a Policy name.
Click Create policy.
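If you prefer to generate the policy document rather than edit the placeholder by hand, a minimal Python sketch such as the following produces the same JSON with the bucket name filled in. The function name and the sample bucket name are hypothetical; the actions and resource ARNs are copied from the policy above.

```python
import json


def build_s3_policy(bucket_name: str) -> str:
    """Return the IAM policy JSON from the step above for a single bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:DeleteObjectTagging",
                    "s3:ReplicateObject",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:DeleteObjectVersion",
                    "s3:ListBucket",
                    "s3:PutObjectTagging",
                    "s3:DeleteObject",
                    "s3:PutObjectAcl",
                ],
                # One ARN for the objects in the bucket, one for the bucket itself.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket name, for illustration only.
print(build_s3_policy("my-fivetran-staging"))
```

You can paste the printed output directly into the JSON editor of the IAM console.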
Create AWS user for Fivetran
In the Amazon IAM console, go to Users, and then click Create user.
Enter a User name, and then click Next.
Select Attach policies directly.
Select the checkbox next to the policy you created in the Create IAM policy for S3 bucket step, and then click Next.
In the Review and create page, click Create user.
In the Users page, select the user you created.
Click Create access key.
Select Application running outside AWS, and then click Next.
Click Create access key.
Click Download .csv file to download the Access key ID and Secret access key to your local drive. You will need them to configure Fivetran.
Configure Azure Blob storage container
Create Azure storage account
Create an Azure storage account by following the instructions in Azure Blob Storage's documentation. While creating the account, make sure you do the following:
In the Advanced tab, select the Require secure transfer for REST API operations and Enable storage account key access checkboxes.
In the Permitted scope for copy operations drop-down menu, select From any storage account.
In the Networking tab, select one of the following Network access options:
- If your Databricks destination is not hosted on Azure or if your storage container and destination are in different regions, select Enable public access from all networks.
- If your Databricks destination is hosted on Azure and if it is in the same region as your storage container, select Enable public access from selected virtual networks and IP addresses.
IMPORTANT: Ensure the virtual network or subnet where your Databricks workspace or cluster resides is included in the allowed list for public access on the Azure storage account.
In the Encryption tab, choose Microsoft-managed keys (MMK) as the Encryption type.
Find storage account name and access key
Log in to the Azure portal.
Go to your storage account.
On the navigation menu, click Access keys under Security + networking.
Make a note of the Storage account name and Key. You will need them to configure Fivetran.
IMPORTANT: As a security best practice, do not save your access key and account name anywhere in plain text that is accessible to others.
(Optional) Setup external location
If you use external data storage, do the following:
With Unity Catalog
- To add a storage credential, follow Microsoft's Manage storage credential guide.
- To add an external location, follow Microsoft's Manage external location guide.
Fivetran uses the external location and storage credentials to write data on your cloud tenant.
Without Unity Catalog
Follow Databricks' documentation to provide us the permissions necessary to write data to Google Cloud Storage.
Configure authentication type
Fivetran supports the following authentication types to connect to Databricks:
Databricks personal access token authentication: Fivetran supports this authentication type for:
- destinations that are connected to Fivetran using AWS PrivateLink or Azure Private Link
- destinations that were set up before April 24, 2024 and are not connected to Fivetran using AWS PrivateLink or Azure Private Link
OAuth machine-to-machine (M2M) authentication: Fivetran supports this authentication type for all destinations that are not connected to Fivetran using AWS PrivateLink or Azure Private Link.
By default, destinations set up before April 24, 2024 use Databricks personal access token authentication. However, if such a destination is not connected through AWS PrivateLink or Azure Private Link, you can change the authentication type to OAuth machine-to-machine (M2M) authentication.
NOTE: You cannot revert the change once you change the authentication type for a destination set up before April 24, 2024.
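The rules above can be summarized as a small decision function. The Python sketch below is only a restatement of those rules for illustration; the function name and the labels it returns are assumptions, not part of any Fivetran API.

```python
from datetime import date

# Destinations set up on or after this date no longer default to personal access tokens.
OAUTH_CUTOFF = date(2024, 4, 24)


def supported_auth_types(setup_date: date, uses_private_link: bool) -> list:
    """Return the authentication types available to a destination."""
    if uses_private_link:
        # AWS PrivateLink and Azure Private Link destinations use personal access tokens.
        return ["PERSONAL ACCESS TOKEN"]
    if setup_date < OAUTH_CUTOFF:
        # Older destinations default to a token but can switch to OAuth M2M (one-way change).
        return ["PERSONAL ACCESS TOKEN", "OAUTH2"]
    return ["OAUTH2"]


print(supported_auth_types(date(2023, 11, 1), uses_private_link=False))
print(supported_auth_types(date(2024, 6, 1), uses_private_link=False))
print(supported_auth_types(date(2024, 6, 1), uses_private_link=True))
```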
Configure Databricks personal access token authentication
To use the Databricks personal access token authentication type, create a personal access token by following the instructions in Databricks' personal access token authentication documentation.
IMPORTANT: Depending on whether or not you use Unity Catalog, ensure that the user or service principal you want to use to create your access token has the following privileges:
If you use Unity Catalog, the user or service principal must have the following privileges on the catalog:
- CREATE SCHEMA
- CREATE TABLE
- MODIFY
- SELECT
- USE CATALOG
- USE SCHEMA
If you do not use Unity Catalog, the user or service principal must have the following privileges on the schema:
- SELECT
- MODIFY
- READ_METADATA
- USAGE
- CREATE
When you grant a privilege on the catalog, it is automatically granted to all current and future schemas in the catalog. Similarly, the privileges that you grant on a schema are inherited by all current and future tables in the schema.
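As an illustration of how these privileges might be granted, the Python sketch below assembles Databricks SQL GRANT statements for both cases. The helper name, the catalog and schema names, and the principal are hypothetical, and you should confirm the exact GRANT syntax for your workspace against Databricks' documentation before running the output.

```python
# Privileges listed above for Unity Catalog (granted on the catalog).
UNITY_CATALOG_PRIVILEGES = [
    "CREATE SCHEMA", "CREATE TABLE", "MODIFY", "SELECT", "USE CATALOG", "USE SCHEMA",
]
# Privileges listed above for workspaces without Unity Catalog (granted on the schema).
LEGACY_SCHEMA_PRIVILEGES = ["SELECT", "MODIFY", "READ_METADATA", "USAGE", "CREATE"]


def grant_statement(privileges, securable_type, securable_name, principal):
    """Build one GRANT statement covering all listed privileges."""
    privilege_list = ", ".join(privileges)
    # Backticks quote principals that contain characters such as "@" or "-".
    return f"GRANT {privilege_list} ON {securable_type} {securable_name} TO `{principal}`;"


# Hypothetical catalog, schema, and principal names, for illustration only.
print(grant_statement(UNITY_CATALOG_PRIVILEGES, "CATALOG", "fivetran", "fivetran-service-principal"))
print(grant_statement(LEGACY_SCHEMA_PRIVILEGES, "SCHEMA", "salesforce", "fivetran-user@example.com"))
```

Because catalog-level privileges are inherited by current and future schemas, the first statement is typically all that a Unity Catalog setup needs.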
Configure OAuth machine-to-machine (M2M) authentication
To use the OAuth machine-to-machine (M2M) authentication type, create your OAuth Client ID and Secret by following the instructions in Databricks' OAuth machine-to-machine (M2M) authentication documentation.
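If you want to confirm that a Client ID and Secret are usable before entering them in Fivetran, you can request a token from your workspace yourself. The Python sketch below only builds that request and does not send it. It assumes the workspace-level token endpoint (/oidc/v1/token), the client_credentials grant, and the all-apis scope described in Databricks' OAuth M2M documentation; the function name and sample values are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumption: workspace-level token endpoint as documented by Databricks.
    token_url = f"https://{workspace_host}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # HTTP Basic authentication carries the client ID and secret.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(token_url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


# Hypothetical host and credentials, for illustration only.
req = build_token_request("example.gcp.databricks.com", "my-client-id", "my-client-secret")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. Keep the secret out of source control and shell history when you do.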
Complete Fivetran configuration
- Log in to your Fivetran account.
- Go to the Destinations page and click Add destination.
- Enter a Destination name of your choice and then click Add.
- Select Databricks as the destination type.
- (Optional for Business Critical accounts) To use Hybrid Deployment, set the Enable local data processing toggle to ON, and then in the Select an existing local processing agent drop-down menu, select your local processing agent. If you want to configure and install a new agent, follow our installation instructions. Enter the following details of the S3 bucket you created in Step 4:
- S3 bucket name
- S3 bucket region
- AWS access key ID
- AWS secret access key
- Select the Connection Method. You can choose either Connect directly or Connect via Private Link.
NOTE: The Connection Method options do not appear if you set the Enable local data processing toggle to ON. The Connect via Private Link option is only available for Business Critical accounts.
- (Optional) Enter the Catalog name.
- Enter the Server Hostname.
- Enter the Port number.
- Enter the HTTP Path.
- Specify the Authentication Type for your destination.
- If you are setting up a new destination on or after April 24, 2024, enter the OAuth 2.0 Client ID and OAuth 2.0 Secret you created in Step 6.
- If you are editing the connection details for an existing destination that was set up before April 24, 2024, select the Authentication Type of your choice. If you selected PERSONAL ACCESS TOKEN, enter the PERSONAL ACCESS TOKEN you created in Step 6. If you selected OAUTH2, enter the OAuth2 Client ID and OAuth2 Secret you created in Step 6.
IMPORTANT: We recommend that you select OAUTH 2.0 in this drop-down menu. However, if you set OAUTH 2.0 as the authentication type for your destination, you cannot revert it later.
- (Optional) Choose the Databricks Deployment Cloud based on your infrastructure.
NOTE: If we auto-detect your Databricks Deployment Cloud, the Databricks Deployment Cloud field won't be visible in the setup form.
If you select Connect via Private Link as the connection method, the Databricks Deployment Cloud field will be populated automatically after you create the destination.
- (Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can choose either of the following options:
  - Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path.
  - Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path. Depending on the Unity Catalog settings:
    - If Unity Catalog is disabled, we will use the default Databricks File System location registered with the cluster.
    - If Unity Catalog is enabled, we will use the root storage location in the Google Cloud Storage bucket provided while creating a metastore.
- Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and cloud region as described in our Destinations documentation.
- Choose your Time zone.
- (Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
- Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can sync your data using Fivetran connectors to the Databricks destination.
In addition, Fivetran automatically configures a Fivetran Platform Connector to transfer the connector logs and account metadata to a schema in this destination. The Fivetran Platform Connector enables you to monitor your connectors, track your usage, and audit changes. The connector sends all these details at the destination level.
IMPORTANT: If you are an Account Administrator, you can manually add the Fivetran Platform Connector on an account level so that it syncs all the metadata and logs for all the destinations in your account to a single destination. If an account-level Fivetran Platform Connector is already configured in a destination in your Fivetran account, then we don't add destination-level Fivetran Platform Connectors to the new destinations you create.
Setup tests
Fivetran performs the following Databricks connection tests:
The Connection test checks if we can connect to the Databricks cluster through Java Database Connectivity (JDBC) using the credentials you provided in the setup form.
The Check Version Compatibility test verifies the Databricks cluster version's compatibility with Fivetran.
The Check Cluster Configuration test validates the Databricks cluster's environment variables and the Spark configuration for standard clusters with DBR version < 9.1.
The Validate Permissions test checks if we have the necessary READ/WRITE permissions to CREATE, ALTER, or DROP tables in the database. The test also checks if we have the permissions to copy data from Fivetran's external AWS S3 staging bucket.
NOTE: The tests may take a couple of minutes to finish running.
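If the Validate Permissions test fails, you can run a comparable check by hand in a Databricks SQL editor. The Python sketch below generates a throwaway CREATE, ALTER, and DROP sequence for a schema. It is not the set of statements that Fivetran runs; the function name, the probe table name, and the sample schema are hypothetical.

```python
from typing import List


def permission_probe_statements(schema: str, table: str = "fivetran_permission_probe") -> List[str]:
    """Return SQL that exercises CREATE, ALTER, and DROP on a temporary table."""
    qualified = f"{schema}.{table}"
    return [
        f"CREATE TABLE {qualified} (id INT)",
        f"ALTER TABLE {qualified} ADD COLUMNS (note STRING)",
        f"DROP TABLE {qualified}",
    ]


# Hypothetical schema name, for illustration only.
for statement in permission_probe_statements("fivetran.salesforce"):
    print(statement + ";")
```

Run the statements in order as the same user or service principal that Fivetran uses. The first statement that fails points to the missing privilege.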
Related articles
Destination Overview
API Destination Configuration