Databricks Setup Guide
Follow our setup guide to connect Databricks to Fivetran.
IMPORTANT: If you used Databricks Partner Connect to set up your Fivetran account, you don't need to follow the setup guide instructions because you already have a connection to Databricks. We strongly recommend using Databricks Partner Connect to set up your destination. Learn how in Databricks' Fivetran Partner Connect documentation. To connect your Databricks workspace to Fivetran using Partner Connect, make sure you meet Databricks' requirements.
Supported cloud platforms
You can set up your Databricks destination on the following cloud platforms:
- AWS
- Azure
- GCP
Prerequisites
To connect Databricks to Fivetran, you need the following:
- a Databricks account
- a Fivetran account with permission to add destinations
- Unity Catalog enabled on your Databricks workspace. Unity Catalog is a unified governance solution for all data and AI assets, including files, tables, machine learning models, and dashboards in your lakehouse on any cloud. We strongly recommend using Fivetran with Unity Catalog because it simplifies access control and sharing of tables created by Fivetran. Legacy deployments can continue to use Databricks without Unity Catalog.
- A SQL warehouse. SQL warehouses are optimized for data ingestion and analytics workloads, start and shut down rapidly, and are automatically upgraded with the latest enhancements by Databricks. Legacy deployments can continue to use Databricks clusters with Databricks Runtime v7.0+.
Databricks on AWS - Setup instructions
Choose a catalog
If you don't use Unity Catalog, skip ahead to the Connect SQL warehouse step. Fivetran will create schemas in the default catalog, hive_metastore.
If you use Unity Catalog, you need to decide which catalog to use with Fivetran. For example, you could create a catalog called fivetran and organize tables from different connectors in separate schemas within it, such as fivetran.salesforce or fivetran.mixpanel. If you need to set up Unity Catalog, follow Databricks' Get started using Unity Catalog guide.
Log in to your Databricks workspace.
Click Data in the Databricks console.
Choose a catalog in the Data Explorer.
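If you prefer to create a dedicated catalog from code rather than the UI, here is a minimal sketch using the databricks-sql-connector Python package. All connection values and the fivetran catalog name are placeholders; Fivetran itself creates the per-connector schemas (such as fivetran.salesforce) during syncs.

```python
# Minimal sketch: create a dedicated Unity Catalog catalog for Fivetran.
# Requires: pip install databricks-sql-connector
# Connection values below are placeholders taken from your SQL warehouse's
# Connection details tab and your personal access token.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # placeholder
    access_token="dapiXXXXXXXXXXXXXXXX",                       # placeholder PAT
) as connection:
    with connection.cursor() as cursor:
        # Standard Unity Catalog DDL; Fivetran will create schemas inside it.
        cursor.execute("CREATE CATALOG IF NOT EXISTS fivetran")
```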
Connect SQL warehouse
In the Databricks console, go to SQL > SQL warehouses > Create SQL warehouse. If you want to select an existing SQL warehouse, skip to step 5 in this section.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
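Once you have these values (plus the personal access token you create in the next step), you can optionally verify them before entering them in Fivetran. A minimal connectivity sketch, assuming the databricks-sql-connector Python package; all values shown are placeholders:

```python
# Quick connectivity check against the SQL warehouse using the values above.
# Requires: pip install databricks-sql-connector
# The connector communicates over HTTPS, matching the Port value (typically 443).
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # Server Hostname
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # HTTP Path
    access_token="dapiXXXXXXXXXXXXXXXX",                       # personal access token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_user()")
        print(cursor.fetchone())
```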
Create personal access token
Fivetran uses a secure token to connect to Databricks. Follow Databricks' token management guide.
IMPORTANT: Depending on whether or not you use Unity Catalog, ensure that the user or service principal you want to use to create your access token has the following privileges:
If you use Unity Catalog, the user or service principal must have the following privileges on the catalog:
- CREATE SCHEMA
- CREATE TABLE
- MODIFY
- SELECT
- USE CATALOG
- USE SCHEMA
If you do not use Unity Catalog, the user or service principal must have the following privileges on the schema:
- SELECT
- MODIFY
- READ_METADATA
- USAGE
- CREATE
When you grant a privilege on the catalog, it is automatically granted to all current and future schemas in the catalog. Similarly, the privileges that you grant on a schema are inherited by all current and future tables in the schema.
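For illustration, these grants can be issued with standard Databricks SQL. A hedged sketch run through the databricks-sql-connector package; the fivetran catalog name and the principal are placeholders, and the statement must be executed by a user with sufficient admin privileges:

```python
# Sketch: grant the catalog-level privileges listed above to the principal
# whose token Fivetran will use. Catalog and principal names are placeholders.
from databricks import sql

UC_GRANT = """
    GRANT USE CATALOG, USE SCHEMA, CREATE SCHEMA, CREATE TABLE, MODIFY, SELECT
    ON CATALOG fivetran
    TO `fivetran-user@example.com`
"""
# Legacy (non-Unity Catalog) equivalent on a schema would be:
# GRANT SELECT, MODIFY, READ_METADATA, USAGE, CREATE ON SCHEMA my_schema TO `...`

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # placeholder
    access_token="dapiXXXXXXXXXXXXXXXX",                       # an admin's token
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(UC_GRANT)
```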
Configure external storage for Hybrid Deployment
IMPORTANT: Skip to the next step if you want to use Fivetran's cloud environment to sync your data. Perform this step only if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use the Hybrid Deployment architecture.
Create Amazon S3 bucket
Create an S3 bucket by following the instructions in AWS's documentation.
Create IAM policy for S3 bucket
Log in to the Amazon IAM console.
Go to Policies, and then click Create policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:DeleteObjectTagging", "s3:ReplicateObject", "s3:PutObject", "s3:GetObjectAcl", "s3:GetObject", "s3:DeleteObjectVersion", "s3:ListBucket", "s3:PutObjectTagging", "s3:DeleteObject", "s3:PutObjectAcl" ], "Resource": [ "arn:aws:s3:::{your-bucket-name}/{prefix_path}/*", "arn:aws:s3:::{your-bucket-name}" ] } ] }
In the policy, replace {your-bucket-name} with the name of your S3 bucket and {prefix_path} with the prefix path of your S3 bucket.
Click Next.
Enter a Policy name.
Click Create policy.
Create AWS user for Fivetran
In the Amazon IAM console, go to Users, and then click Create user.
Enter a User name, and then click Next.
Select Attach policies directly.
Select the checkbox next to the policy you created in the Create IAM policy for S3 bucket step, and then click Next.
In the Review and create page, click Create user.
In the Users page, select the user you created.
Click Create access key.
Select Application running outside AWS, and then click Next.
Click Create access key.
Click Download .csv file to download the Access key ID and Secret access key to your local drive. You will need them to configure Fivetran.
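Before entering these keys in Fivetran, you can optionally confirm they grant the expected bucket access. A minimal sketch using boto3; the bucket name, prefix, and keys are placeholders:

```python
# Sketch: verify the new IAM user's keys can write, list, and delete under the prefix.
# Requires: pip install boto3. All values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAXXXXXXXXXXXXXXXX",
    aws_secret_access_key="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

bucket = "your-bucket-name"
key = "prefix_path/fivetran_permission_check"

s3.put_object(Bucket=bucket, Key=key, Body=b"ok")  # exercises s3:PutObject
resp = s3.list_objects_v2(Bucket=bucket, Prefix="prefix_path/")  # s3:ListBucket
print("objects under prefix:", resp["KeyCount"])
s3.delete_object(Bucket=bucket, Key=key)  # exercises s3:DeleteObject
```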
(Optional) Connect using AWS PrivateLink (Beta)
IMPORTANT: Do not perform this step if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use AWS PrivateLink.
You can connect Fivetran to your Databricks destination using AWS PrivateLink. AWS PrivateLink allows VPCs and AWS-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. PrivateLink is the most secure connection method. Learn more in AWS PrivateLink's documentation.
How it works:
Fivetran accesses the data plane in your AWS account using the control plane network in Databricks' account.
You set up a back-end AWS PrivateLink connection between your AWS account and Databricks' AWS account (shown as (Workspace 1/2) Link-1 in the diagram above).
Fivetran creates and maintains a front-end AWS PrivateLink connection between Fivetran's AWS account and Databricks' AWS account (shown as Regional - Link-2 in the diagram above).
Prerequisites
To set up AWS PrivateLink, you need:
- A Fivetran instance configured to run in AWS
- A Databricks destination in one of our supported regions
- All of Databricks' requirements
Configure AWS PrivateLink
Follow Databricks' Enable AWS PrivateLink documentation to enable private connectivity for your workspaces. Your workspaces must have the following:
- A registered back-end VPC endpoint for secure cluster connectivity relay
- A registered back-end VPC endpoint for REST APIs
- A PAS (private access settings) object with access to Fivetran's VPC endpoints
- If the Private Access Level on the PAS object is set to Account, a Fivetran VPC endpoint (for the applicable AWS region) that's registered once per account
- If the Private Access Level on the PAS object is set to Endpoint, a Fivetran VPC endpoint (for the applicable AWS region) that's registered using the allowed_vpc_endpoint_ids property
Register Fivetran endpoint details
Register the Fivetran endpoint for the applicable AWS region with your Databricks workspaces. We cannot access your workspaces until you do so.
| AWS Region | VPC endpoint |
| --- | --- |
| ap-south-1 Asia Pacific (Mumbai) | vpce-089f13c9231c2b729 |
| ap-southeast-2 Asia Pacific (Sydney) | vpce-0e5f79a1613d0cf05 |
| ca-central-1 Canada (Central) | vpce-09f0049f9a92177f1 |
| eu-central-1 Europe (Frankfurt) | vpce-049699737170c880d |
| eu-west-1 Europe (Ireland) | vpce-0b32cb6c08f6fe0df |
| eu-west-2 Europe (London) | vpce-03fde3e4804f537eb |
| us-east-1 US East (N. Virginia) | vpce-0ff9bd04153060180 |
| us-east-2 US East (Ohio) | vpce-05153aa99bf7a4575 |
| us-west-2 US West (Oregon) | vpce-0884ff0f23dcbf0dc |
IMPORTANT: Regardless of your Fivetran subscription plan, if you have enabled a back-end AWS PrivateLink connection between your AWS account and Databricks' AWS account (shown as (Workspace 1/2) Link-1 in the diagram above), you must register the Fivetran endpoint (for the applicable AWS region) to avoid connection failures.
Complete Fivetran configuration
Log in to your Fivetran account.
Go to the Destinations page and click Add destination.
Enter a Destination name of your choice and then click Add.
Select Databricks as the destination type.
(Optional for Business Critical accounts) To use Hybrid Deployment, set the Enable local data processing toggle to ON, and then in the Select an existing local processing agent drop-down menu, select your local processing agent. If you want to configure and install a new agent, follow our installation instructions. Enter the following details of the S3 bucket you created in Step 4:
- S3 bucket name
- S3 bucket region
- AWS access key ID
- AWS secret access key
(Unity Catalog only) Enter the Catalog name. This is the name of the catalog in Unity Catalog, such as fivetran or main.
Enter the Server Hostname.
NOTE: If we auto-detect your Databricks Deployment Cloud, the Databricks Deployment Cloud field won't be visible in the setup form.
Enter the Port number.
Enter the HTTP Path.
Enter the Personal Access Token you created in the Create personal access token step.
(Optional) Choose the Databricks Deployment Cloud based on the cloud platform you are using with Databricks.
(Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can choose either of the following options:
- Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path. For example, with the external location s3://mybucket/myprefix, tables for a salesforce schema are created under s3://mybucket/myprefix/salesforce/.
- Use the default Databricks File System location registered with the cluster. Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path.
Choose your Connection Method:
- Connect directly
- Connect via SSH
- Connect via PrivateLink
NOTE: The Connection Method options do not appear if you set the Enable local data processing toggle to ON. The Connect via PrivateLink option is only available for Business Critical accounts.
Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and AWS region as described in our Destinations documentation.
Choose your Time zone.
(Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can sync your data using Fivetran connectors to the Databricks destination.
(Optional) Storing data in external locations
With Unity Catalog
You can customize where Fivetran stores Delta tables. If you use Unity Catalog, Fivetran creates managed tables in Databricks. Managed tables are stored in the root storage location that you configure when you create a metastore.
You can also instruct Fivetran to store tables in an external location managed by Unity Catalog. To do so, follow these steps:
- Follow Databricks' Manage storage credential guide to add a storage credential to Unity Catalog.
- Follow Databricks' Manage external location guide to add an external location to Unity Catalog.
- Enable Create Delta tables in an external location and specify the external location (for example, s3://mybucket/myprefix) as the value.
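For reference, step 2 corresponds to Unity Catalog DDL like the following. A hedged sketch; the location name, bucket URL, and storage credential name are placeholders, and the storage credential itself is typically created in the Databricks UI or via the Databricks CLI per the guide in step 1:

```python
# Sketch: register an external location in Unity Catalog, assuming a storage
# credential named my_storage_credential already exists. Names are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # placeholder
    access_token="dapiXXXXXXXXXXXXXXXX",                       # placeholder PAT
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(
            """
            CREATE EXTERNAL LOCATION IF NOT EXISTS fivetran_landing
            URL 's3://mybucket/myprefix'
            WITH (STORAGE CREDENTIAL my_storage_credential)
            """
        )
```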
Without Unity Catalog
If you do not use Unity Catalog, you can still store Delta tables in a specific S3 bucket. First, you must create an AWS instance profile and associate it with Databricks compute.
See Databricks' Secure access to S3 buckets using instance profiles documentation. Perform the first four steps mentioned in the guide to create an instance profile.
In the Databricks console, click Settings > SQL Admin Console.
In the Settings window, select SQL Warehouse Settings.
Select the Instance profile you created above.
(Optional) Connect Databricks cluster
TIP: We strongly recommend using Databricks SQL warehouses with Fivetran. To learn more, skip to the Connect SQL warehouse step.
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
Log in to your Databricks workspace.
In the Databricks console, go to Data Engineering > Cluster > Create Cluster.
Enter a Cluster name of your choice.
Select the Cluster mode.
Set the Databricks Runtime Version to 7.3 or later (10.4 LTS recommended).
(Optional) If you are using the Unity Catalog feature, in the Advanced Options window, in the Security mode drop-down menu, select either Single user or User isolation.
(Optional) If you are using external data storage and have disabled the Unity Catalog feature, select the Instance profile you created in the Create instance profile step.
In the Advanced Options section, select Spark.
If you have set the Databricks Runtime Version to below 9.1, paste the following code in the Spark config field:
```
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl.disable.cache true
spark.hadoop.fs.s3.impl.disable.cache true
spark.hadoop.fs.s3a.impl.disable.cache true
spark.hadoop.fs.s3.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
```
Click Create Cluster.
In the Advanced Options window, select JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
For further instructions, skip to the Add external location and storage credentials step.
Databricks on Azure - Setup instructions
(Optional) Configure Unity Catalog
IMPORTANT: Perform the following steps if you want to use the Unity Catalog feature or an external data storage.
Create a storage account
To create an Azure Blob Storage account, follow Microsoft's Create a storage account guide.
To create an Azure Data Lake Storage Gen2 storage account, follow Microsoft's Create a storage account to use with Azure Data Lake Storage Gen2 guide.
Create a container
To create a container in Azure Blob storage, follow Microsoft's Create a container guide.
To create a container in ADLS Gen2 storage, follow Microsoft's Create a container guide.
Configure a managed identity for Unity Catalog
IMPORTANT: Perform this step if you have enabled the Unity Catalog feature and want to access the metastore using a managed identity.
You can configure Unity Catalog (Preview) to use an Azure managed identity to access storage containers.
To configure a managed identity for Unity Catalog, follow Databricks' Configure a managed identity for Unity Catalog guide.
Create a metastore and attach workspace
IMPORTANT: Perform this step if you have enabled the Unity Catalog feature.
To create a metastore and attach workspace, follow Microsoft's Create the metastore guide.
(Optional) Connect using Azure Private Link (Beta)
IMPORTANT: Do not perform this step if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use Azure Private Link.
You can connect Fivetran to your Databricks destination using Azure Private Link. Azure Private Link allows VNet and Azure-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. Learn more in Microsoft's Azure Private Link documentation.
Prerequisites
To set up Azure Private Link, you need:
- A Fivetran instance configured to run in Azure
- A Databricks destination in one of our supported regions
- An Azure Databricks workspace that is in your own virtual network (VNet injection). Learn more in Azure's Create an Azure Databricks workspace in your own Virtual Network documentation.
IMPORTANT: Fivetran cannot connect Private Link to an Azure Databricks workspace that was created using the default deployment method.
Configure Azure Private Link
Create or select an Azure Databricks workspace that was created using the VNet injection deployment method.
Send your workspace ID and workspace URL to your Fivetran account manager. Fivetran sets up the Azure Private Link connection on our side.
Once your account manager confirms our setup was successful, approve our endpoint connection request. Setup is now complete.
Connect Databricks cluster
TIP: If you want to set up a SQL warehouse, skip to the Connect SQL warehouse step.
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
Log in to your Databricks workspace.
In the Databricks console, go to Data Science & Engineering > Create > Cluster.
Enter a Cluster name of your choice.
Select the Cluster mode.
NOTE: For more information about cluster modes, see Databricks' documentation.
Set the Databricks Runtime Version to 7.3 or later. (10.4 LTS Recommended)
(Optional) If you are using the Unity Catalog feature, in the Advanced Options window, in the Security mode drop-down menu, select either Single user or User isolation.
Click Create Cluster.
In the Advanced Options window, select JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
For further instructions, skip to the Setup external location step.
Connect SQL warehouse
To connect to a SQL warehouse, do the following:
Log in to your Databricks workspace.
In the Databricks console, go to SQL > Create > SQL Warehouse.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
NOTE: Fivetran recommends starting with the 2X-Small cluster size and scaling up as your workload demands.
Choose your warehouse type:
- Serverless
- Pro
- Classic
NOTE: The Serverless option appears only if serverless is enabled in your account. For more information about warehouse types, see Databricks' documentation.
(Optional) If you are using the Unity Catalog feature, in the Advanced options section, enable the Unity Catalog toggle and set the Channel to Preview.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
Configure external storage for Hybrid Deployment
IMPORTANT: Skip to the next step if you want to use Fivetran's cloud environment to sync your data. Perform this step only if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use the Hybrid Deployment architecture.
Create Amazon S3 bucket
Create an S3 bucket by following the instructions in AWS's documentation.
Create IAM policy for S3 bucket
Log in to the Amazon IAM console.
Go to Policies, and then click Create policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:DeleteObjectTagging", "s3:ReplicateObject", "s3:PutObject", "s3:GetObjectAcl", "s3:GetObject", "s3:DeleteObjectVersion", "s3:ListBucket", "s3:PutObjectTagging", "s3:DeleteObject", "s3:PutObjectAcl" ], "Resource": [ "arn:aws:s3:::{your-bucket-name}/{prefix_path}/*", "arn:aws:s3:::{your-bucket-name}" ] } ] }
In the policy, replace {your-bucket-name} with the name of your S3 bucket and {prefix_path} with the prefix path of your S3 bucket.
Click Next.
Enter a Policy name.
Click Create policy.
Create AWS user for Fivetran
In the Amazon IAM console, go to Users, and then click Create user.
Enter a User name, and then click Next.
Select Attach policies directly.
Select the checkbox next to the policy you created in the Create IAM policy for S3 bucket step, and then click Next.
In the Review and create page, click Create user.
In the Users page, select the user you created.
Click Create access key.
Select Application running outside AWS, and then click Next.
Click Create access key.
Click Download .csv file to download the Access key ID and Secret access key to your local drive. You will need them to configure Fivetran.
(Optional) Set up external location
If you are using external data storage, do the following:
With Unity Catalog
- To add a storage credential, follow Microsoft's Manage storage credential guide.
- To add an external location, follow Microsoft's Manage external location guide.
Fivetran uses the external location and storage credentials to write data on your cloud tenant.
Without Unity Catalog
Follow any one of the following Databricks guides to give us permission to write data to Azure Data Lake Storage Gen2 or Azure Blob Storage:
- Access Azure Data Lake Storage Gen2 or Blob Storage using OAuth 2.0 with an Azure service principal
- Access Azure Data Lake Storage Gen2 or Blob Storage using a SAS token
- Access Azure Data Lake Storage Gen2 or Blob Storage using the account key
Instead of executing the Python code provided in the above links, you can also assign the Spark configurations to the standard Databricks cluster you created in the Connect Databricks cluster step or to the SQL warehouse you created in the Connect SQL warehouse step.
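For orientation, the account-key variant from Databricks' guides follows a pattern like the sketch below when run in a notebook. The storage account, container, secret scope, and key names are placeholders; the secret-scope indirection is optional but avoids hardcoding the key:

```python
# Sketch: configure Spark to access an ADLS Gen2 container with the account key.
# Runs in a Databricks notebook, where spark and dbutils are provided by the runtime.
# Storage account, container, scope, and key names below are placeholders.
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-account-key"),
)

# Confirm access by listing the container root.
display(dbutils.fs.ls("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/"))
```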
Create personal access token
To create a new personal access token, follow Databricks' token management guide.
IMPORTANT: Depending on whether or not you use Unity Catalog, ensure that the user or service principal you want to use to create your access token has the following privileges:
If you use Unity Catalog, the user or service principal must have the following privileges on the catalog:
- CREATE SCHEMA
- CREATE TABLE
- MODIFY
- SELECT
- USE CATALOG
- USE SCHEMA
If you do not use Unity Catalog, the user or service principal must have the following privileges on the schema:
- SELECT
- MODIFY
- READ_METADATA
- USAGE
- CREATE
When you grant a privilege on the catalog, it is automatically granted to all current and future schemas in the catalog. Similarly, the privileges that you grant on a schema are inherited by all current and future tables in the schema.
Complete Fivetran configuration
Log in to your Fivetran account.
Go to the Destinations page and click Add destination.
Enter a Destination name of your choice and then click Add.
Select Databricks as the destination type.
(Optional for Business Critical accounts) To use Hybrid Deployment, set the Enable local data processing toggle to ON, and then in the Select an existing local processing agent drop-down menu, select your local processing agent. If you want to configure and install a new agent, follow our installation instructions. Enter the following details of the S3 bucket you created in Step 5:
- S3 bucket name
- S3 bucket region
- AWS access key ID
- AWS secret access key
(Optional) Enter the Catalog name.
Enter the Server Hostname.
NOTE: If we auto-detect your Databricks Deployment Cloud, the Databricks Deployment Cloud field won't be visible in the setup form.
Enter the Port number.
Enter the HTTP Path.
Enter the Personal Access Token you created.
(Optional) Choose the Databricks Deployment Cloud based on your infrastructure.
(Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can choose either of the following options:
- Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path.
- Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path. Depending on the Unity Catalog settings:
  - If Unity Catalog is disabled, we will use the default Databricks File System location registered with the cluster.
  - If Unity Catalog is enabled, we will use the root storage location in the Azure Data Lake Storage Gen2 container provided while creating a metastore.
Choose your Connection Method:
- Connect directly
- Connect via SSH
- Connect via Private Link
NOTE: The Connection Method options do not appear if you set the Enable local data processing toggle to ON. The Connect via Private Link option is only available for Business Critical accounts.
Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and cloud region as described in our Destinations documentation.
Choose your Time zone.
(Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
Click Save & Test.
Fivetran tests and validates the Databricks connection. On successful completion of the setup tests, you can sync your data using Fivetran connectors to the Databricks destination.
Databricks on GCP - Setup instructions
Choose a catalog
IMPORTANT: If you don't use Unity Catalog, skip to the Connect SQL warehouse step. Fivetran will create schemas in the default catalog, hive_metastore.
If you use Unity Catalog, you need to decide which catalog to use with Fivetran. For example, you could create a catalog called fivetran and organize tables from different connectors in separate schemas within it, such as fivetran.salesforce or fivetran.mixpanel. If you need to set up Unity Catalog, follow Databricks' Get started using Unity Catalog guide.
Log in to your Databricks workspace.
Click Data in the Databricks console.
Choose a catalog in the Data Explorer.
Connect Databricks cluster
TIP: If you want to set up a SQL warehouse, skip to the Connect SQL warehouse step.
To connect to a Databricks cluster, do the following:
Create a Databricks cluster
Log in to your Databricks workspace.
In the Databricks console, go to Data Science & Engineering > Create > Cluster.
Enter a Cluster name of your choice.
Select the Cluster mode.
NOTE: For more information about cluster modes, see Databricks' documentation.
Set the Databricks Runtime Version to 7.3 or later. (10.4 LTS Recommended)
(Optional) If you are using the Unity Catalog feature, in the Advanced Options window, in the Security mode drop-down menu, select either Single user or User isolation.
Click Create Cluster.
In the Advanced Options window, select JDBC/ODBC.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
For further instructions, skip to the Setup external location step.
Connect SQL warehouse
To connect to a SQL warehouse, do the following:
Log in to your Databricks workspace.
In the Databricks console, go to SQL > Create > SQL Warehouse.
In the New SQL warehouse window, enter a Name for your warehouse.
Choose your Cluster Size and configure the other warehouse options.
NOTE: Fivetran recommends starting with the 2X-Small cluster size and scaling up as your workload demands.
Choose your warehouse type:
- Serverless
- Pro
- Classic
NOTE: The Serverless option appears only if serverless is enabled in your account. For more information about warehouse types, see Databricks' documentation.
(Optional) If you are using the Unity Catalog feature, in the Advanced options section, enable the Unity Catalog toggle and set the Channel to Preview.
Click Create.
Go to the Connection details tab.
Make a note of the following values. You will need them to configure Fivetran.
- Server Hostname
- Port
- HTTP Path
Configure external storage for Hybrid Deployment
IMPORTANT: Skip to the next step if you want to use Fivetran's cloud environment to sync your data. Perform this step only if you want to use Hybrid Deployment for your data pipeline. You must have a Business Critical plan to use the Hybrid Deployment architecture.
Create Amazon S3 bucket
Create an S3 bucket by following the instructions in AWS's documentation.
Create IAM policy for S3 bucket
Log in to the Amazon IAM console.
Go to Policies, and then click Create policy.
Go to the JSON tab.
Copy the following policy and paste it in the JSON editor.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:DeleteObjectTagging", "s3:ReplicateObject", "s3:PutObject", "s3:GetObjectAcl", "s3:GetObject", "s3:DeleteObjectVersion", "s3:ListBucket", "s3:PutObjectTagging", "s3:DeleteObject", "s3:PutObjectAcl" ], "Resource": [ "arn:aws:s3:::{your-bucket-name}/{prefix_path}/*", "arn:aws:s3:::{your-bucket-name}" ] } ] }
In the policy, replace {your-bucket-name} with the name of your S3 bucket and {prefix_path} with the prefix path of your S3 bucket.
Click Next.
Enter a Policy name.
Click Create policy.
Create AWS user for Fivetran
In the Amazon IAM console, go to Users, and then click Create user.
Enter a User name, and then click Next.
Select Attach policies directly.
Select the checkbox next to the policy you created in the Create IAM policy for S3 bucket step, and then click Next.
In the Review and create page, click Create user.
In the Users page, select the user you created.
Click Create access key.
Select Application running outside AWS, and then click Next.
Click Create access key.
Click Download .csv file to download the Access key ID and Secret access key to your local drive. You will need them to configure Fivetran.
(Optional) Set up external location
If you are using external data storage, do the following:
With Unity Catalog
- To add a storage credential, follow Databricks' Manage storage credential guide.
- To add an external location, follow Databricks' Manage external location guide.
Fivetran uses the external location and storage credentials to write data on your cloud tenant.
Without Unity Catalog
Follow Databricks' documentation to give us the permissions necessary to write data to Google Cloud Storage.
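Once access is configured, you can verify it from a notebook attached to the cluster. A minimal hedged sketch; the bucket name is a placeholder:

```python
# Sketch: confirm the cluster can reach the GCS bucket once permissions are set.
# Runs in a Databricks notebook, where dbutils and display are provided by the
# runtime. The bucket name is a placeholder.
display(dbutils.fs.ls("gs://my-gcs-bucket/"))
```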
Create personal access token
To create a new personal access token, follow Databricks' token management guide.
IMPORTANT: Depending on whether or not you use Unity Catalog, ensure that the user or service principal you want to use to create your access token has the following privileges:
If you use Unity Catalog, the user or service principal must have the following privileges on the catalog:
- CREATE SCHEMA
- CREATE TABLE
- MODIFY
- SELECT
- USE CATALOG
- USE SCHEMA
If you do not use Unity Catalog, the user or service principal must have the following privileges on the default catalog:
- SELECT
- MODIFY
- READ_METADATA
- USAGE
- CREATE
When you grant a privilege on the catalog, it is automatically granted to all current and future schemas in the catalog. Similarly, the privileges that you grant on a schema are inherited by all current and future tables in the schema.
Complete Fivetran configuration
- Log in to your Fivetran account.
- Go to the Destinations page and click Add destination.
- Enter a Destination name of your choice and then click Add.
- Select Databricks as the destination type.
- (Optional for Business Critical accounts) To use Hybrid Deployment, set the Enable local data processing toggle to ON, and then in the Select an existing local processing agent drop-down menu, select your local processing agent. If you want to configure and install a new agent, follow our installation instructions. Enter the following details of the S3 bucket you created in Step 4:
- S3 bucket name
- S3 bucket region
- AWS access key ID
- AWS secret access key
- Select the Connection Method. You can choose to either Connect directly or Connect via PrivateLink.
NOTE: The Connection Method options do not appear if you set the Enable local data processing toggle to ON. The Connect via Private Link option is only available for Business Critical accounts.
- (Optional) Enter the Catalog name.
- Enter the Server Hostname.
- Enter the Port number.
- Enter the HTTP Path.
- Enter the Personal Access Token you created.
- (Optional) Choose the Databricks Deployment Cloud based on your infrastructure.
NOTE: If we auto-detect your Databricks Deployment Cloud, the Databricks Deployment Cloud field won't be visible in the setup form.
If you select Connect via Private Link as the connection method, the Databricks Deployment Cloud field will be populated automatically after you create the destination.
- (Optional) Set the Create Delta tables in an external location toggle to ON to create Delta tables as external tables. You can choose either of the following options:
  - Enter the External Location you want to use. We will create the Delta tables in the {externallocation}/{schema}/{table} path.
  - Do not specify the external location. We will create the external Delta tables in the /{schema}/{table} path. Depending on the Unity Catalog settings:
    - If Unity Catalog is disabled, we will use the default Databricks File System location registered with the cluster.
    - If Unity Catalog is enabled, we will use the root storage location in the Google Cloud Storage bucket provided while creating a metastore.
- Choose the Data processing location. Depending on the plan you are on and your selected cloud service provider, you may also need to choose a Cloud service provider and cloud region as described in our Destinations documentation.
- Choose your Time zone.
- (Optional for Business Critical accounts) To enable regional failover, set the Use Failover toggle to ON, and then select your Failover Location and Failover Region. Make a note of the IP addresses of the secondary region and safelist these addresses in your firewall.
- Click Save & Test.
Setup tests
Fivetran performs the following Databricks connection tests:
The Connection test checks if we can connect to the Databricks cluster through Java Database Connectivity (JDBC) using the credentials you provided in the setup form.
The Check Version Compatibility test verifies the Databricks cluster version's compatibility with Fivetran.
The Check Cluster Configuration test validates the Databricks cluster's environment variables and the Spark configuration for standard clusters with DBR version below 9.1.
The Validate Permissions test checks if we have the necessary READ/WRITE permissions to CREATE, ALTER, or DROP tables in the database. The test also checks if we have the permissions to copy data from Fivetran's external AWS S3 staging bucket.
NOTE: The tests may take a couple of minutes to finish running.
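If the Validate Permissions test fails, you can reproduce its core checks yourself. A hedged sketch of equivalent statements, assuming the databricks-sql-connector package and a placeholder scratch schema; this mirrors what the test verifies, not Fivetran's internal implementation:

```python
# Sketch: manually exercise CREATE/ALTER/DROP in a scratch schema to debug a
# failing Validate Permissions test. All names and values below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxxxxxx-xxxx.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",          # placeholder
    access_token="dapiXXXXXXXXXXXXXXXX",                       # the PAT given to Fivetran
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("CREATE SCHEMA IF NOT EXISTS fivetran_permission_check")
        cursor.execute("CREATE TABLE fivetran_permission_check.t (id INT)")
        cursor.execute("ALTER TABLE fivetran_permission_check.t ADD COLUMNS (note STRING)")
        cursor.execute("DROP TABLE fivetran_permission_check.t")
        cursor.execute("DROP SCHEMA fivetran_permission_check")
```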
Related articles
- Destination Overview
- API Destination Configuration