Unity Catalog Setup Instructions
Follow our setup instructions to integrate Azure Databricks Unity Catalog with your Azure Data Lake Storage (ADLS) destination.
Setup instructions
Create workspace
Log in to the Azure portal.
Create a workspace by following Databricks' documentation.
Create Unity Catalog metastore
Create a metastore and assign a workspace to it by following Databricks' documentation.
Enable Unity Catalog for workspace
Enable Unity Catalog for your workspace by following Databricks’ documentation.
Configure external data storage
To configure your external storage in Databricks, do the following:
Create storage credentials
Log in to your Databricks workspace.
Go to Data Explorer > External Data.
Select Storage Credentials.
Click Add and then select Add a storage credential.
Select Service Principal.
Enter the Storage credential name of your choice.
Enter the Directory ID and Application ID of the service principal you created for your ADLS destination.
Enter the Client Secret you created for your ADLS destination.
Click Create.
NOTE: You can also configure Unity Catalog to use an Azure managed identity to authenticate to your storage account.
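To confirm that the credential exists before you continue, you can run a quick check from a notebook or the SQL editor. This is a minimal sketch: <credential_name> is a placeholder for the name you entered above, and it assumes your user is allowed to view storage credentials in the metastore.

```sql
-- List the storage credentials visible to you in the metastore.
SHOW STORAGE CREDENTIALS;

-- Show the details of a single credential.
-- <credential_name> is a placeholder for the name you chose.
DESCRIBE STORAGE CREDENTIAL <credential_name>;
```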
Create external location
In the Data Explorer page, select External Locations.
Click Add and then select Add an external location.
Enter the External location name.
Enter your ADLS account URL in the format abfss://<containerName>@<storageAccountName>.dfs.core.windows.net/<path>.
Select the Storage credential you created in the Create storage credentials step.
Click Create.
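If you prefer SQL to the UI, the same external location can usually be created with a single statement. The sketch below assumes the storage credential from the previous step already exists and that you have the privilege to create external locations. <location_name>, <credential_name>, and <path> are placeholders introduced here for illustration; replace them with your own values.

```sql
-- Register the ADLS path as an external location in Unity Catalog.
-- <location_name> and <credential_name> are placeholders for the names you chose.
CREATE EXTERNAL LOCATION IF NOT EXISTS <location_name>
URL 'abfss://<containerName>@<storageAccountName>.dfs.core.windows.net/<path>'
WITH (STORAGE CREDENTIAL <credential_name>);

-- Inspect the result, including the URL and the credential it uses.
DESCRIBE EXTERNAL LOCATION <location_name>;
```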
Create notebook
Create a notebook by following the instructions in Databricks’ documentation.
NOTE: You can create the notebook in any folder within your workspace.
Create external tables
To create an external table, execute the following SQL query from the notebook you created in the Create notebook step:
CREATE TABLE <catalog>.<schema>.<table>
USING delta
OPTIONS (
path 'abfss://<containerName>@<storageAccountName>.dfs.core.windows.net/<path-to-table>'
)
In the SQL query, replace the following placeholder values with your values:
Placeholder | Replace with |
---|---|
<catalog> | Name of the catalog that contains the table. |
<schema> | Name of the schema that contains the table. |
<table> | Table name. |
<storageAccountName> | Your ADLS account name. |
<containerName> | Your ADLS container name. |
<path-to-table> | Path to the table within the ADLS container. |
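As a worked illustration of the substitution, the statement might look like the following once the placeholders are filled in. Every value here (main, sales, orders, data, examplestorage, and the sales/orders path) is hypothetical and stands in for your own catalog, schema, table, container, storage account, and path. The sketch also assumes that a Delta table already exists at that path.

```sql
-- Hypothetical values, for illustration only:
--   catalog = main, schema = sales, table = orders
--   container = data, storage account = examplestorage, path = sales/orders
CREATE TABLE main.sales.orders
USING delta
OPTIONS (
  path 'abfss://data@examplestorage.dfs.core.windows.net/sales/orders'
);

-- Optional check: confirm that the new table is readable.
SELECT * FROM main.sales.orders LIMIT 10;
```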
NOTE: You can use the decode(unhex(<column_name>), 'UTF-8') clause in your query to fetch the decoded values in BINARY columns.
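The following sketch shows the decoding first on a literal and then on a table column. In the literal, the hex string 537061726B2053514C is the UTF-8 encoding of the text Spark SQL. In the second query, <column_name> and the table name are placeholders for your own values.

```sql
-- Literal example: unhex turns the hex string into bytes,
-- and decode interprets those bytes as UTF-8 text.
SELECT decode(unhex('537061726B2053514C'), 'UTF-8') AS decoded_value;
-- decoded_value: Spark SQL

-- Applied to a column of an external table (placeholders to be replaced).
SELECT decode(unhex(<column_name>), 'UTF-8') AS decoded_value
FROM <catalog>.<schema>.<table>
LIMIT 10;
```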