Configure a Databricks Account for ALTR Governance

ALTR implements tag-based governance on your Databricks Unity Catalog by connecting to a Databricks workspace as a service principal, creating a catalog to house the UDFs and tables needed to enforce governance, and then applying masks to columns based on the tags on those columns.

Requirements

The following are requirements for connecting Databricks to ALTR:

  • Databricks Premium Tier

  • Databricks on AWS

  • Unity Catalog

Known Limitations

The following Databricks features are not supported:

  • Databricks on Azure

  • Databricks on GCP

  • Serverless Compute

  • Hive Catalog

Prerequisites

Before getting started with configuring a Databricks account for ALTR governance:

  1. Create a text file with the following items. You will need these as you configure your Databricks account and then when you connect ALTR to your Databricks account.

    Existing Metastore Admin Name:
    Metastore Admin Group Name:
    ALTR Service Principal Name:
    ALTR Service Principal ID:
    ALTR Service Principal OAuth Secret:
    Databricks Workspace Hostname:
    Databricks Workspace Cluster ID: 
  2. You must have a Databricks user with both Databricks account admin privilege and workspace admin privilege on the workspace that ALTR will use to perform actions on your metastore.

    1. The Databricks account portal is typically accessed from https://accounts.cloud.databricks.com.

      Note

      Your account portal URL may be different if you have a special Databricks deployment, or a GCP or Azure deployment. In these cases, your Databricks administrator must provide you with your Databricks account portal URL.

    2. The Databricks workspace portal is typically accessed from a URL that looks similar to https://dbc-49a6d9f0-2a45.cloud.databricks.com.

Create the Required Groups

To create the required groups:

  1. From the Databricks account portal, go to User management > Groups > Add group.

  2. Enter altr-service-group for the Group name. The group must use this exact name.

  3. Add another group named metastore-admins. You may name this group anything you’d like, but this documentation assumes you named it metastore-admins. Record the name you chose in the text file as the Metastore Admin Group Name.
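
If you prefer to script this step, the sketch below uses the Databricks SDK for Python (databricks-sdk) to create the two groups at the account level. This is an optional alternative to the portal steps above; the account ID and authentication details are placeholders you must supply.

from databricks.sdk import AccountClient

# Account-level client; credentials are resolved from environment variables
# or ~/.databrickscfg (see the Databricks SDK documentation).
a = AccountClient(
    host="https://accounts.cloud.databricks.com",
    account_id="<your-databricks-account-id>",  # placeholder
)

# The ALTR service group must use this exact name.
a.groups.create(display_name="altr-service-group")

# The metastore admin group may use any name; record the name you choose
# as the Metastore Admin Group Name.
a.groups.create(display_name="metastore-admins")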

Create the ALTR Service Principal

The ALTR Service Principal is a machine-to-machine Databricks service principal used by ALTR to programmatically manage data access on Unity Catalog objects.

To create the ALTR service principal:

  1. From the Databricks account portal, select User management > Service Principals > Add service principal.

  2. Enter altr-service-principal for the Service principal name. You may name this service principal anything you’d like, but this documentation assumes you named it altr-service-principal. Record the name you chose in the text file as the ALTR Service Principal Name.

  3. Click Add. You are automatically returned to User management > Service Principals when this operation completes.

  4. Click altr-service-principal (ALTR Service Principal Name).

  5. Under OAuth secrets, click Generate Secret. A modal will appear.

  6. Choose an expiration time consistent with your company’s security policies. Once this secret expires, the ALTR integration stops working until you generate a new secret and update ALTR with it, so set the expiration as far in the future as your policies allow.

  7. Copy the OAuth secret (in the Secret field) and the service principal ID (in the Client ID field) shown in the modal to the text file as ALTR Service Principal OAuth Secret and ALTR Service Principal ID, respectively.

  8. Go to Roles.

  9. Enable Account Admin.
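
To confirm the OAuth secret and the Account Admin role are working, you can optionally authenticate as the service principal with the Databricks SDK for Python. In this sketch the account ID, client ID, and secret are placeholders for the values you recorded.

from databricks.sdk import AccountClient

# Authenticate as the ALTR service principal with its OAuth machine-to-machine
# credentials (the Client ID and Secret recorded in the text file).
a = AccountClient(
    host="https://accounts.cloud.databricks.com",
    account_id="<your-databricks-account-id>",               # placeholder
    client_id="<ALTR Service Principal ID>",                 # placeholder
    client_secret="<ALTR Service Principal OAuth Secret>",   # placeholder
)

# Listing account-level metastores succeeds only if the secret is valid and
# the Account Admin role has taken effect.
for metastore in a.metastores.list():
    print(metastore.name, metastore.metastore_id)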

Set the Metastore Admin

The ALTR Service Group must be a metastore admin in order to see and operate on all objects in Unity Catalog. You likely already have a metastore admin designated. Because only one metastore admin can be designated, you will make the metastore-admins group created previously the new metastore admin; that group contains both the ALTR Service Group and your existing metastore admin.

First, record your existing metastore admin:

  1. From the Databricks account portal, select Catalog > {your metastore}.

  2. Record the value shown under Metastore Admin as the Existing Metastore Admin Name in the text file.

Next, put the ALTR Service Principal into the ALTR Service Group:

  1. Select User management > Groups.

  2. Click the group altr-service-group.

  3. Click Add members.

  4. Add altr-service-principal (ALTR Service Principal Name) to the group.

Then, put the ALTR Service Group and your existing metastore admin into the metastore-admins group:

  1. Return to User management > Groups.

  2. Click the group metastore-admins (Metastore Admin Group Name).

  3. Click Add members.

  4. Add altr-service-group to the group.

  5. Add the existing metastore admin (Existing Metastore Admin Name) you recorded previously to the group.

Finally, set the group metastore-admins as the new metastore admin:

  1. Select Catalog.

  2. Click the metastore you wish ALTR to govern.

  3. Under Metastore Admin, click Edit.

  4. Set the metastore-admins (Metastore Admin Group Name) group as the metastore admin.
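
If you prefer to script this step, the sketch below uses the Databricks SDK for Python, run as your own admin user against a workspace attached to the metastore. The hostname is a placeholder, and the group name assumes you kept metastore-admins.

from databricks.sdk import WorkspaceClient

# Run as your own workspace admin user (not the service principal); credentials
# are resolved from environment variables or ~/.databrickscfg.
w = WorkspaceClient(host="https://dbc-49a6d9f0-2a45.cloud.databricks.com")  # placeholder

# Look up the metastore attached to this workspace and its current admin (owner).
summary = w.metastores.summary()
print(summary.metastore_id, summary.owner)

# Transfer the metastore admin role to the metastore-admins group
# (equivalent to Metastore Admin > Edit in the account portal).
w.metastores.update(id=summary.metastore_id, owner="metastore-admins")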

Configure the Databricks Workspace

To configure the Databricks workspace:

  1. From the Databricks account portal, click Workspaces.

  2. Create or select a workspace for ALTR to connect to. Note that the workspace must be attached to the metastore you wish ALTR to govern.

  3. Click that workspace’s Name.

  4. Record the workspace URL as the Databricks Workspace Hostname in the text file. (You may record either the whole URL or just the hostname.) This URL looks similar to https://dbc-49a6d9f0-2a45.cloud.databricks.com.

  5. Click Permissions.

  6. Click Add permissions.

  7. Add altr-service-group as a User.

  8. Click Save.
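
The same permission can be granted programmatically through the account-level workspace assignment API. The sketch below uses the Databricks SDK for Python; the account ID and numeric workspace ID are placeholders.

from databricks.sdk import AccountClient
from databricks.sdk.service import iam

a = AccountClient(
    host="https://accounts.cloud.databricks.com",
    account_id="<your-databricks-account-id>",  # placeholder
)

# Look up the account-level ID of the ALTR service group by its display name.
group = next(iter(a.groups.list(filter='displayName eq "altr-service-group"')))

# Grant the group User access to the workspace (same as Add permissions in the UI).
a.workspace_assignment.update(
    workspace_id=1234567890,            # numeric workspace ID, placeholder
    principal_id=int(group.id),
    permissions=[iam.WorkspacePermission.USER],
)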

Configure the Workspace Compute

ALTR runs a Python job to set column masks and to create the ALTR catalog, tables, and UDFs required for governance. Using the ALTR service principal, this job is created and launched programmatically on the workspace compute cluster whenever you apply a tag-based policy to your Unity Catalog.
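
For orientation only, the sketch below shows roughly what such a job does with Unity Catalog column masks. The catalog, schema, function, and table names are hypothetical examples, not the objects ALTR actually creates.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A masking UDF that reveals a value only to members of a given account group.
# All object names here are hypothetical illustrations.
spark.sql("""
    CREATE FUNCTION IF NOT EXISTS governance_demo.masks.mask_string(val STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('analysts') THEN val ELSE '****' END
""")

# Attach the mask to a column that carries a sensitive tag.
spark.sql("""
    ALTER TABLE main.sales.customers
    ALTER COLUMN email SET MASK governance_demo.masks.mask_string
""")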

To configure the Databricks workspace compute:

  1. Using the URL you recorded as the Databricks Workspace Hostname, log in to the workspace portal as a user with admin privilege on that workspace.

  2. Go to Compute.

  3. Create or select an All-purpose compute cluster. You may use any compute cluster you wish, as long as (1) it can run Python 3.10 or newer and (2) it has Unity Catalog enabled. The following is a suggested cluster configuration (a scripted sketch appears after these steps):

    1. Policy: Shared Compute

    2. Databricks runtime version: 15.4 LTS (Scala 2.12, Spark 3.5.0)

    3. Use Photon Acceleration

    4. Worker type: m5d.large; Min workers: 1; Max workers: 2

    5. Driver type: Same as worker

    6. Terminate after 30 minutes of inactivity

    7. Instance Profile: None (or as specified by your Databricks administrator)

  4. Click the cluster’s Name.

  5. In the Configuration tab, scroll to Tags.

  6. Expand the section Automatically added tags.

  7. Locate the field ClusterId and save its value as Databricks Workspace Cluster ID. The value should look similar to 062a-203821-qo33e4jb.

  8. Click More ⋮ in the top right.

  9. Click Permissions.

  10. Add altr-service-group as Can Restart.

  11. Click Save.
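
As referenced in step 3 above, the suggested cluster can also be created programmatically. The sketch below uses the Databricks SDK for Python; the hostname and cluster name are placeholders, and the settings mirror the suggested configuration.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient(host="https://dbc-49a6d9f0-2a45.cloud.databricks.com")  # placeholder

# Create a small Unity Catalog-enabled all-purpose cluster matching the
# suggested configuration; adjust to your own policies as needed.
cluster = w.clusters.create(
    cluster_name="altr-governance-cluster",                      # placeholder
    spark_version="15.4.x-scala2.12",                            # 15.4 LTS
    runtime_engine=compute.RuntimeEngine.PHOTON,                 # Photon acceleration
    node_type_id="m5d.large",
    autoscale=compute.AutoScale(min_workers=1, max_workers=2),
    autotermination_minutes=30,
    data_security_mode=compute.DataSecurityMode.USER_ISOLATION,  # shared, UC-enabled access mode
).result()

# Record this value as the Databricks Workspace Cluster ID in the text file.
print(cluster.cluster_id)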

Connect Databricks to ALTR

To connect your Databricks datasource to ALTR, use the Management API. You will need the following information, recorded during the steps above, to connect your Databricks account to ALTR via the ALTR API. It should look similar to:

ALTR Service Principal ID:            38fa12ba-cd2c-4a50-931e-103c0b70e61d
ALTR Service Principal OAuth Secret:  dose**********************
Databricks Workspace Hostname:        https://dbc-49a6d9f0-2a45.cloud.databricks.com
Databricks Workspace Cluster ID:      062a-203821-qo33e4jb
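
A request to the ALTR Management API might look roughly like the sketch below. The base URL, endpoint path, payload field names, and authentication header are hypothetical placeholders, not ALTR's documented API; consult the ALTR Management API reference for the actual request format.

import requests

# Hypothetical sketch only: the base URL, endpoint path, field names, and auth
# header are placeholders, not ALTR's documented API.
ALTR_API_BASE = "https://altr.example.com/api"            # placeholder
headers = {"Authorization": "Bearer <altr-api-token>"}    # placeholder

payload = {
    "servicePrincipalId": "<ALTR Service Principal ID>",
    "servicePrincipalSecret": "<ALTR Service Principal OAuth Secret>",
    "workspaceHostname": "https://dbc-49a6d9f0-2a45.cloud.databricks.com",
    "clusterId": "<Databricks Workspace Cluster ID>",
}

response = requests.post(f"{ALTR_API_BASE}/databricks/datasources", json=payload, headers=headers)
response.raise_for_status()
print(response.json())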

To use the user interface to connect Databricks to ALTR, read our documentation.