Configure a Databricks Account for ALTR Governance

ALTR implements tag-based governance on your Databricks Unity Catalog by connecting to a Databricks workspace as a service principal, creating a catalog to house the UDFs and tables needed to effect governance, and then adding masks to columns based on the tags on those columns.

At a high level, to configure a Databricks account for ALTR governance:

  1. Create a service principal for ALTR.

  2. Grant that service principal access to a workspace that has a compute cluster.

  3. Assign the ALTR service principal admin access to the metastore housing Unity Catalog so ALTR can see all objects in your Unity Catalog.

Prerequisites

Before getting started with configuring a Databricks account for ALTR governance:

  1. Create a text file with the following items. As you create objects in your Databricks account, keep a record of these values, which ALTR needs in order to connect.

    Metastore Admin Group Name:       
    ALTR Service Principal Name:      
    ALTR Service Principal ID:       
    ALTR Service Principal Secret:    
    Databricks Workspace URL:         
    Databricks Workspace Compute ID:   
  2. Log in to your Databricks account management portal as an administrator. The portal is typically accessed at https://accounts.cloud.databricks.com.

ALTR attaches permissions to groups instead of directly to service principals to allow greater flexibility. This group must ultimately be an admin on the metastore(s) you wish ALTR to govern, and this group owns the Unity Catalog objects ALTR needs to effect governance.

To create the required groups:

  1. Go to User management > Groups > Add group.

  2. Enter altr-service-group for the Group name. The group must have exactly this name.

  3. Add another group named metastore-admins. You may name this group anything you’d like, but this documentation assumes you named it metastore-admins. Record the name you chose in the text file as the Metastore Admin Group Name.

The ALTR service principal is a machine-to-machine Databricks service principal used by ALTR to programmatically manage data access on Unity Catalog objects.

To create the ALTR service principal:

  1. Select User management > Service Principals > Add service principal.

  2. Enter altr-service-principal for the Service principal name. You may name this service principal anything you’d like, but this documentation assumes you named it altr-service-principal. Record the name you chose in the text file as the ALTR Service Principal Name.

  3. Click Add. You are automatically returned to User management > Service Principals when this operation completes.

  4. Click altr-service-principal (ALTR Service Principal Name).

  5. Under OAuth secrets, click Generate Secret.

  6. Copy both the Secret and the Client ID shown in the modal to the text file as ALTR Service Principal Secret and ALTR Service Principal ID, respectively. Treat this information as you would a username and password.
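Once you have also recorded the Databricks Workspace URL, you may want to confirm that the Client ID and Secret work before handing them to ALTR. The following is a minimal sketch, assuming the standard Databricks OAuth machine-to-machine flow (a client-credentials request to the workspace's /oidc/v1/token endpoint with the all-apis scope). The function name build_token_request is hypothetical, and this is not necessarily how ALTR itself connects. The sketch only builds the request; sending it is left commented out.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request.

    Assumes the workspace-level token endpoint /oidc/v1/token.
    """
    token_url = workspace_url.rstrip("/") + "/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The Client ID and Secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Usage (performs a network call, so it is left commented out):
# import json
# req = build_token_request(WORKSPACE_URL, SERVICE_PRINCIPAL_ID, SERVICE_PRINCIPAL_SECRET)
# with urllib.request.urlopen(req) as resp:
#     token = json.load(resp)["access_token"]
```

A successful response indicates the service principal credentials are valid for that workspace; it does not by itself confirm group membership or metastore permissions.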

To permission the ALTR service group:

  1. Select User management > Groups.

  2. Click the group altr-service-group.

  3. Click Add members.

  4. Add altr-service-principal (ALTR Service Principal Name) to the group.

  5. Click the group metastore-admins (Metastore Admin Group Name).

  6. Click Add members.

  7. Add altr-service-group to the group.

  8. Add to the group all existing users or groups that are currently metastore admins for the metastore(s) you wish ALTR to govern.
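The steps above produce a nested group structure: the service principal belongs to altr-service-group, which in turn belongs to metastore-admins. The following sketch models that nesting with plain Python dictionaries to show who effectively ends up in the metastore admin group. It does not call any Databricks API, and the member existing-admin@example.com is a hypothetical stand-in for your current metastore admins.

```python
def effective_members(group: str, groups: dict[str, set[str]]) -> set[str]:
    """Return every non-group principal reachable from `group` through nesting."""
    seen: set[str] = set()
    members: set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against accidental membership cycles
        seen.add(current)
        for member in groups.get(current, set()):
            if member in groups:
                stack.append(member)  # nested group: expand it later
            else:
                members.add(member)  # user or service principal
    return members


# Group layout described in the steps above.
groups = {
    "altr-service-group": {"altr-service-principal"},
    "metastore-admins": {"altr-service-group", "existing-admin@example.com"},
}

print(sorted(effective_members("metastore-admins", groups)))
```

Because permissions are attached to groups, replacing or rotating the service principal later only requires changing the membership of altr-service-group.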

You will often hear “Unity Catalog” used synonymously with “metastore.” Formally, a “metastore” is the physical storage (e.g., AWS S3 bucket) where a Unity Catalog is written. You may only have one Unity Catalog instance per metastore, hence the frequent interchanging of terms.

To permission the metastore:

  1. Go to Catalog.

  2. Click the metastore you wish ALTR to govern.

  3. Under Metastore Admin, click Edit.

  4. Set the metastore-admins (Metastore Admin Group Name) group as the metastore admin. As configured earlier, this group contains both altr-service-group and any other users or groups that were metastore admins before you configured your Databricks account for ALTR.

To configure the Databricks workspace:

  1. Click Workspaces.

  2. Create or select a workspace for ALTR to connect to. The workspace must be attached to the metastore you wish ALTR to govern.

  3. Click that workspace Name.

  4. Record the workspace URL as the Databricks Workspace URL. This URL looks similar to https://dbc-49a6d9f0-2a45.cloud.databricks.com.

  5. Click Permissions.

  6. Click Add permissions.

  7. Add altr-service-group as a User.

  8. Click Save.

ALTR runs a Python job to set column masks and create the necessary ALTR catalog, tables, and UDFs for governance. Using the ALTR service principal, this job is programmatically created and launched on the workspace compute cluster whenever you choose to apply a tag-based policy to your Unity Catalog.

To configure the Databricks workspace compute:

  1. Connect to the workspace as a workspace administrator.

  2. Go to Compute.

  3. Create or select an All-purpose compute cluster. You may use any compute cluster you wish, as long as (1) it can run Python 3.10 or newer and (2) it has Unity Catalog enabled. The following is a suggested cluster configuration:

    1. Policy: Shared Compute

    2. Databricks runtime version: 15.4 LTS (Scala 2.12, Spark 3.5.0)

    3. Use Photon Acceleration

    4. Worker type: m5d.large; Min workers: 1; Max workers: 2

    5. Driver type: Same as worker

    6. Terminate after 30 minutes of inactivity

    7. Instance Profile: None (or as specified by your data administrator)

  4. Click the cluster's Name.

  5. In the Configuration tab, scroll to Tags.

  6. Expand the section Automatically added tags.

  7. Locate the field ClusterId and save its value as Databricks Workspace Compute ID. The value should look similar to 062a-203821-qo33e4jb.

  8. Click Cancel to exit the JSON view.

  9. Click More ⋯ in the top right again.

  10. Click Permissions.

  11. Add altr-service-group as Can Restart.

  12. Click Save.
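If you prefer to keep the suggested cluster configuration in version control, it can be written down as a cluster specification. The sketch below expresses the suggested settings as a Python dictionary, using field names from the Databricks Clusters API as an assumption; check them against the API reference before use. The cluster name altr-governance is hypothetical, and mapping the Shared Compute policy to the USER_ISOLATION access mode is also an assumption rather than something this guide specifies.

```python
import json


def suggested_cluster_spec(cluster_name: str = "altr-governance") -> dict:
    """Return the suggested cluster settings as a Clusters API style payload."""
    return {
        "cluster_name": cluster_name,            # hypothetical name
        "spark_version": "15.4.x-scala2.12",     # Databricks runtime 15.4 LTS
        "runtime_engine": "PHOTON",              # Use Photon Acceleration
        "node_type_id": "m5d.large",             # Worker type
        "driver_node_type_id": "m5d.large",      # Driver type: same as worker
        "autoscale": {"min_workers": 1, "max_workers": 2},
        "autotermination_minutes": 30,           # Terminate after 30 minutes idle
        "data_security_mode": "USER_ISOLATION",  # assumed shared access mode
    }


print(json.dumps(suggested_cluster_spec(), indent=2))
```

No instance profile is included, matching the suggested value of None; add one only if your data administrator specifies it.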

To add your Databricks data source to ALTR, use the Management API. To connect your Databricks account to ALTR through the ALTR API, you need the following values, which you recorded during the steps above. They should look similar to:

ALTR Service Principal ID:        38fa12ba-cd2c-4a50-931e-103c0b70e61d
ALTR Service Principal Secret:    dose**********************
Databricks Workspace URL:         https://dbc-49a6d9f0-2a45.cloud.databricks.com
Databricks Workspace Compute ID:  062a-203821-qo33e4jb
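Before submitting these values, a quick format check can catch copy-and-paste mistakes. The following is a minimal sketch, assuming the shapes shown in the example above: a UUID for the service principal ID, an https URL for the workspace, and three hyphen-separated alphanumeric segments for the compute ID. The function name, the dictionary keys, and the compute ID pattern are hypothetical and inferred only from the sample values, not from an ALTR or Databricks specification.

```python
import re
import uuid
from urllib.parse import urlparse


def validate_connection_values(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: list[str] = []

    try:
        uuid.UUID(values.get("service_principal_id", ""))
    except ValueError:
        problems.append("ALTR Service Principal ID is not a UUID")

    if not values.get("service_principal_secret"):
        problems.append("ALTR Service Principal Secret is empty")

    parsed = urlparse(values.get("workspace_url", ""))
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("Databricks Workspace URL is not an https URL")

    # Pattern inferred from the sample value 062a-203821-qo33e4jb (assumption).
    if not re.fullmatch(r"[0-9a-z]+-[0-9a-z]+-[0-9a-z]+", values.get("compute_id", "")):
        problems.append("Databricks Workspace Compute ID has an unexpected shape")

    return problems


example = {
    "service_principal_id": "38fa12ba-cd2c-4a50-931e-103c0b70e61d",
    "service_principal_secret": "<your secret>",  # placeholder, never commit real secrets
    "workspace_url": "https://dbc-49a6d9f0-2a45.cloud.databricks.com",
    "compute_id": "062a-203821-qo33e4jb",
}
print(validate_connection_values(example))
```

The check validates format only; it does not confirm that the credentials or the cluster actually exist.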

Note

A few items to note:

  • Neither ALTR Service Principal Name nor Metastore Admin Group Name is needed for connecting to ALTR; they are needed only while configuring Databricks to prepare for the ALTR connection.

  • ALTR uses the ALTR Service Principal ID instead of the ALTR Service Principal Name to connect to your Databricks workspace.