Databricks Connection Overview

ALTR connects to Databricks using two methods: a service principal and an API.

The service principal is a login to the customer’s Databricks account that allows ALTR to create and manage objects for tag-based policy.

The API enables Databricks to interact with ALTR’s cloud-based access control engine to make access decisions for tag-based policy.

Unlike other data security platforms, ALTR is a true cloud-to-cloud SaaS solution. There are no proxies to install and no custom views to maintain. Instead, ALTR integrates directly with Databricks via API, simplifying deployment and reducing latency.

Data Source Connections

When connecting a Databricks metastore to ALTR, there are a few requirements and limitations.

Requirements

Databricks Premium Tier
Databricks on AWS
Unity Catalog

Not Supported

Databricks on Azure
Databricks on GCP
Serverless Compute
Hive Catalog

Tag Policy

ALTR supports tag-based policy on your Databricks Unity Catalog. ALTR connects to a Databricks workspace as a service principal, creates a catalog to house the UDFs and tables needed to enforce governance, and then adds masks to columns based on the tags on those columns.

Databricks tag policy uses native masking: ALTR controls data access using only Databricks masking policies, not external functions.
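As a rough illustration of how masks can follow from column tags, the sketch below builds the SQL statements that would attach a masking UDF to each tagged column. It is not ALTR's implementation. The tag name `pii_email`, the UDF name `altr_demo.masks.mask_email`, and the `Column` helper are hypothetical names chosen for this example. The generated statements assume the Unity Catalog column mask syntax `ALTER TABLE ... ALTER COLUMN ... SET MASK`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical description of a tagged column (not an ALTR or Databricks type)."""
    table: str        # fully qualified name: catalog.schema.table
    name: str         # column name
    tags: frozenset   # tag names attached to the column


# Hypothetical mapping from a tag name to the masking UDF that enforces it.
TAG_TO_MASK = {"pii_email": "altr_demo.masks.mask_email"}


def mask_statements(columns, tag_to_mask):
    """Return one SET MASK statement for every column that carries a mapped tag."""
    statements = []
    for col in columns:
        # Sort the tags so the result is deterministic; the first mapped tag wins.
        for tag in sorted(col.tags):
            udf = tag_to_mask.get(tag)
            if udf is not None:
                statements.append(
                    f"ALTER TABLE {col.table} ALTER COLUMN {col.name} SET MASK {udf}"
                )
                break
    return statements


if __name__ == "__main__":
    example = [
        Column("main.sales.customers", "email", frozenset({"pii_email"})),
        Column("main.sales.customers", "id", frozenset()),
    ]
    for statement in mask_statements(example, TAG_TO_MASK):
        print(statement)
```

In this sketch, untagged columns produce no statement, so they are left unmasked.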

Classification

For Databricks, ALTR supports Google DLP data classification. ALTR randomly samples data from individual columns only, never from full rows, and samples each column independently to reduce re-identification risk. ALTR does not store sampled data and performs sampling only when explicitly requested through a Google DLP classification.
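The idea of independent column sampling can be sketched as follows. This is an illustration only: ALTR's actual sampling method is not described here, and the function name, the in-memory list of row dictionaries, and the `seed` parameter are assumptions made for the example.

```python
import random


def sample_columns_independently(rows, sample_size, seed=None):
    """Draw a separate random sample for each column.

    `rows` is a list of dicts that all share the same keys. Because every
    column gets its own draw, values at the same position in two samples
    generally come from different rows, so full rows are not reproduced.
    """
    rng = random.Random(seed)
    if not rows:
        return {}
    samples = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        k = min(sample_size, len(values))
        # Sample without replacement, independently of every other column.
        samples[column] = rng.sample(values, k)
    return samples


if __name__ == "__main__":
    table = [
        {"name": "Ana", "email": "ana@example.com"},
        {"name": "Ben", "email": "ben@example.com"},
        {"name": "Cy", "email": "cy@example.com"},
    ]
    print(sample_columns_independently(table, sample_size=2))
```

Sampling whole rows instead would keep each name paired with its email, which is the kind of linkage that independent sampling aims to avoid.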

From the classification report, you can optionally assign Databricks tags to columns. Learn more about automatic tagging.

Performing a Google DLP classification on a Databricks metastore requires a Databricks compute resource. How long this compute runs depends on the number of columns in the metastore.
