Databricks Connection Overview

ALTR connects to Databricks using two methods: a service principal and an API.

The service principal is a login to the customer’s Databricks account that allows ALTR to create and manage objects for tag-based policy.

The API enables Databricks to interact with ALTR’s cloud-based access control engine to make access decisions for tag-based policy.

Unlike other data security platforms, ALTR is a true cloud-to-cloud SaaS solution. There is no proxy to install and no custom views to maintain. Instead, ALTR integrates directly with Databricks via API, which simplifies deployment and reduces latency.

Data Source Connections

When connecting a Databricks metastore to ALTR, there are a few requirements and limitations.

Requirements

  • Databricks Premium Tier

  • Databricks on AWS

  • Unity Catalog

Not Supported

  • Databricks on Azure

  • Databricks on GCP

  • Serverless Compute

  • Hive Catalog
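The two lists above can be read as a pre-connection checklist. The following is a minimal sketch of such a check in Python, not part of ALTR or Databricks: the WorkspaceProfile type, its fields, and the connection_blockers function are hypothetical names introduced for illustration. It also assumes that only the Premium tier satisfies the tier requirement, since the lists do not say whether other tiers qualify.

```python
from dataclasses import dataclass

# Values taken from the Requirements and Not Supported lists.
REQUIRED_CLOUD = "aws"
REQUIRED_TIER = "premium"   # assumption: only "premium" is treated as qualifying
REQUIRED_CATALOG = "unity"


@dataclass
class WorkspaceProfile:
    """Hypothetical description of a Databricks deployment (illustration only)."""
    cloud: str                # e.g. "aws", "azure", "gcp"
    tier: str                 # e.g. "premium"
    catalog: str              # e.g. "unity", "hive"
    serverless_compute: bool  # True if the workload relies on serverless compute


def connection_blockers(profile: WorkspaceProfile) -> list[str]:
    """Return the reasons this deployment does not meet the listed requirements."""
    blockers = []
    if profile.cloud.lower() != REQUIRED_CLOUD:
        blockers.append(f"Databricks on {profile.cloud} is not supported; AWS is required")
    if profile.tier.lower() != REQUIRED_TIER:
        blockers.append("Databricks Premium Tier is required")
    if profile.catalog.lower() != REQUIRED_CATALOG:
        blockers.append("Unity Catalog is required; Hive Catalog is not supported")
    if profile.serverless_compute:
        blockers.append("Serverless Compute is not supported")
    return blockers


if __name__ == "__main__":
    profile = WorkspaceProfile(cloud="azure", tier="premium", catalog="hive",
                               serverless_compute=False)
    for reason in connection_blockers(profile):
        print(reason)
```

An empty list means none of the listed limitations apply to the described deployment.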

Tag Policy

ALTR supports tag-based policy on your Databricks Unity Catalog. To do so, ALTR connects to a Databricks workspace as a service principal, creates a catalog to house the UDFs and tables needed to enforce governance, and then adds masks to columns based on the tags on those columns.

Databricks tag policy uses native masking: ALTR controls data access using only Databricks masking policies, not external functions.
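To make the idea of tag-driven masking concrete, here is a minimal sketch in Python that turns a mapping of column tags into Databricks column-mask statements. It illustrates the concept only and is not ALTR's implementation: the mask_statements function, the tag names, and the governance_demo catalog and function names are hypothetical. The generated SQL follows the Databricks ALTER TABLE ... ALTER COLUMN ... SET MASK form; the source does not state which statements ALTR actually issues.

```python
def mask_statements(column_tags, tag_to_mask):
    """Build one SET MASK statement per tagged column.

    column_tags: {(table_name, column_name): [tag, ...]}
    tag_to_mask: {tag: fully qualified mask function name}
    Both structures are hypothetical, chosen for this illustration.
    """
    statements = []
    for (table, column), tags in sorted(column_tags.items()):
        for tag in tags:
            mask_function = tag_to_mask.get(tag)
            if mask_function is not None:
                statements.append(
                    f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_function}"
                )
                break  # apply at most one mask per column
    return statements


if __name__ == "__main__":
    column_tags = {
        ("main.sales.customers", "email"): ["pii"],
        ("main.sales.customers", "id"): [],  # untagged column: no mask
    }
    tag_to_mask = {"pii": "governance_demo.masks.mask_string"}  # hypothetical UDF
    for statement in mask_statements(column_tags, tag_to_mask):
        print(statement)
```

The sketch only builds statement strings. Running them would require a Unity Catalog workspace and a principal with permission to alter the tables.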

Classification

For Databricks, ALTR supports Google DLP data classification, in which ALTR randomly samples a small amount of data from each column in the selected database. Columns are sampled independently to reduce re-identification risk. ALTR does not store sampled data and performs sampling only when explicitly requested through a Google DLP classification.
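Independent column sampling can be illustrated with a short Python sketch. It shows the general technique rather than ALTR's actual sampler: the sample_columns_independently function and its parameters are hypothetical. Each column draws its own random sample, so the sampled values for one column generally come from different rows than those for another column and cannot be lined up into complete records.

```python
import random


def sample_columns_independently(rows, sample_size, seed=None):
    """Sample each column on its own.

    rows: list of dicts that all share the same keys (column names).
    Returns {column_name: [sampled values]}.
    """
    rng = random.Random(seed)
    if not rows:
        return {}
    samples = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        k = min(sample_size, len(values))
        # A separate draw per column, so row positions are not shared across columns.
        samples[column] = rng.sample(values, k)
    return samples


if __name__ == "__main__":
    rows = [{"name": f"name_{i}", "zip": f"zip_{i}"} for i in range(100)]
    for column, values in sample_columns_independently(rows, 5).items():
        print(column, values)
```

By contrast, sampling whole rows would keep a name next to its ZIP code, which is the kind of linkage that independent sampling avoids.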

Performing a Google DLP classification on a Databricks metastore starts a Databricks compute resource. How long this compute resource stays active depends on the number of columns in the metastore.