Databricks Connection Overview
ALTR connects to Databricks using two methods: a service principal and an API.
The service principal is a login to the customer’s Databricks account that allows ALTR to create and manage objects for tag-based policy.
The API enables Databricks to interact with ALTR’s cloud-based access control engine to make access decisions for tag-based policy.
Unlike other data security platforms, ALTR is a true cloud-to-cloud SaaS solution. There's no proxy to install or custom views to maintain. Instead, ALTR integrates directly with Databricks via API, simplifying deployment and reducing latency.
Data Source Connections
When connecting a Databricks metastore to ALTR, there are a few requirements and limitations.
Requirements
Databricks Premium Tier
Databricks on AWS
Unity Catalog
Not Supported
Databricks on Azure
Databricks on GCP
Serverless Compute
Hive Catalog
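The lists above can be turned into a quick pre-connection check. The following is a minimal sketch, not an ALTR tool: the `Workspace` fields and their string values are hypothetical, and the check reads the lists literally (for example, only a tier named "premium" passes).

```python
from dataclasses import dataclass


# Hypothetical description of a workspace; field names and values are
# assumptions made for this illustration, not an ALTR or Databricks schema.
@dataclass
class Workspace:
    cloud: str        # "aws", "azure", or "gcp"
    tier: str         # for example "premium"
    catalog: str      # "unity" or "hive"
    serverless: bool  # True if the compute is serverless


def unmet_requirements(ws: Workspace) -> list[str]:
    """Return the requirements from the lists above that this workspace fails."""
    problems = []
    if ws.cloud.lower() != "aws":
        problems.append("Databricks on AWS is required")
    if ws.tier.lower() != "premium":
        problems.append("Databricks Premium Tier is required")
    if ws.catalog.lower() != "unity":
        problems.append("Unity Catalog is required (Hive Catalog is not supported)")
    if ws.serverless:
        problems.append("Serverless Compute is not supported")
    return problems
```

An empty list means the workspace matches every listed requirement and uses none of the unsupported options.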
Tag Policy
ALTR supports tag-based policy on your Databricks Unity Catalog. ALTR connects to a Databricks workspace as a service principal, creates a catalog to house the UDFs and tables needed to enforce governance, and then adds masks to columns based on the tags on those columns.
Databricks tag policy uses native masking: ALTR controls data access using only Databricks masking policies, not external functions.
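To make the tag-to-mask step concrete, here is a minimal sketch that builds Databricks `ALTER TABLE ... ALTER COLUMN ... SET MASK` statements from column tags. It is an illustration under assumptions, not ALTR's implementation: the tag names, the `altr_governance.masks` catalog and schema, and the function names are all hypothetical, and the statements ALTR actually issues are not documented here.

```python
# Hypothetical mapping from a column tag to a masking UDF stored in a
# governance catalog. Every name below is invented for illustration.
TAG_TO_MASK = {
    "pii_email": "altr_governance.masks.mask_email",
    "pii_ssn": "altr_governance.masks.mask_ssn",
}


def quote(identifier: str) -> str:
    """Backtick-quote a single identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def mask_statements(table: str, column_tags: dict[str, list[str]]) -> list[str]:
    """Build one SET MASK statement for each column that has a known tag."""
    statements = []
    for column, tags in column_tags.items():
        for tag in tags:
            mask = TAG_TO_MASK.get(tag)
            if mask is not None:
                statements.append(
                    f"ALTER TABLE {table} ALTER COLUMN {quote(column)} SET MASK {mask}"
                )
                break  # this sketch applies at most one mask per column
    return statements
```

Columns without a recognized tag produce no statement, so untagged data is left unmasked in this sketch.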
Classification
For Databricks, ALTR supports Google DLP data classification. During classification, ALTR randomly samples a small amount of data from each column in the selected database. Columns are sampled independently to reduce re-identification risk. ALTR does not store sampled data and samples data only when a Google DLP classification is explicitly requested.
Performing a Google DLP classification on a Databricks metastore starts a Databricks compute resource. How long this compute resource stays active depends on the number of columns in the metastore.
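The idea behind independent column sampling can be sketched in a few lines. This is an in-memory illustration only: the function name, the row format, and the `sample_size` parameter are assumptions, and ALTR's actual sampling runs against Databricks in a way this page does not describe.

```python
import random


def sample_columns(rows, sample_size, seed=None):
    """Draw a separate random sample for each column.

    Because every column gets its own draw, the sampled values do not
    line up by row, so a sampled name cannot be paired with the zip code
    from the same record.
    """
    rng = random.Random(seed)
    if not rows:
        return {}
    samples = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        k = min(sample_size, len(values))
        samples[column] = rng.sample(values, k)  # independent draw per column
    return samples
```

Sampling whole rows instead would keep related values together, which is the re-identification risk that independent sampling is meant to reduce.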