Manage Databricks Data Sources

This section describes ALTR's capabilities for managing Databricks data sources. ALTR must be connected to a data source in order to enforce data access governance and advanced data protection on sensitive data.

When connecting your account to ALTR, you will need the following information:

  • Workspace Hostname

  • Service Principal ID

  • Cluster ID

  • OAuth Secret

For assistance locating these fields in your Databricks account, read our documentation.
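
Before entering these values in ALTR, it can help to confirm that they are well formed and that the service principal can authenticate. The following is a minimal sketch and is not part of ALTR. It assumes that the Service Principal ID is the service principal's application (client) ID and that the workspace exposes the Databricks OAuth machine-to-machine token endpoint at /oidc/v1/token. The class name, function names, and sample values are hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabricksConnection:
    """The four values ALTR asks for (hypothetical helper, not an ALTR API)."""
    workspace_hostname: str
    service_principal_id: str
    cluster_id: str
    oauth_secret: str

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the fields look sane."""
        found = []
        for name, value in vars(self).items():
            if not value.strip():
                found.append(f"{name} is empty")
        host = self.workspace_hostname.strip()
        if "://" in host or "/" in host:
            found.append(
                "workspace_hostname should be a bare hostname, without scheme or path"
            )
        return found


def build_token_request(conn: DatabricksConnection) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request.

    Assumption: the service principal ID and OAuth secret are sent as HTTP
    Basic credentials to the workspace-level /oidc/v1/token endpoint.
    """
    basic = base64.b64encode(
        f"{conn.service_principal_id}:{conn.oauth_secret}".encode()
    ).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        f"https://{conn.workspace_hostname}/oidc/v1/token",
        data=body,
        headers={"Authorization": f"Basic {basic}"},
        method="POST",
    )
```

To try it, construct a DatabricksConnection with your own values, check that problems() returns an empty list, and pass the result of build_token_request() to urllib.request.urlopen(). A successful response should contain an access token. An authentication error usually points to a mismatched Service Principal ID or OAuth Secret.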

Note

Ideally, use one ALTR organization (tenant) per metastore, although it is technically possible to connect two organizations to the same metastore.

To connect a Databricks metastore:

  1. Select Data Configuration > Data Sources in the Navigation menu.

  2. Click Add Data Source.

  3. On the Databricks card, click Add Data Source.

  4. Enter a user-friendly Display Name for the connection.

  5. Enter the Workspace Hostname.

  6. Enter the Service Principal ID.

  7. Enter the Cluster ID.

    Note

    Once the metastore is connected, you are unable to edit the Cluster ID. If you need to update the Cluster ID, disconnect and reconnect the metastore to ALTR.

  8. Enter the OAuth Secret.

  9. (Optional) Click Run Data Classification Scan to classify the metastore.

    Note

    Classification uses Google DLP to scan all catalogs within the metastore that are accessible to the service principal in order to identify sensitive data. Learn more.

  10. Click Connect Data Source.
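
Because the classification scan covers only the catalogs that the service principal can access, it may be useful to check that list before or after connecting. The sketch below is an illustration under stated assumptions and is not ALTR functionality. It assumes you already hold an OAuth access token for the service principal and that the workspace serves the Unity Catalog REST endpoint /api/2.1/unity-catalog/catalogs. The function names are hypothetical, and pagination of large results is not handled.

```python
import json
import urllib.request


def build_catalogs_request(
    workspace_hostname: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a request that lists catalogs visible to the token's owner."""
    return urllib.request.Request(
        f"https://{workspace_hostname}/api/2.1/unity-catalog/catalogs",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def catalog_names(response_body: bytes) -> list[str]:
    """Extract catalog names, sorted, from the JSON body of the list response.

    Assumption: the body has the shape {"catalogs": [{"name": ...}, ...]};
    a body without a "catalogs" key is treated as an empty list.
    """
    payload = json.loads(response_body)
    return sorted(catalog["name"] for catalog in payload.get("catalogs", []))
```

If a catalog you expect to be classified is missing from the returned names, the service principal likely lacks access to it in Databricks.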

You can remove a Databricks data source from ALTR, for example if your service user is having problems or another issue has occurred with the data source.

To remove a data source:

  1. Select Data Configuration > Data Sources in the Navigation menu.

  2. Select the data source you wish to disconnect.

  3. Click the Remove Data Source button. Removing a data source can take several minutes to complete.