
Data Classification

Data classification in ALTR helps you identify potentially sensitive data across data sources. This process determines where specific types of information, such as personal, financial or regulated data, may exist.

After a classification scan completes, the results are available in a classification report. Use these results to understand how sensitive data is distributed across your data sources and to protect sensitive data by applying access controls.

ALTR supports three classification methods:

  • ALTR Native classification—uses regex-based classifiers you define or that are managed by ALTR

  • Snowflake classification—leverages Snowflake’s native classification capabilities

  • Google DLP classification—uses Google’s Data Loss Prevention service

Refer to the following table for a high-level comparison of ALTR’s classification methods:

Method: ALTR Native

  Description: Matches custom or ALTR-managed classifier patterns against a data sample

  Supported Data Sources:

  • Snowflake

  Available Features:

  • Full customization of classification scans

  • Data remains in Snowflake and does not leave your Snowflake instance

  • Connect columns from the classification report

  • Pause and cancel classification scans

  Limitations:

  • Does not support auto tagging

Method: Snowflake

  Description: Uses Snowflake’s built-in classification functionality

  Supported Data Sources:

  • Snowflake

  Available Features:

  • Data remains in Snowflake and does not leave your Snowflake instance

  • Connect columns from the classification report

  • Supports auto tagging

  Limitations:

  • Classification logic is defined by Snowflake and is not configurable by use case

  • Doesn’t support custom regex

Method: Google DLP

  Description: Matches pre-defined rules against a data sample using Google Data Loss Prevention

  Supported Data Sources:

  • Snowflake

  • Databricks

  Available Features:

  • Delivers a good base set of classifiers with their classification engine

  • Supports auto tagging

  • Connect columns from the classification report

  Limitations:

  • Data has to leave your database environment and ALTR in order to go to Google

  • Limited customization beyond predefined infoTypes

  • Doesn’t support custom regex
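To make the idea of regex-based classification against a data sample more concrete, the following is a minimal Python sketch. It is not ALTR's implementation: the classifier patterns, the `classify_column` and `classify_table` helpers, and the 80% match threshold are all assumptions made for illustration.

```python
import re

# Hypothetical classifiers (name -> pattern). These are illustrative only
# and are not the ALTR-managed classifier definitions.
CLASSIFIERS = {
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify_column(sample, classifiers=CLASSIFIERS, threshold=0.8):
    """Return the classifier names whose pattern fully matches at least
    `threshold` of the non-empty sampled values (threshold is assumed)."""
    values = [v for v in sample if v]
    if not values:
        return []
    hits = []
    for name, pattern in classifiers.items():
        matched = sum(1 for v in values if pattern.fullmatch(v))
        if matched / len(values) >= threshold:
            hits.append(name)
    return hits


def classify_table(sampled_columns):
    """Map each column name to the classifiers that matched its sample."""
    return {col: classify_column(vals) for col, vals in sampled_columns.items()}


# A small sample of values per column, standing in for sampled rows.
sample = {
    "ssn": ["123-45-6789", "987-65-4321", None],
    "contact": ["a@example.com", "b@example.org"],
    "notes": ["call back", "123-45-6789 on file"],
}
report = classify_table(sample)
```

In this sketch a column is flagged only when most of its sampled values match a pattern in full, so a free-text column that merely mentions an SSN is not classified as an SSN column.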

Data classification may take several minutes to run, depending on the number of columns in the data source. An email is sent to administrators when the classification report is ready (Snowflake and Google DLP only).

Note

Classifying data in a Snowflake data source using ALTR Native, Google DLP or Snowflake Classification activates a Snowflake warehouse. The length of time this warehouse is active depends on the number of columns in the data source.

Classifying data in a Databricks metastore using Google DLP activates a Databricks compute. The length of time this compute is active depends on the number of columns in the metastore. Currently, the only classification method supported for Databricks is Google DLP.

Classifying data runs a classification scan and generates a classification report.

To classify data:

  1. Connect a Snowflake or a Databricks data source to ALTR.

  2. If using ALTR Native Classification, create a collection with at least one classifier. Learn more.

  3. Select Data Classification > Classification Reports in the Navigation menu.

  4. Click Classify Data to run a classification scan and generate a report; a modal displays.

  5. Select a data source.

  6. Select a classification method.

    Note

    The dropdown displays classification methods available based on the data source selected. Learn more.

  7. Select a collection (ALTR Native only).

  8. Click Classify Data. This process may take a few minutes depending on the size of the data source. When the Status is Success, the classification report is available.

Note

Only ALTR Native classification scans can be paused.

Pause a classification scan to temporarily stop it, for example, if there are performance issues. You cannot view the report of a partially run classification scan; the scan must complete successfully before you can view the report.

If you update the collection used on a paused classification scan, the final report does not include the changes; the report reflects a snapshot of the collection taken when the classification started.

To pause a running classification scan:

  1. Select Data Classification > Classification Reports in the Navigation menu.

  2. Locate the ALTR Native classification scan to pause (Status is In Progress).

  3. Click the ellipsis menu for the running scan.

  4. Select Pause scan.

To resume a paused classification scan:

  1. Select Data Classification > Classification Reports in the Navigation menu.

  2. Locate the ALTR Native classification scan to resume (Status is Paused).

  3. Click the ellipsis menu for the paused scan.

  4. Select Resume scan.

Note

Only ALTR Native classification scans can be cancelled.

Cancel a classification scan to permanently stop it. Cancelling discards any data generated for the report, and a cancelled scan can’t be restarted. To temporarily stop a running scan, pause it instead. A cancelled classification report remains on the Classification Report page with a Status of Cancelled.

To cancel a running classification scan:

  1. Select Data Classification > Classification Reports in the Navigation menu.

  2. Locate the ALTR Native classification scan to cancel (Status is In Progress or Paused).

  3. Click the ellipsis menu for the scan.

  4. Select Cancel scan; a modal displays to confirm.

  5. Click Cancel Scan.
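The pause, resume and cancel rules described above can be summarized as a small state machine. The Python sketch below is an illustration only: the `Status` enum, the `apply_action` helper and the action names are hypothetical, the status labels are the ones mentioned on this page, and the assumption that a scan resumes from a Paused status is inferred from the cancel procedure. ALTR may expose additional statuses not shown here.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "In Progress"
    PAUSED = "Paused"
    CANCELLED = "Cancelled"
    SUCCESS = "Success"


# Allowed transitions, keyed by (action, current status).
# Cancelled and Success have no outgoing transitions: both are final.
TRANSITIONS = {
    ("pause", Status.IN_PROGRESS): Status.PAUSED,
    ("resume", Status.PAUSED): Status.IN_PROGRESS,
    ("cancel", Status.IN_PROGRESS): Status.CANCELLED,
    ("cancel", Status.PAUSED): Status.CANCELLED,
    ("complete", Status.IN_PROGRESS): Status.SUCCESS,
}

# Actions that only ALTR Native scans support, per the notes above.
NATIVE_ONLY_ACTIONS = {"pause", "resume", "cancel"}


def apply_action(status, action, method="ALTR Native"):
    """Return the next status, or raise ValueError if the action is not allowed."""
    if action in NATIVE_ONLY_ACTIONS and method != "ALTR Native":
        raise ValueError(f"'{action}' is only available for ALTR Native scans")
    try:
        return TRANSITIONS[(action, status)]
    except KeyError:
        raise ValueError(f"cannot '{action}' a scan that is {status.value}") from None
```

For example, `apply_action(Status.PAUSED, "cancel")` returns `Status.CANCELLED`, while any further action on that cancelled scan raises `ValueError`, mirroring the rule that a cancelled scan can't be restarted.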

Once the data classification scan completes and the report is generated, view the classification report. The classification report is available once the Status on the Classification Report page is Success.

Note

For Snowflake Classification and Google DLP Classification, administrators in your ALTR organization receive an email when the classification completes.

To view the classification report:

  1. Select Data Classification > Classification Reports in the Navigation menu.

  2. Click a report to view details.

Note

If a column doesn’t appear as expected in an ALTR Native classification report, adjust the Classification Settings for the classifier and re-classify the data.

Classification results in ALTR help you identify where sensitive data exists so you can apply data access control policies to protect this sensitive data.

When a classification scan completes, ALTR generates a classification report that shows which columns were identified as containing specific types of data, such as personal, financial or regulated information. You can use these results to create or update policies that protect sensitive data across your environment.

To protect sensitive data using classification:

  1. Identify sensitive data. Classification reports highlight columns that contain sensitive data based on the classifiers used in the scan. Assign tags to columns based on classification results using automatic tagging (Snowflake classification and Google DLP classification only).

  2. Apply access controls. Based on classification, create policies to

    1. mask sensitive data

    2. restrict access by user or role

    3. allow access only under defined conditions

  3. Automate ongoing protection. As new data is classified or existing data changes, policies continue to apply automatically, helping maintain consistent protection without manual updates.

    Example

    If a classification scan identifies columns containing Social Security numbers, you can create a policy that masks all SSN-classified columns for non-privileged users. As additional tables or columns are later classified as containing SSNs, the policy applies automatically.
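The SSN example can be sketched in Python to show why a tag-driven policy keeps applying as new columns are classified. This is a conceptual illustration, not ALTR's policy engine: the `mask_ssn` and `apply_policy` helpers, the `COMPLIANCE_ADMIN` role, and the choice to keep the last four digits visible are all assumptions.

```python
# Hypothetical set of roles allowed to see unmasked values.
PRIVILEGED_ROLES = {"COMPLIANCE_ADMIN"}


def mask_ssn(value):
    """Mask everything except the last four characters (an assumed masking style)."""
    return "***-**-" + value[-4:]


def apply_policy(row, column_tags, role):
    """Return a copy of `row` with SSN-tagged columns masked for non-privileged roles.

    `column_tags` maps column name -> set of classification tags. The policy
    looks up tags at query time, so newly tagged columns are covered
    without changing the policy itself.
    """
    if role in PRIVILEGED_ROLES:
        return dict(row)
    return {
        col: mask_ssn(val) if val and "SSN" in column_tags.get(col, set()) else val
        for col, val in row.items()
    }


row = {"name": "Ada", "ssn": "123-45-6789", "backup_ssn": "987-65-4321"}
column_tags = {"ssn": {"SSN"}}

before = apply_policy(row, column_tags, "ANALYST")

# A later classification scan tags another column; the policy is unchanged.
column_tags["backup_ssn"] = {"SSN"}
after = apply_policy(row, column_tags, "ANALYST")
```

Because the policy is written against the SSN tag rather than a fixed list of columns, `backup_ssn` is masked in `after` as soon as it is tagged, with no policy edit required.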