Data Classification
Data classification in ALTR helps you identify potentially sensitive data across data sources. This process determines where specific types of information, such as personal, financial or regulated data, may exist.
After a classification scan completes, the results are available in a classification report. Use these results to understand how sensitive data is distributed across your data sources and to protect sensitive data by applying access controls.
ALTR supports three classification methods:
ALTR Native classification—uses regex-based classifiers you define or that are managed by ALTR
Snowflake classification—leverages Snowflake’s native classification capabilities
Google DLP classification—uses Google’s Data Loss Prevention service
Refer to the following table for a high-level comparison of ALTR’s classification methods:
| Method | Description | Supported Data Sources | Available Features | Limitations |
|---|---|---|---|---|
| ALTR Native | Matches custom or ALTR-managed classifier patterns against a data sample | Snowflake | | Does not support auto tagging |
| Snowflake | Uses Snowflake’s built-in classification functionality | Snowflake | | |
| Google DLP | Matches pre-defined rules against a data sample using Google Data Loss Prevention | Snowflake, Databricks | | |
Data classification may take several minutes to run, depending on the number of columns in the data source. For Snowflake and Google DLP classification only, an email is sent to administrators when the classification report is ready.
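To make the ALTR Native method more concrete, the following Python sketch shows one way regex-based classifiers could be matched against a sample of column values. It is illustrative only and is not ALTR’s implementation: the classifier names, the patterns, the `classify_column` and `classify_table` functions, and the `min_match_ratio` threshold are all assumptions introduced for this example.

```python
import re

# Hypothetical classifiers: names and patterns are illustrative only,
# not ALTR-managed classifier definitions.
CLASSIFIERS = {
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify_column(sample, classifiers=CLASSIFIERS, min_match_ratio=0.8):
    """Return the classifiers that match enough of the non-empty sample values.

    min_match_ratio is an assumed setting: the fraction of sampled values
    that must fully match a pattern before the column is reported.
    """
    values = [v for v in sample if v]
    if not values:
        return []
    matched = []
    for name, pattern in classifiers.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= min_match_ratio:
            matched.append(name)
    return matched


def classify_table(column_samples):
    """Classify every column of a table from its sampled values."""
    return {col: classify_column(vals) for col, vals in column_samples.items()}


# A small, made-up data sample keyed by column name.
sample = {
    "ssn": ["123-45-6789", "987-65-4321", None, "000-12-3456"],
    "contact": ["a@example.com", "b@example.org", "n/a"],
    "notes": ["call back", "vip"],
}
report = classify_table(sample)
print(report)
```

In this sketch the `contact` column is not reported because only two of its three sampled values match the email pattern, which falls below the assumed threshold. A real classifier may weigh matches differently.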
Note
Classifying data in a Snowflake data source using ALTR Native, Google DLP or Snowflake Classification activates a Snowflake warehouse. The length of time this warehouse is active depends on the number of columns in the data source.
Classifying data in a Databricks metastore using Google DLP activates a Databricks compute. The length of time this compute is active depends on the number of columns in the metastore. Currently, the only classification method supported for Databricks is Google DLP.
Classifying data runs a classification scan and generates a classification report.
To classify data:
Connect a Snowflake or a Databricks data source to ALTR.
If using ALTR Native Classification, create a collection with at least one classifier.
Select → in the Navigation menu.
Click Classify Data to run a classification scan and generate a report; a modal displays.
Select a data source.
Select a classification method.
Note
The dropdown displays the classification methods available for the selected data source.
Select a collection (ALTR Native only).
Click Classify Data. This process may take a few minutes depending on the size of the data source. When the Status is Success, the classification report is available.
Note
Only ALTR Native classification scans can be paused.
Pause a classification scan to temporarily stop it, for example, if there are performance issues. You cannot view the report of a partially run classification scan; the scan must complete successfully before you can view the report.
If you update the collection used by a paused classification scan, the final report does not include the changes; the scan uses a snapshot of the collection taken when the classification started.
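The snapshot behavior described above can be pictured with a short Python sketch. This is a simplified model rather than ALTR’s implementation; the `start_scan` function and the dictionary layout are hypothetical.

```python
import copy


def start_scan(collection):
    """Begin a scan with its own copy of the collection's classifiers."""
    return {"status": "In Progress", "classifiers": copy.deepcopy(collection)}


# Hypothetical collection with one classifier pattern.
collection = {"US_SSN": r"\d{3}-\d{2}-\d{4}"}

scan = start_scan(collection)
scan["status"] = "Paused"

# The collection is edited while the scan is paused.
collection["EMAIL"] = r"[^@\s]+@[^@\s]+"

scan["status"] = "In Progress"  # resumed

# The scan still uses only the classifiers captured at start.
print(sorted(scan["classifiers"]))
```

To include the updated collection in a report, a new classification scan would need to be run.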
To pause a running classification scan:
Select → in the Navigation menu.
Locate the ALTR Native classification scan to pause (Status is In Progress).
Click the ellipsis menu for the running scan.
Select Pause scan.
To resume a paused classification scan:
Select → in the Navigation menu.
Locate the ALTR Native classification scan to resume (Status is Paused).
Click the ellipsis menu for the paused scan.
Select Resume scan.
Note
Only ALTR Native classification scans can be cancelled.
Cancel a classification scan to permanently stop it. You will lose any data generated for the report. Cancelling is permanent; a cancelled scan can’t be restarted. To temporarily stop a running scan, pause it instead. A cancelled classification report remains on the Classification Report page with a Status of Cancelled.
To cancel a running classification scan:
Select → in the Navigation menu.
Locate the ALTR Native classification scan to cancel (Status is In Progress or Paused).
Click the ellipsis menu for the scan.
Select Cancel scan; a modal displays to confirm.
Click Cancel Scan.
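The pause, resume, and cancel rules above can be summarized as a small state machine. The Python sketch below models only what this page describes (the statuses In Progress, Paused, Cancelled, and Success, and the restriction to ALTR Native scans). The `Scan` class, the action names, and the `complete` action are assumptions for illustration and do not represent an ALTR API.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "In Progress"
    PAUSED = "Paused"
    CANCELLED = "Cancelled"
    SUCCESS = "Success"


# Allowed (action, current status) pairs and the resulting status.
TRANSITIONS = {
    ("pause", Status.IN_PROGRESS): Status.PAUSED,
    ("resume", Status.PAUSED): Status.IN_PROGRESS,
    ("cancel", Status.IN_PROGRESS): Status.CANCELLED,
    ("cancel", Status.PAUSED): Status.CANCELLED,
    ("complete", Status.IN_PROGRESS): Status.SUCCESS,
}

# User-initiated actions that only ALTR Native scans support.
NATIVE_ONLY_ACTIONS = {"pause", "resume", "cancel"}


class Scan:
    """Hypothetical model of a classification scan's lifecycle."""

    def __init__(self, method):
        self.method = method
        self.status = Status.IN_PROGRESS

    def apply(self, action):
        if action in NATIVE_ONLY_ACTIONS and self.method != "ALTR Native":
            raise ValueError(f"{action} is only available for ALTR Native scans")
        key = (action, self.status)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a scan that is {self.status.value}")
        self.status = TRANSITIONS[key]
        return self.status

    @property
    def report_available(self):
        # A report can be viewed only after the scan succeeds.
        return self.status is Status.SUCCESS


scan = Scan("ALTR Native")
scan.apply("pause")
scan.apply("resume")
scan.apply("complete")
print(scan.status.value, scan.report_available)
```

Because Cancelled has no outgoing transitions in this model, any attempt to resume a cancelled scan raises an error, which mirrors the rule that cancelling is permanent.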
Once the classification scan completes and the report is generated, you can view the classification report. The report is available when the Status on the Classification Report page is Success.
Note
For Snowflake Classification and Google DLP Classification, administrators in your ALTR organization receive an email when the classification completes.
To view the classification report:
Select → in the Navigation menu.
Click a report to view details.
Note
If viewing an ALTR Native classification report and a column doesn’t display in the report as expected, adjust the Classification Settings for the classifier and re-classify the data.
Classification results in ALTR help you identify where sensitive data exists so you can apply data access control policies to protect this sensitive data.
When a classification scan completes, ALTR generates a classification report that shows which columns were identified as containing specific types of data, such as personal, financial or regulated information. You can use these results to create or update policies that protect sensitive data across your environment.
To protect sensitive data using classification:
Identify sensitive data. Classification reports highlight columns that contain sensitive data based on the classifiers used in the scan. Assign tags to columns based on classification results using automatic tagging (Snowflake classification and Google DLP classification only).
Apply access controls. Based on classification results, create policies to:
mask sensitive data
restrict access by user or role
allow access only under defined conditions
Automate ongoing protection. As new data is classified or existing data changes, policies continue to apply automatically, helping maintain consistent protection without manual updates.
Example
If a classification scan identifies columns containing Social Security numbers, you can create a policy that masks all SSN-classified columns for non-privileged users. As additional tables or columns are later classified as containing SSNs, the policy applies automatically.
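The example above can be sketched in Python as a tag-based masking rule. This is a conceptual model only, not ALTR policy syntax. The role name, column names, the `SSN` tag, and the `read_value` function are hypothetical, and the full-asterisk mask is just one possible masking style.

```python
# Hypothetical role allowed to see unmasked values.
PRIVILEGED_ROLES = {"COMPLIANCE_ADMIN"}

# Tags that the policy masks for non-privileged roles.
MASKED_TAGS = {"SSN"}

# Tags assigned to columns from classification results (made-up data).
column_tags = {"customers.ssn": {"SSN"}}


def read_value(column, value, role):
    """Return the value as the given role would see it under the policy."""
    tags = column_tags.get(column, set())
    if tags & MASKED_TAGS and role not in PRIVILEGED_ROLES:
        return "*" * len(value)
    return value


print(read_value("customers.ssn", "123-45-6789", "ANALYST"))
print(read_value("customers.ssn", "123-45-6789", "COMPLIANCE_ADMIN"))

# Before the new column is classified, it is not covered by the policy.
print(read_value("employees.tax_id", "111-22-3333", "ANALYST"))

# A later classification tags another column as SSN; the same policy
# now covers it without any change to the policy itself.
column_tags["employees.tax_id"] = {"SSN"}
print(read_value("employees.tax_id", "111-22-3333", "ANALYST"))
```

The policy is keyed on the tag rather than on individual column names, which is why newly classified columns are covered without a policy update.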