
Classify Data

ALTR provides two ways to classify data:

  • Google Data Loss Prevention (DLP)

  • Snowflake Classification

These integrations allow you to scan data sources that are connected to ALTR and to identify what data may be sensitive. This data can then be used to automate access controls and security policies.

Note

If resource usage and speed are your main concerns, Google DLP may be the better option. If your data cannot leave your Snowflake instance, Snowflake Classification is the better choice.

Google Data Loss Prevention (DLP) Classification

ALTR's integration with Google's DLP classification tool randomly samples columnar data from a connected data source and classifies it using Google's DLP API. The API may return one or more "infoTypes" indicating what kinds of data may be present in the sample. ALTR reports this information to users in a classification report.
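To make the shape of such a classification call more concrete, the following is a minimal sketch of how a tabular sample could be packaged for a DLP inspection request. It is not ALTR's implementation, which is not documented here. The helper names `build_dlp_table_item` and `build_inspect_request`, the project ID, and the chosen infoTypes are all hypothetical; the dictionary layout follows the request structure accepted by the `google-cloud-dlp` Python client.

```python
def build_dlp_table_item(column_names, sample_rows):
    """Package a sampled table as a DLP content item (hypothetical helper).

    Every cell is sent as a string value; this is a simplifying assumption.
    """
    headers = [{"name": name} for name in column_names]
    rows = [
        {"values": [{"string_value": str(value)} for value in row]}
        for row in sample_rows
    ]
    return {"table": {"headers": headers, "rows": rows}}


def build_inspect_request(project_id, column_names, sample_rows, info_types):
    """Assemble an inspection request dictionary (hypothetical helper)."""
    return {
        "parent": f"projects/{project_id}",
        "inspect_config": {
            # The infoTypes to look for, for example EMAIL_ADDRESS.
            "info_types": [{"name": name} for name in info_types],
            # Do not echo the matched text back in the findings.
            "include_quote": False,
        },
        "item": build_dlp_table_item(column_names, sample_rows),
    }


# Example with made-up data and an assumed project ID.
request = build_inspect_request(
    "demo-project",
    ["email", "phone"],
    [["a@example.com", "555-0100"]],
    ["EMAIL_ADDRESS", "PHONE_NUMBER"],
)
```

A dictionary of this shape could then be passed to `DlpServiceClient.inspect_content(request=...)` from the `google-cloud-dlp` package, assuming valid Google Cloud credentials are available.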

The classification report shows you what data might be sensitive, and you can also use its results to automatically assign Snowflake object tags to columns. Learn more about automatic tagging.

ALTR does not sample customer data or send it to the Google DLP tool without explicit instruction from an ALTR administrator. ALTR randomly selects samples of data from each column to prevent row identification and does not persist or log the sample of data sent to Google’s DLP service.

To obtain the database sample for classification, ALTR:

  1. Counts the number of columns in the table and multiplies it by 2.

  2. Randomly selects [(column count) x 2] rows from the table. For example, if there are 10 columns, ALTR selects 20 rows. This data set becomes Sample 1.

  3. Creates Sample 2 from Sample 1, with half as many rows as Sample 1.

  4. Starting with the first column, randomly selects values from that column of Sample 1 and places them in the corresponding column of Sample 2, one value per row. In the 10-column example, this means 10 values placed in rows 1 through 10. This process is repeated for each column. The resulting Sample 2 is the sample that is sent to Google DLP.


Sample 2 scrambles the table without changing the type of data in each column. Any particular column value may or may not be related to the other values in the same row.
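The sampling steps above can be sketched in a few lines of Python. This is an illustration of the described procedure, not ALTR's actual code. The function name `build_scrambled_sample` is hypothetical, and two details are assumptions because the description does not specify them: values are drawn without replacement, and Sample 1 is capped at the table size when the table has fewer rows than twice its column count.

```python
import random


def build_scrambled_sample(rows, seed=None):
    """Return a column-wise scrambled sample (Sample 2) of a table.

    rows: list of rows, each a list with one value per column.
    """
    rng = random.Random(seed)
    if not rows:
        return []

    column_count = len(rows[0])

    # Steps 1-2: Sample 1 holds (column count x 2) randomly chosen rows.
    # Assumption: cap at the table size for small tables.
    sample1_size = min(column_count * 2, len(rows))
    sample1 = rng.sample(rows, sample1_size)

    # Step 3: Sample 2 has half as many rows as Sample 1.
    sample2_size = sample1_size // 2

    # Step 4: fill each column of Sample 2 independently with values
    # drawn at random from the same column of Sample 1.
    # Assumption: values are drawn without replacement.
    columns = []
    for col in range(column_count):
        column_values = [row[col] for row in sample1]
        columns.append(rng.sample(column_values, sample2_size))

    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns)]


# Example: a made-up table with 10 columns and 50 rows.
table = [[f"r{i}c{j}" for j in range(10)] for i in range(50)]
sample2 = build_scrambled_sample(table, seed=1)
print(len(sample2), len(sample2[0]))
```

Because each column is drawn independently, a value keeps its column (and therefore its data type) but generally loses its association with the other values in its original row.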

Classification may take several minutes to run depending on the number of columns in the data source. An email is sent to administrators once the classification is complete.

Performing a Google DLP classification on Snowflake data sources activates a Snowflake warehouse. The length of time this warehouse is active depends on the number of columns present in the data source.

To run a Google DLP classification:

  1. Select Data Configuration > Data Sources in the Navigation menu.

  2. Create a data source or edit an existing data source.

  3. Select the Tag Data by Classification check box.

  4. Select Google DLP Classification from the Tag Type list box.

  5. Click Save Changes.

Once administrators receive an email that the classification has completed, they can view the classification report in ALTR.

To view the classification report:

  1. Select Data Configuration > Data Management in the Navigation menu.

  2. Click the Classification Report tab.

Snowflake Classification

ALTR's integration with Snowflake classification enables customers with connected Snowflake data sources to classify columnar data without ALTR sampling the data or sending it to third parties. This option is useful when customers do not want ALTR to sample data or send it to Google's DLP API. When a Snowflake classification is completed, the resulting Semantic Categories are assigned to relevant columns within Snowflake as object tags.

The classification report shows you what data might be sensitive, and you can also use its results to automatically assign Snowflake object tags to columns. Learn more about automatic tagging.

Classification may take several minutes to run depending on the number of columns in the data source. An email is sent to administrators once the classification is complete.

Performing a Snowflake classification on Snowflake data sources activates a warehouse. The length of time this warehouse is active depends on the number of columns present in the data source.

To run a Snowflake classification:

  1. Select Data Configuration > Data Sources in the Navigation menu.

  2. Create a data source or edit an existing data source.

  3. Select the Tag Data by Classification check box.

  4. Select Snowflake Classification and Object Tag Import from the Tag Type list box.

  5. Click Save Changes.

Once administrators receive an email that the classification has completed, they can view the classification report in ALTR.

To view the classification report:

  1. Select Data Configuration > Data Management in the Navigation menu.

  2. Click the Classification Report tab.