Skip to main content

ALTR's Snowflake Integration

This section documents how ALTR connects to Snowflake, including both ALTR's use of a Snowflake Service user as well as the API Integration between ALTR and Snowflake.

Snowflake Integration Overview

ALTR uses two different mechanisms to connect with Snowflake: a Service User and an API Integration. ALTR's Snowflake Service user is a login to customer Snowflake Accounts that enables ALTR to create and manage objects within Snowflake, such as Dynamic Data Masking Policies, Row Access Policies, and External Functions. The API integration is used for Snowflake to communicate with ALTR's cloud-based access control and security engine to make access decisions, perform tokenization, and generate near-real-time audit logs.

Unlike other Data Security Platforms, ALTR is a true cloud to cloud SaaS solution. There is no proxy to install between Snowflake and data users and there is no maintenance of custom views required to control data access. Instead, ALTR's cloud data security platform connects with Snowflake over API, which enables an ease of integration and minimization of latency.

Data source connections

When customers connect a Snowflake Data Source to ALTR, ALTR uses the specified service user to create the Snowflake objects necessary for data access governance and advanced data security. This includes:

  • A schema, ALTR_DSAAS, in the indicated database. ALTR houses all schema objects in the ALTR_DSAAS schema to minimize interference with existing customer schemas.

  • An API integration, which Snowflake will use to communicate with ALTR's SaaS platform for functionality like making access decisions and performing detokenization.

  • A Snowflake Object Tag, ALTR_GOVERNED, which is used to indicate which Snowflake Objects ALTR is able to enforce access governance on and can be useful in troubleshooting.

After connecting a Snowflake data source, ALTR kicks off two recurring processes to maintain the connection: A Schema Snapshot and Database Status Check. The Schema Snapshot identifies and caches all of the schemas, tables, views, and columns within the connected database so that they can be quickly referenced when using ALTR's UI. The Database Status check ensures that ALTR's service user can still connect to the database and still has the necessary privileges to perform data access governance. It also tags all the databases and warehouses visible to ALTR's Service User with the ALTR_GOVERNED tag. The Schema Snapshot recurs once every three days. The database status check recurs every day. Users can manually trigger a new database status check in the Data Sources page in ALTR. Neither of these processes activate Snowflake warehouses.

Note

If your Snowflake account does not have a PC_ALTR_DB database, ALTR's service user will will not be able to tag Snowflake databases and warehouse. You can fix this by manually creating a PC_ALTR_DB database and updating ALTR's Service User's privileges.

User Group Creation

ALTR automatically creates user groups based on Snowflake Roles. When first connecting a Snowflake data source to ALTR, ALTR will ingest up to 2,500 roles available in Snowflake and create corresponding User Groups in ALTR. After connection, ALTR will generate new user groups as it encounters new roles in access-controlled queries. ALTR will not import any new roles until that role is used to query connected data.

If you have over 2500 roles in a Snowflake account, reach out to ALTR support for help importing the additional roles into ALTR.

To view user groups, select Data ConfigurationUser Groupsin the Navigation menu.

Column and Tag Connections

When an ALTR user connects a Snowflake Column or Object Tag to ALTR, ALTR creates certain Snowflake objects required to enforce data access governance on that column or tag including dynamic data masking policies, user defined functions, and external functions.

When a column is connected to ALTR, ALTR creates a dynamic data masking policy, user defined function, and external function for that column in the ALTR_DSAAS schema relating to that column's database. These objects are normally removed when a column is disconnected from ALTR. If a user force disconnects a column from ALTR, they may need to manually remove these Snowflake objects.

Connecting a Snowflake Object Tag to ALTR automatically enforces ALTR access governance to all columns assigned to that tag. When a Snowflake Object Tag is connected to ALTR, ALTR creates a dynamic data masking policy for each Snowflake data type protected by that tag. ALTR also creates a user defined function and external function that are invoked by the dynamic data masking policies. Because Snowflake Object Tags can be applied account-wide, ALTR consolidates these objects in a centralized database, ALTR_DSAAS_DB. These objects are normally removed from Snowflake when a tag is disconnected from ALTR. If a user force disconnects a tag from ALTR, they may need to manually remove these Snowflake objects.

When a Snowflake Object Tag is connected, ALTR caches a list of all allowed values for that Snowflake Object Tag for use in Column Access Policies. Query Audit Logs against columns controlled by connected Snowflake Object Tags include detail on which specific tag(s) were accessed by users, instead of columns. This information is available in the Query Audit Log AWS S3 Export.

Snowflake Object Tags can be connected via ALTR's API and on the Tags page in ALTR. Refer to Tag Masking API and Connecting Tags, respectively, for more information.

Row Access Policy Connections

When ALTR users define Row Access policies for a Snowflake table or view, ALTR will create a Snowflake Row Access Policy object in the ALTR_DSAAS schema corresponding to that table's database.

Data Classification

When performing a Google DLP data classification. ALTR will select a small random sample of data from each column within the indicated database. Columns within a table are sampled separately to minimize the risk of identifying individual records. ALTR does not persist this data. ALTR does not perform any data sampling unless a user explicit requests it via a Google DLP classification.

When performing a Snowflake classification, ALTR executes a command in Snowflake to trigger Snowflake's native classification function, and the results are stored as Snowflake Object Tags. ALTR then queries Snowflake's object tag assignments to identify which columns were assigned classifications. ALTR does not sample client data when performing a Snowflake classification.

From the classification report, you can optionally assign Snowflake object tags to columns. Learn more about automatic tagging.

Both forms of data classification will activate the Service User's warehouse. The length of time the warehouse is active depends on the warehouse size and amount of data being classified.

Data Usage History Import

When performing a Data Usage History Import, ALTR queries Snowflake's Access History view for the indicated database. This occurs for the last thirty days of activity when configured, as well as an incremental daily update. This process activates the service user's warehouse. The length of time the warehouse is active depends on the size of the warehouse and the amont of data in the view.

Real-Time Policy Enforcement and Detokenization

When Snowflake users query data that is controlled by ALTR, Snowflake triggers ALTR's governance API through an external function and API integration. Snowflake sends ALTR information on the column or tag accessed, the user and role executing the query, and information on the query itself. ALTR returns a governance decision, or "disposition", which instructs snowflake on which masking strategy to implement.

When using ALTR's integration of detokenization with Snowflake, Snowflake also uses the API integration to call ALTR's detokenization API. This automatically occurs after the governance decision above so users will only be able to access tokenized data if they are granted access in ALTR's policies.

Audit Logs Generation

ALTR's Query Audit Log is populated both from information provided by Snowflake when making a governance decision (see above) as well as from information ALTR queries against Snowflake's Query History view. ALTR will not be able to generate a Query Audit Log for a query until after a record for that query is generated in Snowflake's Query History View. Additionally, ALTR will only be able to access that Query History record if ALTR's Service User has MONITOR access on the Snowflake Warehouse used to query the data. This process activates the Service User's warehouse. The duration of warehouse activation is typically a small number of seconds but the frequency of warehouse activation depends on query volume. ALTR recommends minimizing the warehouses's auto suspension timing. This is automatically minimized when executing ALTR's Snowflake service user stored procedure.

ALTR's Custom Audit Log is populated by Snowflake making calls via the Snowflake API Integration. This API call itself does not activate a Snowflake warehouse, but any custom tasks or functions to make the call may activate a warehouse.

Tagging Snowflake Databases and Warehouses

ALTR tags Snowflake Databases and Warehouses to easily indicate which objects are protected by ALTR's data security platform. This enables customers to easily identify any gaps in coverage and triage any support issues.

What are Snowflake Object Tags?

Snowflake Object Tags are metadata that can be set on any Snowflake object such as a database, column, table, warehouse, user, or role. Object tags are key-value pairs which Snowflake calls the “Tag Name” and “Tag Value”. Object tags have a variety of uses, logically grouping similar items such as business units, centers, or data classification.

How ALTR Tags Databases and Warehouses

ALTR creates a custom Snowflake Object Tag, 'ALTR_GOVERNED' in PC_ALTR_DB the when a data source connection check is run. This occurs automatically when a data sources is first connected and recurs every 24 hours. If the tag does not exist, it will be recreated and reassigned to all databases and warehouses visible to ALTR's service user. If PC_ALTR_DB does not exist, the tag is not created.

PC_ALTR_DB is automatically created if you signed up for ALTR from Snowflake Partner Connect. If PC_ALTR_DB does not exist in your Snowflake account, ALTR recommends manually creating it to ensure proper tagging. To do so, create PC_ALTR_DB, execute ALTR's Stored Procedure to ensure the Service User can access the database, and manually trigger a data source status check to for ALTR to create the tag.

If the ALTR_GOVERNED tag is manually removed but PC_ALTR_DB still exists, then ALTR_GOVERNED will be automatically recreated and tagged during the next connection check.

Schema Snapshot System

When you connect a Snowflake database to ALTR, ATLR caches information on objects in that database in a schema snapshot in order to quickly identify data to be classified and connected in ALTR. After the initial schema snapshot is taken, it is refreshed every 72 hours.

ALTR takes a schema snapshot of the data hierarchy, specifically

  • databases

  • schemas

  • tables and views

  • columns