ALTR's Snowflake Integration
This section documents how ALTR connects to Snowflake, including both ALTR's use of a Snowflake Service user as well as the API Integration between ALTR and Snowflake.
Snowflake Integration Overview
ALTR uses two different mechanisms to connect with Snowflake: a Service User and an API Integration. ALTR's Snowflake Service user is a login to customer Snowflake Accounts that enables ALTR to create and manage objects within Snowflake, such as Dynamic Data Masking Policies, Row Access Policies, and External Functions. The API integration is used for Snowflake to communicate with ALTR's cloud-based access control and security engine to make access decisions, perform tokenization, and generate near-real-time audit logs.
Unlike other Data Security Platforms, ALTR is a true cloud to cloud SaaS solution. There is no proxy to install between Snowflake and data users and there is no maintenance of custom views required to control data access. Instead, ALTR's cloud data security platform connects with Snowflake over API, which enables an ease of integration and minimization of latency.
Data source connections
When customers connect a Snowflake Data Source to ALTR, ALTR uses the specified service user to create the Snowflake objects necessary for data access governance and advanced data security. This includes:
A schema, ALTR_DSAAS, in the indicated database. ALTR houses all schema objects in the ALTR_DSAAS schema to minimize interference with existing customer schemas.
An API integration, which Snowflake will use to communicate with ALTR's SaaS platform for functionality like making access decisions and performing detokenization.
A Snowflake Object Tag, ALTR_GOVERNED, which is used to indicate which Snowflake Objects ALTR is able to enforce access governance on and can be useful in troubleshooting.
After connecting a Snowflake data source, ALTR kicks off two recurring processes to maintain the connection: A Schema Snapshot and Database Status Check. The Schema Snapshot identifies and caches all of the schemas, tables, views, and columns within the connected database so that they can be quickly referenced when using ALTR's UI. The Database Status check ensures that ALTR's service user can still connect to the database and still has the necessary privileges to perform data access governance. It also tags all the databases and warehouses visible to ALTR's Service User with the ALTR_GOVERNED tag. The Schema Snapshot recurs once every three days. The database status check recurs every day. Users can manually trigger a new database status check in the Data Sources page in ALTR. Neither of these processes activate Snowflake warehouses.
Note
If your Snowflake account does not have a PC_ALTR_DB database, ALTR's service user will will not be able to tag Snowflake databases and warehouse. You can fix this by manually creating a PC_ALTR_DB database and updating ALTR's Service User's privileges.
User Group Creation
ALTR automatically creates user groups based on Snowflake Roles. When first connecting a Snowflake data source to ALTR, ALTR will ingest up to 2,500 roles available in Snowflake and create corresponding User Groups in ALTR. After connection, ALTR will generate new user groups as it encounters new roles in access-controlled queries. ALTR will not import any new roles until that role is used to query connected data.
If you have over 2500 roles in a Snowflake account, reach out to ALTR support for help importing the additional roles into ALTR.
Column and Tag Connections
When an ALTR user connects a Snowflake Column or Object Tag to ALTR, ALTR creates certain Snowflake objects required to enforce data access governance on that column or tag including dynamic data masking policies, user defined functions, and external functions.
When a column is connected to ALTR, ALTR creates a dynamic data masking policy, user defined function, and external function for that column in the ALTR_DSAAS schema relating to that column's database. These objects are normally removed when a column is disconnected from ALTR. If a user force disconnects a column from ALTR, they may need to manually remove these Snowflake objects.
Connecting a Snowflake Object Tag to ALTR automatically enforces ALTR access governance to all columns assigned to that tag. When a Snowflake Object Tag is connected to ALTR, ALTR creates a dynamic data masking policy for each Snowflake data type protected by that tag. ALTR also creates a user defined function and external function that are invoked by the dynamic data masking policies. Because Snowflake Object Tags can be applied account-wide, ALTR consolidates these objects in a centralized database, ALTR_DSAAS_DB. These objects are normally removed from Snowflake when a tag is disconnected from ALTR. If a user force disconnects a tag from ALTR, they may need to manually remove these Snowflake objects.
When a Snowflake Object Tag is connected, ALTR caches a list of all allowed values for that Snowflake Object Tag for use in Column Access Policies. Query Audit Logs against columns controlled by connected Snowflake Object Tags include detail on which specific tag(s) were accessed by users, instead of columns. This information is available in the Query Audit Log AWS S3 Export.
Snowflake Object Tags can be connected via ALTR's API and on the Tags page in ALTR. Refer to Tag Masking API and Connecting Tags, respectively, for more information.
Row Access Policy Connections
When ALTR users define Row Access policies for a Snowflake table or view, ALTR will create a Snowflake Row Access Policy object in the ALTR_DSAAS schema corresponding to that table's database.
Data Classification
When performing a Google DLP data classification. ALTR will select a small random sample of data from each column within the indicated database. Columns within a table are sampled separately to minimize the risk of identifying individual records. ALTR does not persist this data. ALTR does not perform any data sampling unless a user explicit requests it via a Google DLP classification.
When performing a Snowflake classification, ALTR executes a command in Snowflake to trigger Snowflake's native classification function, and the results are stored as Snowflake Object Tags. ALTR then queries Snowflake's object tag assignments to identify which columns were assigned classifications. ALTR does not sample client data when performing a Snowflake classification.
Both forms of data classification will activate the Service User's warehouse. The length of time the warehouse is active depends on the warehouse size and amount of data being classified.
Data Usage History Import
When performing a Data Usage History Import, ALTR queries Snowflake's Access History view for the indicated database. This occurs for the last thirty days of activity when configured, as well as an incremental daily update. This process activates the service user's warehouse. The length of time the warehouse is active depends on the size of the warehouse and the amont of data in the view.
Real-Time Policy Enforcement and Detokenization
When Snowflake users query data that is controlled by ALTR, Snowflake triggers ALTR's governance API through an external function and API integration. Snowflake sends ALTR information on the column or tag accessed, the user and role executing the query, and information on the query itself. ALTR returns a governance decision, or "disposition", which instructs snowflake on which masking strategy to implement.
When using ALTR's integration of detokenization with Snowflake, Snowflake also uses the API integration to call ALTR's detokenization API. This automatically occurs after the governance decision above so users will only be able to access tokenized data if they are granted access in ALTR's policies.
Audit Logs Generation
ALTR's Query Audit Log is populated both from information provided by Snowflake when making a governance decision (see above) as well as from information ALTR queries against Snowflake's Query History view. ALTR will not be able to generate a Query Audit Log for a query until after a record for that query is generated in Snowflake's Query History View. Additionally, ALTR will only be able to access that Query History record if ALTR's Service User has MONITOR access on the Snowflake Warehouse used to query the data. This process activates the Service User's warehouse. The duration of warehouse activation is typically a small number of seconds but the frequency of warehouse activation depends on query volume. ALTR recommends minimizing the warehouses's auto suspension timing. This is automatically minimized when executing ALTR's Snowflake service user stored procedure.
ALTR's Custom Audit Log is populated by Snowflake making calls via the Snowflake API Integration. This API call itself does not activate a Snowflake warehouse, but any custom tasks or functions to make the call may activate a warehouse.
Tagging Snowflake Databases and Warehouses
ALTR tags Snowflake Databases and Warehouses to easily indicate which objects are protected by ALTR's data security platform. This enables customers to easily identify any gaps in coverage and triage any support issues.
What are Snowflake Object Tags?
Snowflake Object Tags are metadata that can be set on any Snowflake object such as a database, column, table, warehouse, user, or role. Object tags are key-value pairs which Snowflake calls the “Tag Name” and “Tag Value”. Object tags have a variety of uses, logically grouping similar items such as business units, centers, or data classification.
How ALTR Tags Databases and Warehouses
ALTR creates a custom Snowflake Object Tag, 'ALTR_GOVERNED' in PC_ALTR_DB the when a data source connection check is run. This occurs automatically when a data sources is first connected and recurs every 24 hours. If the tag does not exist, it will be recreated and reassigned to all databases and warehouses visible to ALTR's service user. If PC_ALTR_DB does not exist, the tag is not created.
PC_ALTR_DB is automatically created if you signed up for ALTR from Snowflake Partner Connect. If PC_ALTR_DB does not exist in your Snowflake account, ALTR recommends manually creating it to ensure proper tagging. To do so, create PC_ALTR_DB, execute ALTR's Stored Procedure to ensure the Service User can access the database, and manually trigger a data source status check to for ALTR to create the tag.
If the ALTR_GOVERNED tag is manually removed but PC_ALTR_DB still exists, then ALTR_GOVERNED will be automatically recreated and tagged during the next connection check.
Schema Snapshot System
When you connect a Snowflake database to ALTR, ATLR caches information on objects in that database in a schema snapshot in order to quickly identify data to be classified and connected in ALTR. After the initial schema snapshot is taken, it is refreshed every 72 hours.
Note
If you need the schema snapshot refreshed sooner than 72 hours, refer to Frequency for more information and recommendations.
ALTR takes a schema snapshot of the data hierarchy, specifically
databases
schemas
tables and views
columns
Uses
ALTR uses the schema snapshot in two ways:
Data classification—during the data classification process, ALTR identifies which kinds of data are present in a column. ALTR users the schema snapshot to determine which columns are present within a database for classification.
Column-based policy—when connecting a column in ALTR, you select the data source, table, column, etc, which are all populated from the schema snapshot.
Note
In column-based policy, ALTR ignores the INFORMATION schema because it is managed by Snowflake and ALTR cannot update it. For that reason, the only schema available in ALTR is PUBLIC.
Frequency
The schema snapshot is taken when the database is connected to ALTR. After that, the snapshot is refreshed every 72 hours. This means that when changes are made in Snowflake, such as when a new table is added, it could take up to 72 hours for the change to appear in ALTR.
If you make frequent changes to Snowflake, you may need the schema snapshot to run sooner than 72 hours. Here are three solutions in the order ALTR recommends:
Use tag-based policy. Tag-based policy assigns a masking policy to any column in Snowflake that has that tag without having to connect to the database. For example, with this method, you can assign masking policy to a new table even though the table is not in the schema snapshot. Masking rules are applied by assigning tags in Snowflake instead of indicating the columns in ALTR. Use ALTR to indicate the tags you need, set your policy in ALTR on the tag, then assign the tag to any column in Snowflake.
Disconnect and reconnect the database. Since the schema snapshot is initially taken when the database is connected to ALTR, disconnect and reconnect the database to trigger a snapshot. Before you can disconnect the database, you must disconnect all the columns from ALTR. When you reconnect the database, you must reconnect all the columns to ALTR. This solution is only ideal if you have a small number of columns connected to ALTR. Also note that while you are in the process of disconnecting and reconnecting the database, your data is not being masked.
Contact ALTR Support. If neither of the previous solutions are suitable for your use case, contact ALTR Support.