Tag-Based Masking in Snowflake: Practical Guide with Scalable Implementation

Thu Huynh
June 11, 2024
As data continues to be a critical asset for organisations across industries, safeguarding sensitive information while enabling data access for authorised users is a constant balancing act. One effective approach to protect sensitive data is through tag-based masking in Snowflake, a method that allows organisations to control access to data based on predefined tags or labels.

In this article, I am going to delve into the concept of tag-based masking in Snowflake, a powerful approach that leverages object tagging in conjunction with masking policies to improve data governance. I am also going to introduce you to dbt-tags, a package developed by Infinite Lambda, which helps streamline the implementation process.

Object tagging and tag-based masking for scalable data governance

Snowflake's object tagging feature allows you to associate key-value pairs with various objects within your data warehouse, including databases, schemas, tables and even columns. These tags act as metadata, providing additional context and facilitating organisation. In the context of data governance, object tagging is a useful method for categorising sensitive data types.

Tag-based masking, a form of Attribute-Based Access Control (ABAC), is a data security technique that restricts access to sensitive data based on its attributes or tags. In ABAC, access is granted by evaluating various attributes associated with the user, the data, and the context of the access request, allowing for more dynamic and fine-grained access control compared to traditional Role-Based Access Control (RBAC), which grants access primarily based on predefined user roles or groups.

Tag-based masking further refines this approach. It specifically uses tags or metadata associated with the data (e.g. data classification, sensitivity level, or business unit) to determine access permissions. This way, it enables even more precise control over data access, ensuring that only authorised users with the appropriate attributes can view or interact with sensitive information.

How tag-based masking in Snowflake works

Leveraging tags for effective masking policies

Snowflake's masking policies offer a mechanism to transform column values at query time, so that sensitive information remains masked for unauthorised users while the stored data itself is left unchanged.

There are two approaches you can choose between: dynamic masking and tag-based masking. Dynamic masking applies a masking policy directly to an individual column. Tag-based masking, on the other hand, offers greater flexibility and scalability: the policy is attached to a tag, and every column carrying that tag is protected by it, so you can manage and update masking rules centrally. This comes in handy when you are dealing with a large number of columns, and it simplifies maintenance as requirements evolve.

Moreover, tags ensure consistency across the database, making it easier to audit and maintain security. They help in categorising data, allowing you to apply different policies based on data sensitivity. Finally, they save time and reduce the risk of human error, especially in large, dynamic datasets.
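To make the "manage centrally" point concrete, here is a toy model in plain Python. It is not Snowflake code: the two policies, the second column name and the sample value are all hypothetical, and the only purpose is to show that changing the policy on a tag changes the result for every column carrying that tag.

```python
# Toy model (not Snowflake code): columns carry tags, tags carry policies.
from typing import Callable, Dict, List

Policy = Callable[[str], str]


def redact_all(value: str) -> str:
    """Hypothetical policy: hide the whole value."""
    return "*" * 8


def keep_domain(value: str) -> str:
    """Hypothetical policy: hide the part of an email address before the '@'."""
    return "***" + value[value.index("@"):]


# One policy per tag: the only place where a masking rule is attached.
tag_policies: Dict[str, Policy] = {"pii_email": redact_all}

# Many columns per tag (illustrative column names).
column_tags: Dict[str, str] = {
    "customer.email": "pii_email",
    "orders.contact_email": "pii_email",
}


def read(column: str, value: str) -> str:
    """Return the value as a query would see it once the tag's policy is applied."""
    tag = column_tags.get(column)
    policy = tag_policies.get(tag) if tag else None
    return policy(value) if policy else value


def read_all(value: str) -> List[str]:
    """Read the same sample value through every tagged column."""
    return [read(column, value) for column in column_tags]


before = read_all("jane@example.com")
tag_policies["pii_email"] = keep_domain  # a single, central change
after = read_all("jane@example.com")

print(before)  # ['********', '********']
print(after)   # ['***@example.com', '***@example.com']
```

With policies attached directly to columns, the same change would have to be repeated once per column.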

How to implement tag-based masking

By combining object tagging with masking policies, we unlock a powerful approach to data security management that takes four steps:

  1. Define custom tags: Create tags that categorise sensitive data types (e.g. PII, financial, healthcare). These tags serve as a classification system for your sensitive data.
  2. Develop masking policies: Outline the rules for data manipulation. Policies can be generic or specific, depending on the desired level of anonymisation (e.g. redact all but the last four digits of a Social Security Number or replace email addresses with a generic format).
  3. Attach masking policies to tags: Associate masking policies with the relevant tags. This is a one-off action: later changes to a masking policy automatically take effect on every column carrying the corresponding tag, without reassigning anything.
  4. Assign tags to columns / tables: Apply the relevant tag to columns that contain the corresponding sensitive data type. This establishes a clear link between the data and the masking logic it requires.

A real-world example

Let's take a customer table from a real project as an example to illustrate the process of implementing tag-based masking in Snowflake.

Step 1: Create custom tags

This step defines two tags, pii_email and pii_phone, to categorise Personally Identifiable Information (PII) data.
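A minimal sketch of this step, written as a small Python helper that renders the Snowflake CREATE TAG statements so they can be reviewed before being run. The helper name and the comment texts are assumptions for illustration; only the tag names come from the example.

```python
def create_tag_sql(tag_name: str, comment: str) -> str:
    """Render a Snowflake CREATE TAG statement (comment text is illustrative)."""
    return f"CREATE TAG IF NOT EXISTS {tag_name} COMMENT = '{comment}';"


tag_statements = [
    create_tag_sql("pii_email", "Email addresses (PII)"),
    create_tag_sql("pii_phone", "Phone numbers (PII)"),
]

print("\n".join(tag_statements))
```

IF NOT EXISTS keeps the statements safe to rerun, which matters once tag creation is automated.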

Step 2: Develop masking policies

This step creates two masking policies:

  • mask_email: Replaces everything before the "@" symbol in email addresses with asterisks (***);
  • mask_phone_number: Replaces the last four digits in phone numbers with asterisks.
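One way the two policies could be written is sketched below, again as Python that renders the Snowflake statements. The exempt role PII_READER, the helper name and the exact regular expressions are assumptions; the policy names and the masking behaviour follow the description above.

```python
def masking_policy_sql(name: str, masked_expr: str, exempt_role: str = "PII_READER") -> str:
    """Render a CREATE MASKING POLICY statement for a STRING column.

    exempt_role is a hypothetical role that is allowed to see raw values.
    """
    return (
        f"CREATE MASKING POLICY IF NOT EXISTS {name} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE\n"
        f"    WHEN CURRENT_ROLE() IN ('{exempt_role}') THEN val\n"
        f"    ELSE {masked_expr}\n"
        f"  END;"
    )


policy_statements = [
    # Everything before the "@" becomes ***.
    masking_policy_sql("mask_email", "REGEXP_REPLACE(val, '^[^@]+', '***')"),
    # The last four digits become ****.
    masking_policy_sql("mask_phone_number", "REGEXP_REPLACE(val, '[0-9]{4}$', '****')"),
]

print("\n\n".join(policy_statements))
```

The CASE expression is where access is decided: roles listed in the WHEN branch see the raw value, everyone else sees the masked expression.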

Step 3: Associate the tags with the policies

This step associates the mask_email and mask_phone_number policies with the corresponding tags.
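A minimal sketch of the association, assuming the tag and policy names used in this example. The mapping variable is hypothetical; the rendered statement uses Snowflake's ALTER TAG ... SET MASKING POLICY form.

```python
# Which policy protects which tag (names taken from the example above).
tag_to_policy = {
    "pii_email": "mask_email",
    "pii_phone": "mask_phone_number",
}

attach_statements = [
    f"ALTER TAG {tag} SET MASKING POLICY {policy};"
    for tag, policy in tag_to_policy.items()
]

print("\n".join(attach_statements))
```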

Step 4: Apply tag to customer table columns

This step modifies the customer table, adding tags to both the email and phone_number columns. Since the masking policies are attached to the tags, this effectively applies the associated masking policies to these columns.
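The tagging itself might look like the sketch below. The helper name and the tag value 'true' are assumptions (a tag value is just a string); the table, column and tag names come from the example.

```python
def set_column_tag_sql(table: str, column: str, tag: str, value: str = "true") -> str:
    """Render the statement that sets a tag on one column (tag value is illustrative)."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET TAG {tag} = '{value}';"


column_to_tag = {
    "email": "pii_email",
    "phone_number": "pii_phone",
}

tagging_statements = [
    set_column_tag_sql("customer", column, tag)
    for column, tag in column_to_tag.items()
]

print("\n".join(tagging_statements))
```

Adding another sensitive column later only means adding one entry to the mapping; the policy definitions stay untouched.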

Step 5: Verify masking

Query the customer table to verify that the masking policies are applied correctly. For a role that the policies do not exempt, the email and phone_number columns should return masked values.
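As a rough guide to what to expect, the sketch below pairs a possible verification query with plain-Python stand-ins for the two policies. The sample values are made up, and the regular expressions are assumptions that mirror the policy descriptions from Step 2 rather than the exact policy bodies.

```python
import re

# A possible verification query; run it under a role that should NOT see raw values.
VERIFY_QUERY = "SELECT email, phone_number FROM customer LIMIT 10;"


def mask_email(value: str) -> str:
    """Stand-in for mask_email: everything before the '@' becomes ***."""
    return re.sub(r"^[^@]+", "***", value)


def mask_phone_number(value: str) -> str:
    """Stand-in for mask_phone_number: the last four digits become ****."""
    return re.sub(r"[0-9]{4}$", "****", value)


print(VERIFY_QUERY)
print(mask_email("jane.doe@example.com"))  # ***@example.com
print(mask_phone_number("555-123-4567"))   # 555-123-****
```

Running the same query under an exempt role should return the raw values, which is a quick way to confirm that the policy branches behave as intended.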

Figure: Tag-based masking in Snowflake (results table)

Challenges in implementing tag-based masking in Snowflake at scale

While the concept of tag-based masking in Snowflake offers significant benefits in terms of data security and access control, implementing it effectively can be challenging, particularly in large and complex data environments.

Some of the key challenges include:

  • Data classification consistency: Ensuring consistent and accurate classification of data assets based on their sensitivity level or business context might be difficult;
  • Tag management complexity: Managing and maintaining tags or labels across a large number of data assets and data models can become complex and prone to errors;
  • Scalability and flexibility: As data volumes and complexity continue to grow, you will inevitably need scalable and flexible solutions that can adapt to evolving business needs and data requirements;
  • Integration with access control: Integrating tag-based masking with existing access control mechanisms, such as RBAC or external identity providers, can be challenging regardless of project scale;
  • Performance impact: Applying masking rules to large datasets during query execution can add performance overhead, and minimising the impact is an important consideration;
  • User education and training: Successful implementation depends on users’ and stakeholders’ understanding the importance of tag-based masking as well as their roles and responsibilities when it comes to adhering to tagging policies.

Using dbt-tags to simplify tag-based masking workflows

To address some of the challenges above and seamlessly implement tag-based masking in Snowflake, you can use the dbt-tags package. Its initial version is designed specifically for Snowflake users, and it leverages the power of dbt to streamline the process of tagging and masking within the Data Cloud at scale.

Key features:

  • Centralised tag management: dbt-tags enables users to manage and maintain tags centrally within their dbt projects by using version control systems. This way, it fosters transparency and traceability for the tagging decisions;
  • Centralised masking policy management: You can also manage masking policies in one place within the dbt project, just like you do with tags;
  • Execution and testing automation:
    • Tags are configured as simply as dbt model metadata;
    • Attaching masking policies to tags and applying tags to columns is easy and efficient via dbt hooks;
    • Undoing the implementation (aka dropping / unapplying) is no fuss either and does not require manual actions;
    • Warnings are raised if any step of the implementation process is not followed accurately.

Here is a visual walk-through.

 

Final word

As organisations continue to navigate the complexities of data management, tag-based masking in Snowflake is not just an option but a strategic approach in modern data governance practices.

In this article, we explored the foundational concepts of tag-based masking, delved into practical implementations, and highlighted the challenges that organisations may encounter when deploying such strategies at scale. We also looked into the dbt-tags package, which aims to optimise the implementation process and offer an elegant approach to managing tag-based masking workflows efficiently.
