How to Configure a Snowflake Account to Prevent Data Exfiltration

This blog post is a follow-up to the December 2019 blog post that introduced credential-less stage functionality in Snowflake. In this post, we will see how credential-less stages can be used to prevent data exfiltration from Snowflake.

Snowflake Cloud Data Platform enables customers to store their business-critical and sensitive data for analytics. Data for Snowflake tables is stored in Snowflake’s internal storage. Customers often export this data out of Snowflake as CSV, JSON, or Parquet files using the COPY INTO <location> command, whose general syntax is:

COPY INTO { internalStage | externalStage | externalLocation }
     FROM { [<namespace>.]<table_name> | ( <query> ) }
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ HEADER ]

Data exfiltration risk

Because data can be exported to an external stage or an external location, there is a data exfiltration risk. An employee may unknowingly (or knowingly) export data from Snowflake to an external storage location that is unapproved by an organization’s security or compliance teams. 

For example, an employee may accidentally export sensitive data to an external stage that does not have the appropriate network security, access control, or encryption security and is not approved by the organization’s security team.

Let’s assume such an external stage was created using the following command:

CREATE STAGE my_external_stage
URL = 's3://my_company_bucket/sandbox/my_folder/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
ENCRYPTION = (TYPE = NONE);

Now a COPY INTO <location> command like the one shown below can export data in CSV format to that location, compromising the security of the sensitive data:

COPY INTO @my_external_stage
FROM (SELECT * FROM sensitive_data_table)
FILE_FORMAT = (TYPE = CSV)
HEADER = true;

As another example, a rogue employee may export sensitive data out to a personal container (external location) on Azure Blob Storage using a command such as this:

COPY INTO 'azure://my_personal_account.blob.core.windows.net/my_container'
CREDENTIALS = (AZURE_SAS_TOKEN = '...')
FROM (SELECT * FROM sensitive_data_table)
FILE_FORMAT = (TYPE = JSON);

Using credential-less stages to prevent data exfiltration

Snowflake introduced credential-less stages in December 2019. With credential-less stages, an account admin can create storage integrations that constrain external stages to a list of allowed cloud storage locations. This gives account administrators a mechanism to approve the locations that users can access via an external stage.

In the following example, we create a storage integration for Amazon S3 storage that permits access to s3://my_company_bucket/secure_folder/ and s3://my_company_bucket2/secure_folder2/, but not to any other locations. Once this storage integration is created, stages can leverage it to access the allowed locations.

CREATE STORAGE INTEGRATION my_company_storage_buckets_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = true
STORAGE_AWS_ROLE_ARN = 'aws role arn'
STORAGE_ALLOWED_LOCATIONS = ('s3://my_company_bucket/secure_folder/',
's3://my_company_bucket2/secure_folder2/');
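
Before the integration can be used, the AWS IAM role referenced by STORAGE_AWS_ROLE_ARN must be configured to trust the IAM user that Snowflake provisions for the integration. As a minimal sketch of this step, the values needed for the role's trust policy can be retrieved with DESC INTEGRATION:

DESC INTEGRATION my_company_storage_buckets_int;

-- Add the returned STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID
-- values to the trust policy of the IAM role referenced by
-- STORAGE_AWS_ROLE_ARN.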

Once the storage integration has been set up, we can use it to create an external stage that can copy data in and out of the allowed locations in the my_company_storage_buckets_int storage integration:

CREATE STAGE my_external_stage
URL = 's3://my_company_bucket/secure_folder/february_records/'
STORAGE_INTEGRATION = my_company_storage_buckets_int
ENCRYPTION = (MASTER_KEY = 'eSxX0jzYfIamtnBKOEOwq80Au6NbSgPH5r4BDDwOaO8=');

When a COPY INTO <location> command uses my_external_stage, Snowflake will confirm that the stage’s URL is allowed by the STORAGE_ALLOWED_LOCATIONS list specified in my_company_storage_buckets_int. If the location is allowed, the AWS role ARN is used to access the data.
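
For example, the unload shown earlier can now target the approved stage instead of a stage created with direct credentials:

-- Unload through the credential-less stage; access is granted only
-- because the stage URL falls within STORAGE_ALLOWED_LOCATIONS.
COPY INTO @my_external_stage
FROM (SELECT * FROM sensitive_data_table)
FILE_FORMAT = (TYPE = CSV)
HEADER = true;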

A similar process can be followed to create storage integrations and stages for Azure storage or Google Cloud Storage.
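
For instance, a minimal sketch of an Azure storage integration might look like the following (the integration name, tenant ID, and container path are illustrative placeholders):

CREATE STORAGE INTEGRATION my_company_azure_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = AZURE
ENABLED = true
AZURE_TENANT_ID = '<azure_tenant_id>'
STORAGE_ALLOWED_LOCATIONS = ('azure://my_company_account.blob.core.windows.net/secure_container/');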

Snowflake encourages encrypting all data using a master key or a server-side encryption scheme supported by the storage provider. In the CREATE STAGE command above, client-side encryption is applied using the specified MASTER_KEY. You can learn more about encryption options in the Snowflake documentation.
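
Alternatively, a stage can delegate encryption to the storage provider. The sketch below assumes S3 server-side encryption (AWS_SSE_S3) and uses a hypothetical stage name:

-- Server-side encryption handled by S3 instead of a client-side master key.
CREATE STAGE my_external_stage_sse
URL = 's3://my_company_bucket/secure_folder/february_records/'
STORAGE_INTEGRATION = my_company_storage_buckets_int
ENCRYPTION = (TYPE = 'AWS_SSE_S3');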

Controlling the creation and use of external stages with direct credentials

With the addition of credential-less external stages, many account admins want to enforce that users follow the new pattern. Snowflake has added new account-level parameters that let account admins restrict how data can be exported.

These parameters can be enabled using the following:

ALTER ACCOUNT my_account SET
REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_CREATION = true;

ALTER ACCOUNT my_account SET
REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_OPERATION = true;

ALTER ACCOUNT my_account SET
PREVENT_UNLOAD_TO_INLINE_URL = true;

The REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_CREATION parameter lets account admins control whether new stages must use storage integrations. The REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_OPERATION parameter lets account admins control whether existing stages without storage integrations may be used. The PREVENT_UNLOAD_TO_INLINE_URL parameter controls whether data can be unloaded directly to an external location URL specified inline in a COPY INTO command, bypassing stages entirely.
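
To verify the current settings, account admins can inspect these parameters at the account level, for example:

-- Inspect the account-level export-control parameters.
SHOW PARAMETERS LIKE 'REQUIRE_STORAGE_INTEGRATION%' IN ACCOUNT;
SHOW PARAMETERS LIKE 'PREVENT_UNLOAD_TO_INLINE_URL' IN ACCOUNT;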

If a user tries to create an external stage without a storage integration, the user will get an error message like this:

CREATE STAGE my_external_stage
URL = 's3://my_company_bucket/sandbox/my_folder/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
ENCRYPTION = (TYPE = NONE);

--------------------------------------------------------------------------------------------------------------------

SQL compilation error: Creation of stages with direct credentials, including accessing public stages, has been forbidden. See your account administrator for details.

Similarly, if a user tries to export data to an unapproved external location, the user will get an error message like this:

COPY INTO 's3://my_personal_bucket/sensitive_data/'
CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
FROM (SELECT * FROM sensitive_data_table)
FILE_FORMAT = (TYPE = PARQUET)
HEADER = true;

--------------------------------------------------------------------------------------------------------------------

SQL compilation error: Cannot unload to an inlined external location. Please create a stage first and unload to the stage instead.

Using these parameters, which are generally available, customers can configure their Snowflake accounts to prevent data exfiltration. Try them out and secure your business-critical data in Snowflake.

Learn more by reading our release notes or our documentation.
