SNOWFLAKE CERTIFIED SOLUTION
Understanding Customer Reviews using Snowflake Cortex
classified_reviews AS (
SELECT
review_id,
review_text,
AI_CLASSIFY(
review_text,
[
'Sizing issue',
'Color issue',
'Fabric quality issue',
'Washing problem',
'Pricing issue'
]
) as classification
FROM clothing_issue_reviews
)
classified_reviews AS (
SELECT
review_id,
review_text,
AI_CLASSIFY(
review_text,
[
'Sizing issue',
'Color issue',
'Fabric quality issue',
'Washing problem',
'Pricing issue'
]
) as classification
FROM clothing_issue_reviews
)
classified_reviews AS (
SELECT
review_id,
review_text,
AI_CLASSIFY(
review_text,
[
'Sizing issue',
'Color issue',
'Fabric quality issue',
'Washing problem',
'Pricing issue'
]
) as classification
FROM clothing_issue_reviews
)
classified_reviews AS (
SELECT
review_id,
review_text,
AI_CLASSIFY(
review_text,
[
'Sizing issue',
'Color issue',
'Fabric quality issue',
'Washing problem',
'Pricing issue'
]
) as classification
FROM clothing_issue_reviews
)
classified_reviews AS (
SELECT
review_id,
review_text,
AI_CLASSIFY(
review_text,
[
'Sizing issue',
'Color issue',
'Fabric quality issue',
'Washing problem',
'Pricing issue'
]
) as classification
FROM clothing_issue_reviews
)
Overview
Understanding customer feedback is critical for businesses, but analyzing large volumes of unstructured text can be challenging. In this solution, you will use Cortex AISQL to systematically gain insights from unstructured customer feedback.
This solution is one of the templates inside Snowflake, that allows you to leverage multiple AISQL functions to answer different use case questions upon customer reviews. You can access the template from within Snowflake by clicking the “Open in Snowflake” button above, or you can follow the instructions below to execute this code on your own.
Context
Tasty Bytes is a global e-commerce company selling different merchandise. They collect customer reviews to gain insights into how their products are performing.
The following code can be copied into a notebook to execute. It leverages multiple AISQL functions to answer different use case questions upon customer reviews.
Step 1: Set up your environment and data
Let's begin by running the query below. It sets the proper context for this session. It also creates and populates two tables, PRODUCT_REVIEWS and PRODUCT_CATALOG, with sample data for our analysis.
Create a Project > Notebook and add the following SQL code and add the cell name: IMPORT_DATA_SQL
USE ROLE SNOWFLAKE_LEARNING_ROLE;
-- use the existing database and schema
USE DATABASE SNOWFLAKE_LEARNING_DB;
SET schema_name = CONCAT(current_user(), '_CUSTOMER_REVIEW_ANALYSIS_WITH_AISQL');
CREATE SCHEMA IF NOT EXISTS IDENTIFIER($schema_name);
USE SCHEMA IDENTIFIER($schema_name);
/*-- • file format and stage creation --*/
CREATE OR REPLACE FILE FORMAT csv_ff
TYPE = 'csv'
SKIP_HEADER = 1;
CREATE OR REPLACE STAGE s3load
COMMENT = 'Quickstarts S3 Stage Connection'
URL = 's3://sfquickstarts/misc/aisql/ecommerce_customer_review/'
FILE_FORMAT = csv_ff;
/*-- • raw zone table build --*/
CREATE OR REPLACE TABLE customer_data
(
CUSTOMER_ID VARCHAR(16777216),
CUSTOMER_SEGMENT VARCHAR(16777216),
JOIN_DATE DATE,
LIFETIME_VALUE NUMBER(38,2),
PREVIOUS_PURCHASES NUMBER(38,0),
AGE_RANGE VARCHAR(16777216),
GENDER VARCHAR(16777216),
PREFERRED_CATEGORY VARCHAR(16777216)
);
CREATE OR REPLACE TABLE product_catalog
(
PRODUCT_ID VARCHAR(16777216),
PRODUCT_NAME VARCHAR(16777216),
CATEGORY VARCHAR(16777216),
SUBCATEGORY VARCHAR(16777216),
MANUFACTURER VARCHAR(16777216),
PRICE NUMBER(38,2),
RELEASE_DATE DATE,
REVIEW_COUNT NUMBER(38,0)
);
CREATE OR REPLACE TABLE product_reviews
(
REVIEW_ID VARCHAR(16777216),
PRODUCT_ID VARCHAR(16777216),
CUSTOMER_ID VARCHAR(16777216),
REVIEW_TEXT VARCHAR(16777216),
RATING NUMBER(38,0),
REVIEW_DATE DATE,
PURCHASE_DATE DATE,
VERIFIED_PURCHASE BOOLEAN,
HELPFUL_VOTES NUMBER(38,0)
);
/*-- • raw zone table load --*/
COPY INTO customer_data
FROM @s3load/customer_data.csv
ON_ERROR = CONTINUE;
COPY INTO product_catalog
FROM @s3load/product_catalog.csv
ON_ERROR = CONTINUE;
COPY INTO product_reviews
FROM @s3load/product_reviews.csv
ON_ERROR = CONTINUE;
-- setup completion note
SELECT 'Setup is complete' AS note;Now add this code to another SQL cell and name it: CHECK_DATA_SQL
-- Quick preview of the table
SELECT *
FROM product_reviews
LIMIT 25;Step 2: Correlate sentiment with ratings
As a first step, let's perform a quick sanity check. We'll use the SNOWFLAKE.CORTEX.SENTIMENT function to score the sentiment of each review. We can then check if it correlates with the user-provided star rating to see if they align.
Add the following code to another SQL cell and name it: SENTIMENT_CHECK_SQL
WITH EXTRACTED_SENTIMENT AS (
SELECT
RATING,
SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) AS SENTIMENT
FROM PRODUCT_REVIEWS
)
SELECT CORR(SENTIMENT, RATING) AS SENTIMENT_RATING_CORRELATION
FROM EXTRACTED_SENTIMENT;Step 3: Find top issues in a category
Now, let's dig deeper. Suppose you want to know what the biggest complaints are for 'Electronics'. You can focus on the ones with negative sentiments, and use AI_AGG to analyze all relevant reviews and aggregate the common themes into a single summary.
Add the following code to another SQL cell and name it: AGG_TOP_ISSUES_SQL
SELECT
AI_AGG(
REVIEW_TEXT,
'What are the top 3 most common product issues reported in these reviews?'
) AS TOP_ISSUES
FROM PRODUCT_REVIEWS pr
JOIN PRODUCT_CATALOG pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Electronics'
AND SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) < 0;-- The text may not display fully in the SQL cell. Please hover or double-click on the SQL cell to view the full text. Print the result to a dataframe for easier reading using the following code in a Python cell, name it: DISPLAY_TOP_ISSUES_PY
# to view the result
df = AGG_TOP_ISSUES_SQL.to_pandas()
print(df['TOP_ISSUES'].iloc)Step 4: Identify the most common issues
To answer this question, we start with filtering to Clothing category. Another way to identify comments that mentioned product issue is to leverage our latest AI_FILTER to conduct filtering using natural language.
The next step we use the AI_AGG function to get a list of all product issues mentioned.
Create a SQL cell to add the following code and name it: COMMON_ISSUE_SQL
-- Create temporary table on the filtered result to be re-used in next step analytics.
CREATE OR REPLACE TEMP TABLE filtered_product_reviews AS
SELECT *
FROM product_reviews
WHERE AI_FILTER(PROMPT('This review mentions a product issue or complaint: {0}', review_text));
-- Leverage AI_AGG functions to find the common issues mentioned.
-- The text may not display fully in the SQL cell. Please hover around or double check on the SQL cell to view the full text.
SELECT
AI_AGG(
review_text,
'Analyze these clothing product reviews and provide a comprehensive list of all product issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage.'
) as clothing_issues
FROM filtered_product_reviews pr
JOIN product_catalog pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Clothing';Now, we'll print the result to a dataframe for easier reading. Use the following Python code and name the cell:
# to view the result
import pandas as pd
df = COMMON_ISSUE_SQL.to_pandas()
print(df['CLOTHING_ISSUES'].iloc[0])Step 5: Productionalize the pipeline
With the issues suggested through the AI_AGG function pipeline above, we can now leverage AI_CLASSIFY to turn into continuous data pipeline to keep classify the reviews.
Paste the following code in a SQL cell and name it CLASSIFY_SQL
WITH clothing_issue_reviews AS (
SELECT
pr.review_id,
pr.review_text
FROM filtered_product_reviews pr
JOIN product_catalog pc
ON pr.product_id = pc.product_id
WHERE
pc.category = 'Clothing'
), classified_reviews AS (
SELECT
review_id,
review_text,
AI_CLASSIFY(
review_text,
[
'Sizing issue',
'Color issue',
'Fabric quality issue',
'Washing problem',
'Pricing issue'
]
) AS classification
FROM clothing_issue_reviews
)
SELECT
review_id,
review_text,
classification:labels::text AS issue_category
FROM classified_reviews;Step 6: Generate responses to customer complaints
Finally, let's close the loop. You can use AI_COMPLETE to help your support team draft empathetic and relevant responses to negative reviews, improving customer satisfaction at scale.
Create another SQL Cell and name it: GENERATE_SQL. Add the following code to it:
WITH clothing_issue_reviews AS (
SELECT
pr.review_id,
pr.review_text
FROM filtered_product_reviews pr
JOIN product_catalog pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Clothing'
)
SELECT
review_id,
review_text,
AI_COMPLETE('llama4-maverick', 'Please draft a concise response to the customer complaints below. Please only include the draft and nothing else: ' || review_text) as response
FROM clothing_issue_reviewsKey Takeaways
End-to-End Workflow: You can chain Cortex AI functions together (SENTIMENT -> AI_AGG -> AI_CLASSIFY -> AI_COMPLETE) to build a powerful analysis pipeline entirely within Snowflake.
Insight from Unstructured Data: You don't need complex data science tools to extract valuable insights from text. All of this was done with familiar SQL.
Automate and Scale: By identifying common issues and creating classifiers, you can automate the process of tracking feedback and responding to customers more efficiently.
This solution was created by an in-house Snowflake expert and has been verified to work with current Snowflake instances as of the date of publication.
Solution not working as expected? Contact our team for assistance.