SNOWFLAKE CERTIFIED SOLUTION

Build an End-to-End ML Workflow in Snowflake

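# Assumed imports: the excerpt does not show where XGBClassifier comes from;
# the open-source xgboost package is assumed here.
from xgboost import XGBClassifier
from sklearn.metrics import f1_score, precision_score, recall_score

# Baseline model: only a few, very deep trees with a high learning rate
xgb_base = XGBClassifier(
    max_depth=50,
    n_estimators=3,
    learning_rate=0.75,
    booster='gbtree')

# train_pd is assumed to be a pandas DataFrame created earlier in the notebook.
# Drop the timestamp, the loan ID, and the target column to get the features.
X_train_pd = train_pd.drop(["TIMESTAMP", "LOAN_ID", "MORTGAGERESPONSE"], axis=1)
y_train_pd = train_pd.MORTGAGERESPONSE

xgb_base.fit(X_train_pd, y_train_pd)

# Score the baseline on the same data it was trained on
train_preds_base = xgb_base.predict(X_train_pd)

f1_base_train = round(f1_score(y_train_pd, train_preds_base), 4)
precision_base_train = round(precision_score(y_train_pd, train_preds_base), 4)
recall_base_train = round(recall_score(y_train_pd, train_preds_base), 4)

print(f'F1: {f1_base_train} \nPrecision: {precision_base_train} \nRecall: {recall_base_train}')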
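
The excerpt above scores the baseline on its own training data, so those numbers alone cannot reveal overfitting. A minimal sketch of a train/test comparison follows. The helper name `metric_gaps`, the 0.05 threshold, and the metric values are illustrative assumptions; none of them come from the original notebook or its results.

```python
def metric_gaps(train_metrics, test_metrics, threshold=0.05):
    """Return per-metric (train - test) gaps and the metrics whose gap exceeds threshold."""
    gaps = {
        name: round(train_metrics[name] - test_metrics[name], 4)
        for name in train_metrics
        if name in test_metrics
    }
    # A large positive gap means the model does much better on data it has seen.
    flagged = sorted(name for name, gap in gaps.items() if gap > threshold)
    return gaps, flagged


# Placeholder values for illustration only; not results from the notebook.
example_train = {"f1": 0.95, "precision": 0.96, "recall": 0.94}
example_test = {"f1": 0.80, "precision": 0.93, "recall": 0.70}

gaps, flagged = metric_gaps(example_train, example_test)
print(gaps)
print(flagged)
```

In practice the two dictionaries would be filled from `f1_score`, `precision_score`, and `recall_score` computed once on the training set and once on a held-out test set.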

Overview

This solution allows you to build and deploy a complete machine learning workflow entirely within Snowflake ML. You'll work through a mortgage lending prediction use case, implementing each stage of the ML lifecycle from feature engineering to model deployment and monitoring.

This solution showcases how to build an end-to-end ML workflow, including:

  • Defining and managing features with Snowflake Feature Store

  • Model training and hyperparameter optimization with Snowflake ML APIs

  • Versioning and lifecycle management with Snowflake Model Registry

  • Tracking performance and drift with integrated ML Observability

The video shows a demo of how you can build, deploy, serve, and monitor models in production with a set of integrated MLOps features that seamlessly work together.

Solution Architecture: End-to-end ML workflow in Snowflake


The quickstart walks through how to:

1. Use Snowflake Feature Store to track engineered features

  • Store feature definitions in the Feature Store so that ML features can be computed reproducibly

2. Train two models using the Snowflake ML APIs

  • Baseline XGBoost

  • XGBoost with optimal hyperparameters identified via Snowflake ML distributed HPO methods

3. Register both models in Snowflake Model Registry

  • Explore model registry capabilities such as metadata tracking, inference, and explainability

  • Compare model metrics on the train and test sets to identify performance issues or overfitting

  • Tag the best-performing model version as the 'default' version

4. Set up Model Monitor to track 1 year of predicted and actual loan repayments

  • Compute performance metrics such as F1, precision, and recall

  • Inspect model drift (i.e., how much the average predicted repayment rate has changed day to day)

  • Compare models side-by-side to understand which model should be used in production

  • Identify and understand data issues

5. Track data and model lineage throughout

  • View and understand:

    • The origin of the data used for computed features

    • The data used for model training

    • The available model versions being monitored
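
The drift check described in step 4 (how much the average predicted repayment rate changes day to day) can be sketched in plain Python. This is not the Snowflake Model Monitor API; the function names and the sample records below are hypothetical and only illustrate the calculation.

```python
from collections import defaultdict
from datetime import date


def daily_mean_prediction(records):
    """Average the predicted labels (0 or 1) per calendar day.

    `records` is an iterable of (day, prediction) pairs.
    Returns a list of (day, mean) tuples sorted by day.
    """
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of predictions, count]
    for day, prediction in records:
        totals[day][0] += prediction
        totals[day][1] += 1
    return [(day, s / n) for day, (s, n) in sorted(totals.items())]


def day_over_day_change(daily_means):
    """Difference between each day's mean and the previous day's mean."""
    return [
        (curr_day, curr - prev)
        for (_, prev), (curr_day, curr) in zip(daily_means, daily_means[1:])
    ]


# Made-up records for illustration only.
sample = [
    (date(2024, 1, 1), 1), (date(2024, 1, 1), 0),
    (date(2024, 1, 2), 1), (date(2024, 1, 2), 1),
    (date(2024, 1, 3), 0), (date(2024, 1, 3), 0),
]
means = daily_mean_prediction(sample)
changes = day_over_day_change(means)
print(means)
print(changes)
```

A sustained run of large changes in one direction suggests that the inputs or the model's behavior have shifted and that the model may need retraining.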

SNOWFLAKE CERTIFIED SOLUTION

This solution was created by an in-house Snowflake expert and has been verified to work with current Snowflake instances as of the date of publication.

Solution not working as expected? Contact our team for assistance.

SNOWFLAKE FEATURES USED
  • Model Registry
  • Notebooks
  • Snowflake ML
  • Feature Store
RELEVANT INDUSTRIES
  • FINANCIAL SERVICES