SNOWFLAKE FOR DEVELOPERS

OPEN SOURCE AT SNOWFLAKE

By building with open source, developers can innovate faster with powerful services. Our engineers regularly contribute to open source projects to accelerate the innovation that our customers and the industry benefit from.

People

Open source only works when the technical community joins forces to commit together. We’re proud of the work that Snowflakes do every day to contribute to and lead open source projects.

Projects

In addition to open sourcing our own projects and libraries, Snowflake supports key open source projects through technical and financial contributions. 

Products

Open source is important to Snowflake because we use it ourselves. Many of our products and features are built directly on top of popular open source projects.

"Open source is the heartbeat of innovation. By contributing and collaborating with the community, Snowflake not only accelerates our own progress but also empowers everyone to build faster, smarter, and more openly on the Data Cloud. It's a shared journey toward limitless data potential."

Sridhar Ramaswamy

Benoit Dageville
Co-founder and President of Product

Meet our contributors

48 Results

Adnan Hemani

Apache Polaris (incubating) Committer

Anna Filippova

Apache Polaris (incubating) Committer

Anupam Datta

TruLens Technical Advisory Committee

Aurick Qiao

vLLM Maintainer

Aykut Bozkurt

pg_parquet Maintainer

Bob Paulin

Apache Software Foundation VP Fundraising; Apache Tika PMC

Bryan Bende

Apache NiFi PMC; Apache Software Foundation Member

Danica Fine

Jupyter Foundation Governing Board

Daniel Huang

TruLens Maintainer

Explore Open Source Contributions

XXX

Contributors

XXX

Projects Contributed To

XXX

Projects Open Sourced

Anaconda®

Anaconda is a distribution of Python and R focused on scientific computing. It simplifies package management and deployment for data science, machine learning, large-scale processing, and predictive analytics.

Apache Beam®

Apache Beam is a unified programming model for batch and streaming data processing. It provides language SDKs for building pipelines and runners that execute them on distributed engines.

Apache Iceberg™

Apache Iceberg is an open table format for managing large tabular datasets. It improves on traditional Hive- or Spark-based tables with better performance, reliability, and interoperability across engines.

Apache NiFi™

NiFi automates cybersecurity, observability, event streams, and generative AI data pipelines and distribution for thousands of companies worldwide across every industry.

Apache Polaris (incubating)

Apache Polaris is an open-source, fully featured catalog for Apache Iceberg™. It standardizes how Iceberg tables are managed and accessed across engines and platforms.

Apache Spark™

Apache Spark is a unified analytics engine for large-scale data processing. It offers APIs in multiple languages and provides distributed processing with data parallelism and fault tolerance.

Arctic Embed

arctic-embed is a suite of high-quality text embedding models. It is optimized for building retrieval systems with strong performance and efficiency.

ArcticInference

ArcticInference is an open-source plugin for vLLM. It delivers fast, cost-effective inference for large language models and embeddings.

ArcticTraining

ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs).

dbt™

dbt is a command-line tool that enables analytics engineers to transform data in their warehouses by writing SELECT statements. dbt turns those SELECT statements into tables and views and transforms data without extracting or loading it.

DeepSpeed™

DeepSpeed is a deep learning optimization library built on PyTorch. It enables efficient training and serving of very large AI models through advanced memory and parallelism techniques.

Django® for Snowflake

Django is a high-level Python web framework that promotes rapid development and clean design. Snowflake maintains the backend integration for Django.

Feast

Feast is an open-source feature store for machine learning. It streamlines feature sharing, reuse, and serving for both training pipelines and real-time inference.

FoundationDB™

FoundationDB is a distributed database designed to handle large volumes of structured data across clusters of commodity servers. It organizes data as an ordered key-value store and employs ACID transactions for all operations.

Goose

Goose is an open-source, local-first AI agent framework that integrates with your tools via the Model Context Protocol (MCP) to automate and extend developer workflows.

Jupyter® Notebooks

Jupyter Notebook is an interactive, open-source environment for data and scientific computing. It combines code, text, and visualizations in shareable, executable documents.

Lezer-snowsql

Lezer-snowsql is a SnowSQL grammar for the lezer parser system. Lezer provides a parser generator that outputs JavaScript modules.

Modin

Modin is a drop-in replacement for pandas. While pandas is single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your cores. Modin works especially well on larger datasets, where pandas becomes slow or runs out of memory.

Open Semantic Interchange

Open Semantic Interchange (OSI) is a collaborative open-source initiative for standardizing semantic model exchange. It aims to streamline how analytics, AI, and BI tools share and use semantic models.

PostGIS

PostGIS is the geospatial extension for PostgreSQL. It adds support for geographic objects, spatial queries, and geospatial analytics.

Postgres®

Postgres (PostgreSQL®) is a popular open-source relational database. It is known for its reliability, extensibility, and strong SQL compliance.

PyTorch®

PyTorch is a leading open source machine learning library for building and training deep learning models. Snowflake joined the PyTorch Foundation as a general member to help accelerate the adoption of PyTorch.

SansShell

SansShell is primarily a gRPC server with a variety of options for localhost debugging and management. Its goal is to replace the need to use an interactive shell for emergency debugging and recovery with a much safer interface.

schemachange

schemachange is a Python-based tool to manage Snowflake objects. It follows an imperative-style approach to database change management (DCM).

Sequelize

Sequelize is an easy-to-use, promise-based Node.js ORM tool for Postgres, MySQL, MariaDB, SQLite, Db2, Db2 for IBM i, Microsoft SQL Server, and Snowflake. It features solid transaction support, relations, eager and lazy loading, read replication, and more.

Snowpark for Python Client API

The Snowpark for Python client-side library provides DataFrame-style APIs for querying and processing data in Snowflake. It lets you build and deploy data pipelines, ML workflows, and applications from any IDE that can run a Python kernel.

Streamlit

Streamlit is a Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.

Terraform Provider: Snowflake

Terraform is an infrastructure-as-code tool that lets you build, change, and version resources. Our partners at the Chan Zuckerberg Initiative developed a Terraform provider for Snowflake that we now maintain.

TruLens™

TruLens is an open-source toolkit for evaluating, testing, and monitoring LLM applications, with built-in support for feedback, observability, and guardrails.

vLLM

vLLM is a high-performance inference engine for large language models that optimizes memory use with PagedAttention to deliver faster, more efficient serving at scale.

See where Snowflake is built on open

We believe in building on the shoulders of giants. Many core Snowflake features are built directly upon robust, popular open source projects and open standards. Understanding these foundations gives you insight into our architecture and helps you integrate your own tools seamlessly.

Take the next step with open source

Snowflake Labs

Snowflake Labs hosts projects that were developed by our community, customers and people at Snowflake. We invite everyone to contribute code, report bugs and help improve the documentation.

How to get involved

Do you have an open source project we should support? Do you want to contribute to projects we maintain? Get in touch.

OSS blog posts

Stay up to date on how the Snowflake engineering team is using open source across our platform with our engineering blog posts.

What’s next?

Explore more developer content and build your skills.