Author
Sri Chintala
Jun 05, 2024

Introducing Snowpark pandas API: Run Distributed pandas at Scale in Snowflake


Python’s popularity has grown significantly, and it has quickly become the preferred language for machine learning, application development, pipelines and more. At Snowflake we are deeply committed to delivering a best-in-class platform for Python developers. In line with this commitment, we’re thrilled to announce the public preview of the Snowpark pandas API, which enables seamless execution of distributed pandas at scale in Snowflake.

Snowflake customers are already harnessing the power of Python through Snowpark, a set of libraries and code execution environments that run Python and other programming languages next to your data in Snowflake. Snowpark’s existing DataFrame API gives users a robust framework for lazily evaluated, relational operations on data, closely resembling Spark’s conventions. In April 2024, Snowflake customers ran approximately 55 million queries in Snowpark on average each day for a spectrum of large-scale data processing tasks in data engineering and data science. Now Snowpark is expanding to provide a pandas-compatible API layer: with minimal code changes, users get the pandas-native experience they know and love together with Snowflake’s performance, scale and governance.

Figure 1. Whether you prefer the Spark-like workflow of the Snowpark DataFrame API or the familiarity of pandas, Snowpark empowers you to seamlessly streamline your data processing tasks within Snowflake.

Why introduce a distributed pandas API?

pandas is the go-to data processing library for millions worldwide, including countless Snowflake users. However, pandas was never built to handle data at the scale at which organizations operate today. Running pandas code requires transferring and loading all of the data into a single in-memory process. It becomes unwieldy on moderate-to-large data sets and breaks down completely on data sets that grow beyond what a single node can handle. Organizations work with this volume of data today, and Snowpark pandas lets you execute that same pandas code with all of the pandas processing pushed down to run in a distributed fashion in Snowflake. Your data never leaves Snowflake, and your pandas workflows can run much more efficiently on the Snowflake elastic engine. This brings the power of Snowflake to pandas developers everywhere.

Benefits of Snowpark pandas API

  • Accelerated and seamless development: Snowpark pandas overcomes the single-node memory limitation of traditional pandas, so developers can move from prototype to production without hitting out-of-memory errors or rewriting pandas code for other frameworks (e.g. Spark, the Snowpark DataFrame API or SQL). The result is a smoother, faster development cycle.
  • Meeting Python developers where they are: The Snowpark pandas API preserves the same pandas API signatures and dataframe semantics that make pandas so popular and easy to use. There is no new syntax to learn and no large body of code to rewrite.
Figure 2. Sample Snowpark pandas code 
  • Security and governance: Data does not leave Snowflake’s secure platform. The Snowpark pandas API pushes down the compute to where the data lives and brings uniformity within data organizations to how data is accessed, allowing for easier auditing and governance.
  • No additional compute infrastructure to manage and tune: The solution runs on the Snowflake compute engine and reuses Snowflake’s existing query optimization techniques. End users need not spin up, manage or tune any additional compute infrastructure.
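
To make the "same API signatures" point concrete, here is a minimal sketch of ordinary pandas code of the kind the bullets above describe. It runs against native pandas so it can be tried anywhere. The table contents and column names are invented for illustration. The commented-out imports show the swap that the Snowpark pandas documentation describes; they assume the Snowpark Python package with its Modin extra is installed and a Snowflake session is available, which this sketch does not require.

```python
import pandas as pd  # native pandas, so the sketch runs without Snowflake

# Assumption: with Snowpark pandas installed and a Snowflake session
# configured, the import above would be replaced by:
#   import modin.pandas as pd
#   import snowflake.snowpark.modin.plugin
# The rest of the code is meant to stay unchanged.

# Hypothetical sample data standing in for a Snowflake table.
orders = pd.DataFrame(
    {
        "region": ["EMEA", "AMER", "EMEA", "APAC", "AMER"],
        "amount": [120.0, 80.0, 200.0, 50.0, 70.0],
    }
)

# Filter, group, aggregate and sort with plain pandas syntax.
large = orders[orders["amount"] >= 70.0]
revenue_by_region = (
    large.groupby("region")["amount"].sum().sort_values(ascending=False)
)

print(revenue_by_region)
```

In a real Snowpark pandas workflow the DataFrame would typically be read from a Snowflake table rather than built in memory, so the data would stay in Snowflake while the filter and aggregation run there.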

Try it for yourself! Get started in less than 2 minutes by following this quickstart.

How does Snowpark pandas API work?

Snowpark pandas uses the open source Modin API as the frontend client layer to maintain the exact pandas API signatures and preserve the dataframe semantics that have made pandas popular and easy to use. Behind the scenes, however, Snowpark pandas operates differently. Instead of working on an in-memory pandas dataframe, it transparently converts DataFrame operations into SQL queries that are pushed down to Snowflake’s compute engine. This means you can continue using pandas syntax while Snowflake’s battle-tested, scalable and heavily optimized data infrastructure executes your pandas code in a distributed fashion.
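
The idea of turning dataframe operations into SQL can be illustrated with a toy example. The sketch below is not how Snowpark pandas or Modin is implemented: `LazyFrame` is a hypothetical class written for this illustration, and it uses Python's built-in `sqlite3` module as a stand-in for the Snowflake engine. It only shows the general pattern: dataframe-style calls record what should happen, a single SQL statement is generated from them, and the database does the work.

```python
import sqlite3


class LazyFrame:
    """Toy dataframe (hypothetical) that records operations and emits one SQL query."""

    def __init__(self, table, filters=(), group_col=None, agg=None):
        self.table = table
        self.filters = tuple(filters)
        self.group_col = group_col
        self.agg = agg  # (sql_function, column) or None

    def filter(self, column, op, value):
        # Nothing runs yet: only remember the predicate.
        new_filters = self.filters + ((column, op, value),)
        return LazyFrame(self.table, new_filters, self.group_col, self.agg)

    def groupby_sum(self, group_col, value_col):
        return LazyFrame(self.table, self.filters, group_col, ("SUM", value_col))

    def to_sql(self):
        # Column and table names are trusted here; a real system would validate them.
        select = "*"
        if self.agg:
            func, col = self.agg
            select = f"{self.group_col}, {func}({col}) AS total"
        sql = f"SELECT {select} FROM {self.table}"
        params = []
        if self.filters:
            clauses = []
            for column, op, value in self.filters:
                clauses.append(f"{column} {op} ?")
                params.append(value)
            sql += " WHERE " + " AND ".join(clauses)
        if self.agg:
            sql += f" GROUP BY {self.group_col} ORDER BY {self.group_col}"
        return sql, params

    def collect(self, conn):
        # Only now is work done, and it is done by the database engine.
        sql, params = self.to_sql()
        return conn.execute(sql, params).fetchall()


# In-memory SQLite database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("AMER", 80.0), ("EMEA", 200.0), ("APAC", 50.0)],
)

query = LazyFrame("orders").filter("amount", ">=", 80.0).groupby_sum("region", "amount")
sql, params = query.to_sql()
rows = query.collect(conn)

print(sql)
print(rows)
```

The client never loads the `orders` rows itself; it only receives the aggregated result, which is the property that lets a pushdown design scale past single-node memory.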

Furthermore, you have the flexibility to incorporate custom Python logic as User Defined Functions (UDFs) and leverage popular open source packages already preinstalled in Snowflake. This allows you to utilize pandas’ versatile apply() function to process data along DataFrame or Series axes with ease, whether it be applying built-in Python functions, lambda functions or custom user-defined functions.
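
As a rough illustration of the `apply()` patterns mentioned above, the following sketch uses native pandas with invented sample data. The `grade` function and the column names are hypothetical. Under Snowpark pandas, functions like these would be the custom Python logic that gets pushed down to Snowflake; here they simply run locally.

```python
import pandas as pd  # native pandas; sample data is invented for illustration

df = pd.DataFrame({"name": ["ada", "grace", "linus"], "score": [91, 78, 85]})


def grade(score: int) -> str:
    """Custom Python logic of the kind that could run as a UDF."""
    return "pass" if score >= 80 else "retry"


# Series.apply with a built-in function, a lambda, and a named function.
df["name_len"] = df["name"].apply(len)
df["curved"] = df["score"].apply(lambda s: min(s + 5, 100))
df["grade"] = df["score"].apply(grade)

# DataFrame.apply with axis=1 passes each row to the function.
df["label"] = df.apply(lambda row: f"{row['name']}:{row['grade']}", axis=1)

print(df)
```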

Figure 3.
Native pandas operations are transpiled and pushed down to run as SQL queries.
Custom Python code is serialized and pushed down to run in a secure, sandboxed Python environment.

As of this blog’s writing, the Snowpark pandas API covers most popular pandas API functionality, with ongoing efforts to expand support. Furthermore, we will be looking into integrating with downstream third-party OSS libraries and more. Give it a try and let us know your feedback by emailing us at [email protected].

Resources to get started:

Quickstart

Documentation

GitHub Examples repo
