Kamil Bajda-Pawlikowski

Co-Founder and CTO

Starburst

Starburst @ Strata — 3x Presto!

Last Updated: April 6, 2023

Case Studies Trino

The Strata Data conference in San Jose last week was another great event in the Strata series. Those coming from a Hadoop background definitely noticed how little Hadoop there was. Quite a change from 2009, when I first attended the Hadoop World conference in NYC. This year’s favorite topic was AI and machine learning, while the expo hall featured quite a variety of vendors covering stream processing, data storage, modern DBMS, as well as various Business Intelligence tools and their accelerators.

While advanced analytics is surely a hot topic, many hallway conversations revealed that good old SQL analytics and managing ever-growing datasets are what occupy most of the attendees’ time at work. Are there any significant trends there?

Presto is definitely one picking up momentum! Netflix, Uber and Lyft shared their data platform evolution stories and the common themes are:

moving away from expensive proprietary data mart and data warehouse solutions to S3 or HDFS to scale economically to 10s and 100s of PBs of data
running 100s of nodes of Presto for interactive SQL analytics with tens and hundreds of thousands of queries per day
leveraging Hive and/or Spark for long-running batch data transformation ETL jobs

I encourage everyone to take a look at the slides and watch recordings when they appear on the conference website. I am going to include the links to the talks at the bottom. In the meantime, I want to note a few highlights:

Netflix is obviously a well-known Presto user, having worked with it since 2014. They have talked about their Presto experiences several times in the past few years (see presentations from 2015, 2016, and 2017). The latest status is that Presto at Netflix is a primary interactive SQL engine for their S3-based data warehouse of 100 PB, while Spark is used for long-running data transformation jobs and Amazon Redshift covers some remaining niche use cases.

Uber started with Presto about 2 years ago and now has two Presto clusters (hundreds of nodes total). Their Presto usage is growing fast (50% since last year). Today Presto runs 180K queries daily, more than Hive and Spark combined. As at Netflix, Presto is used for interactive SQL queries and scales to levels beyond what their “Commercial DB” can achieve.

Lyft joined the Presto user community last year. Their goal is to move away from Amazon Redshift, and Presto is their choice for interactive SQL analytics, while Hive is used for big batch ETL. Today Lyft stores 20 PB in S3, growing at a rate of 3B+ events / day.

Here are the links to the talks at Strata:

Enjoy the links and let us know how we can help you get the most out of Presto!

© Starburst Data, Inc. Starburst and Starburst Data are registered trademarks of Starburst Data, Inc. All rights reserved. Presto®, the Presto logo, Delta Lake, and the Delta Lake logo are trademarks of LF Projects, LLC
