Starburst Blog

Starburst @ Strata: 3x Presto!

The Strata Data conference in San Jose last week was another great event in the Strata series. Those coming from a Hadoop background definitely noticed how little Hadoop there was. Quite a change from 2009, when I first attended the Hadoop World conference in NYC. This year’s favorite topic was AI and machine learning, while the expo hall featured a wide variety of vendors covering stream processing, data storage, modern DBMSs, and various business intelligence tools and their accelerators.

While advanced analytics is surely a hot topic, many hallway conversations revealed that good old SQL analytics and managing ever-growing datasets are what occupy most of the attendees’ time at work. Are there any significant trends there?

Presto is definitely one that is picking up momentum! Netflix, Uber, and Lyft shared their data platform evolution stories, and the common themes are:

  • moving away from expensive proprietary data mart and data warehouse solutions to S3 or HDFS to scale economically to 10s and 100s of PBs of data
  • running 100s of nodes of Presto for interactive SQL analytics with tens and hundreds of thousands of queries per day
  • leveraging Hive and/or Spark for long-running batch data transformation ETL jobs

I encourage everyone to take a look at the slides and watch recordings when they appear on the conference website. I am going to include the links to the talks at the bottom. In the meantime, I want to note a few highlights:

Netflix is a well-known Presto user and has been working with it since 2014. They have talked about their Presto experiences several times in the past few years (see presentations from 2015, 2016, and 2017). The latest status is that Presto at Netflix is a primary interactive SQL engine for their S3-based data warehouse of 100 PB, while Spark is used for long-running data transformation jobs and Amazon Redshift covers some remaining niche use cases.

Uber started with Presto about two years ago and now has two Presto clusters (hundreds of nodes in total). Its Presto usage is growing fast (50% since last year). Today Presto runs 180K queries daily, more than Hive and Spark combined. As at Netflix, Presto is used for interactive SQL queries and scales to levels beyond what Uber's “Commercial DB” can achieve.

Lyft joined the Presto user community last year. Their goal is to move away from Amazon Redshift, and Presto is their choice for interactive SQL analytics, while Hive is used for big batch ETL. Today Lyft stores 20 PB in S3, and that is growing at a rate of 3B+ events per day.

Here are the links to the talks at Strata:

Enjoy the links and let us know how we can help you get the most out of Presto!

 

Kamil Bajda-Pawlikowski

Kamil is a co-founder and CTO of Starburst. Previously, Kamil was the chief architect at the Teradata Center for Hadoop in Boston, focusing on the open source SQL engine Presto, and the co-founder and chief software architect of Hadapt, the first SQL-on-Hadoop company (acquired by Teradata). Kamil began his journey with Hadoop and modern MPP SQL architectures about 10 years ago during a doctoral program at Yale University, where he co-invented HadoopDB, the original foundation of Hadapt’s technology. He holds an MS in computer science from Wroclaw University of Technology and both an MS and an MPhil in computer science from Yale University. Kamil is a co-author of several US patents and a recipient of the 2019 VLDB Test of Time Award.

By | on 13, Mar 2018 |   Company Update
