Starburst Blog

Top 10 reasons to migrate from EMR to SEP

In today’s data architecture economy, there is no shortage of options when it comes to choosing distributions and deployment strategies for a given technology. You can deploy many open-source technologies on-prem yourself, or you can run the same cluster on a cloud platform. If you decide on a managed enterprise solution, you can choose between a cloud vendor and a standalone vendor. Each of these choices comes with tradeoffs, but we believe some tradeoffs have higher payoffs than others.

Presto is no exception. There are plenty of options when choosing a distribution or deployment strategy for Presto. If you are on a team that deployed its own open-source Presto cluster or uses Presto on EMR, this post highlights the advantages of moving to an independent Starburst Enterprise Presto cluster. If you are investigating Presto and weighing open-source Presto, Presto on EMR, and an enterprise Presto distribution like Starburst, this post offers some insights to point you in the right direction for your architecture.

One of the great benefits of Presto is that it does not store any persistent data itself. If you are running Presto on EMR, this makes migration much easier than migrating between two different databases, where copying your data is a time-consuming and costly part of the process.

With that, here are the top 10 reasons you should make the move from EMR Presto to Starburst Enterprise Presto (SEP).

  1. Cloud Platform Agnosticism - SEP is available on all the major cloud platforms, including Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and Red Hat Marketplace. While there can be some benefit to building cloud-native applications, there is a clear advantage to architecting your system so that it can run on any cloud provider. This expands your options if a cloud provider doesn’t operate in the region where you want to deploy your system, your clients don’t want their data stored with a particular cloud provider, or you simply want to shop for the lowest price when it comes to paying those monthly cloud bills. Whatever your reasoning, the ability to adjust to new and unknown use cases is always beneficial for the health of your applications.
  2. Improved Security - SEP offers a wide range of security features, such as role-based access control, data masking, and encryption, out of the box. Many more security features and configurations have been added to the various data source connectors to augment and simplify securing your cluster.
  3. Improved Stability - SEP keeps up to date with the latest innovations from open-source Presto through short-term support (STS) releases, and less frequently ships a much more stable long-term support (LTS) release. LTS releases go through an extra-rigorous validation cycle before they are made available to the general public, so you can rest assured that you are using the most stable Presto available. If we find a vulnerability in any of our releases, a team of engineers that includes the co-creators of Presto is able to fix the issue quickly.
  4. Better Performance - While SEP intentionally does not differ from the open-source core engine, the connectors lie at the heart of how SEP attains speedups over any other version of Presto on the market. The cost-based optimizer is only as effective as the metadata we feed it when it devises a query plan. With SEP we provide connectors that expose statistics from the data source, which improve your query plans and ultimately performance. In addition to exposing table statistics, most connectors are upgraded to read in parallel, which enables faster reads and makes better use of the parallel operators in core Presto. See more about which connectors have these performance enhancements here.
  5. Enterprise Connectors - While we’re still talking about connectors, SEP also comes with a plethora of enticing connectors that do not exist with any other vendor. These include the Snowflake connector and the Delta Lake connector. If you are anxious about vendor lock-in with any of these platforms, we have you covered. You can run Starburst over a database like Snowflake, which allows you to query and move data seamlessly throughout your system. For a full list of enterprise connectors, check out our documentation here.
  6. High Availability - SEP comes with High Availability options baked in. While you will typically have multiple workers running in a cluster, a Presto cluster can have only a single coordinator. This makes the coordinator a single point of failure: if it goes down, you lose all availability of the system. SEP has an option to specify the number of backup coordinators to keep available in case of failure. These backups are configured but shut down to minimize costs, and only come up in the event of a coordinator failure to minimize outage time.
  7. Support - You’ve likely dealt with support teams that help with multiple cloud products, EMR Presto being one of them. If you’re experiencing an issue, do you want the jack-of-all-trades, or a team of Presto experts? Starburst offers 24x7 support from a team that consists of the overwhelming majority of Presto contributors. This includes the co-creators of Presto, the co-founders of Starburst, and early adopters of Presto.
  8. Presto Roadmap Influence - Because our team contributes much of its effort to the open-source project, we are able to take the experiences of our users and translate their needs into a roadmap for Presto at large. This is a mutually beneficial relationship, as the open-source project grows with the larger community’s needs.
  9. Ease of Deployment - SEP offers out-of-the-box Kubernetes deployment, CloudFormation templates, and AMIs, all as options to deploy your cluster in minutes. It’s something you will want to experience yourself after tediously poring over difficult and outdated documentation.
  10. Lower Compute and Deployment Costs - With SEP, you’ll save time and cut compute costs with autoscaling. If your organization occasionally needs high concurrency or large data processing, SEP automatically scales up, then scales back down once you’re done using those resources. You’ll also save your engineers hours of building security and management solutions that our distribution has already solved.
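
To make reason 4 a little more concrete, here is a toy Python sketch of why table statistics matter to a cost-based optimizer. It is not SEP’s or Presto’s actual optimizer code: the function `choose_build_side` and the table names are hypothetical, and the only decision it models is which side of a hash join to load into memory. It assumes that, when row counts are missing, the planner can only fall back to the order the tables were written in the query.

```python
from typing import Optional


def choose_build_side(
    left: str, right: str, row_counts: dict[str, Optional[int]]
) -> str:
    """Pick the table to load into memory as the hash-join build side.

    Toy model (an assumption, not real planner code): with row counts for
    both tables, build from the smaller one. Without them, keep the order
    written in the query and build from the right-hand table.
    """
    left_rows = row_counts.get(left)
    right_rows = row_counts.get(right)
    if left_rows is None or right_rows is None:
        # No statistics available: the planner cannot compare costs.
        return right
    return left if left_rows < right_rows else right


# Hypothetical tables: a tiny dimension table joined to a huge fact table.
stats = {"nation": 25, "orders": 1_500_000_000}

with_stats = choose_build_side("nation", "orders", stats)
without_stats = choose_build_side("nation", "orders", {})

print(with_stats)     # the 25-row table is chosen as the build side
print(without_stats)  # falls back to the right-hand table
```

In this sketch, a connector that exposes row counts lets the planner build the hash table from the 25-row table, while a connector without statistics leaves it trying to hold the billion-row table in memory.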

That concludes the 10 reasons you should migrate from EMR Presto to Starburst Enterprise Presto. There are many other wins to moving off of EMR specifically, and our team would be happy to discuss them with you whenever you’re ready to make your move. If you want to learn more about migrating your existing cluster or starting a new Presto cluster with us, check out our video on How to Migrate from EMR to Starburst Enterprise Presto, or feel free to reach out to us. I hope this was enlightening, and I wish you well on your journey to free your data.

Brian Olsen

Brian is a U.S. Marine turned software engineer and developer advocate working to foster the open-source Presto community. Brian spent four years as a data engineer at a cybersecurity company working on pipeline maintenance and query optimization. In this role, Brian was responsible for maintaining data pipelines and for migrations, including replacing some legacy data warehousing systems with open-source Presto. Brian is a published author in ACM and IEEE geospatial database conferences.
