We’re pleased to announce the general availability of Starburst Enterprise Presto 312e. Each release builds on the previous one, and this post introduces the main features we’ve added and why they matter to you as a Presto user.
312e New Features:
- Google Cloud Platform: Including Cloud Storage, Kubernetes Service, and Dataproc
- Kubernetes: Support for Presto on Kubernetes environments
- Data Source Connectivity: Including parallel Teradata connector, MapR connector, and support for Azure Data Lake Storage Gen2
- Presto Core: Including performance and security features
Google Cloud Platform
With our 312e release, Starburst Enterprise Presto now supports all three major cloud platforms:
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform (GCP)
We consider the addition of GCP a major accomplishment. As data architects, you now have that new deployment option within Presto’s supported partner ecosystem.
Supporting any cloud platform comes down to two parts: compute and storage. For GCP, these are Google Compute Engine and Google Cloud Storage (GCS). Using the Presto Hive Connector, you can configure Presto to query data in GCS. We now also have a tutorial on how to access GCS data from Presto. Check it out, and please give us your feedback (firstname.lastname@example.org)!
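As a rough illustration of the Hive connector setup described above, the Python sketch below renders a catalog properties file. The property names reflect our understanding of the open-source Hive connector (in particular `hive.gcs.json-key-file-path` for a service account key); the helper name `render_gcs_catalog`, the metastore host, and the key path are placeholders invented for this example, so please verify everything against the tutorial before relying on it.

```python
def render_gcs_catalog(metastore_uri: str, key_file_path: str) -> str:
    """Return the text of a Hive catalog properties file for reading GCS data.

    Assumption: the property names below match the Hive connector in your
    Presto version. The values passed in are placeholders.
    """
    properties = {
        "connector.name": "hive-hadoop2",
        "hive.metastore.uri": metastore_uri,
        # Path to a GCP service account key that can read the bucket.
        "hive.gcs.json-key-file-path": key_file_path,
    }
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


if __name__ == "__main__":
    # Hypothetical host and path, for illustration only.
    print(
        render_gcs_catalog(
            "thrift://metastore.example.internal:9083",
            "/etc/presto/gcs-key.json",
        ),
        end="",
    )
```

The rendered text would typically be saved as a catalog file (for example `etc/catalog/hive.properties`) on every Presto node.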
One of the more exciting features of our GCP support is the availability of our Kubernetes offering on Google Kubernetes Engine (GKE). It lets you easily deploy and manage a Presto cluster with Presto coordinator high availability, Presto worker autoscaling, and integration with Google Stackdriver. We say more about Kubernetes support below. In addition to GKE, we also support customers running Presto on Google Compute Engine virtual machines. These virtual machines can also be deployed in a managed instance group to take advantage of its autoscaling capabilities.
Finally, we have integrated with Google Cloud Dataproc to provide Presto through Dataproc initialization actions. The initialization action is a user-supplied script that you run after Dataproc is initialized to install Starburst Presto 312e on a GCP cluster. The script also configures Presto to work with Hive on the cluster. You can find it on GitHub, and we were recently mentioned on Google’s Cloud OnAir.
Presto on Kubernetes
Kubernetes reduces the burden and complexity of configuring, deploying, managing, and monitoring containerized applications. We are excited to announce the availability and support of Starburst Presto 312e on Kubernetes. We accomplish this by providing both a Presto Kubernetes Operator and a Presto container. This solution makes deploying and using Presto across hybrid and multi-cloud environments simpler. With it, you’ll be able to run Presto on the major Kubernetes platforms:
- RedHat OpenShift Container Platform
- Google Kubernetes Engine (GKE)
- Azure Kubernetes Service (AKS)
- Amazon Elastic Container Service for Kubernetes (Amazon EKS)
The Presto Kubernetes Operator helps manage several important functions, while the Presto container is used to form a Presto cluster. The Presto Kubernetes Operator provides this functionality:
- Automatically configures the Presto cluster
- Coordinates high availability using liveness probes
- Autoscales Presto workers through the Horizontal Pod Autoscaler
- Gracefully scales down and decommissions Presto workers
- Supports monitoring through integration with Prometheus
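To give a feel for how worker autoscaling behaves, here is a minimal Python sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. It is a simplification that ignores the autoscaler's tolerance and stabilization behavior. The function name and the CPU figures are made up for the example, and it says nothing about how the Presto Kubernetes Operator itself configures the autoscaler.

```python
import math


def desired_workers(
    current_workers: int,
    current_cpu_pct: float,
    target_cpu_pct: float,
    min_workers: int,
    max_workers: int,
) -> int:
    """Simplified Horizontal Pod Autoscaler rule.

    desired = ceil(current * observed_metric / target_metric),
    clamped to the configured minimum and maximum replica counts.
    """
    raw = math.ceil(current_workers * current_cpu_pct / target_cpu_pct)
    return max(min_workers, min(max_workers, raw))


if __name__ == "__main__":
    # Four workers running hot against a 60% CPU target: scale out.
    print(desired_workers(4, 90, 60, min_workers=2, max_workers=10))
    # Four mostly idle workers: scale in, but never below the minimum.
    print(desired_workers(4, 20, 60, min_workers=2, max_workers=10))
```

When the rule calls for fewer workers, graceful decommissioning matters: a worker should finish its running tasks before it is removed.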
Mission Control on Kubernetes -- You can deploy Presto to Kubernetes in either of the following ways:
- Use the kubectl utility with a YAML file that describes the configuration.
- Use the Mission Control UI, which provides a web-based user experience and hides configuration details.
While Mission Control provides a good user experience to deploy Presto, we understand many of you are comfortable using the kubectl utility. We support both methods for deployment.
We’ll describe our Kubernetes offering in greater technical detail in an accompanying blog.
Data Source Connectivity
In 312e, we continue to add data source connectivity, extending the reach of Presto as your data consumption layer.
- We’ve introduced a parallel Teradata connector. This allows you to read data incredibly fast from your Teradata Warehouse. A more technical blog about this connector is forthcoming.
- We’re making available a beta version of the MapR connector. This connector lets you read data from MapR FS, including in a secured MapR cluster. If you’d like to try this out, please contact us at email@example.com.
- Azure Data Lake Storage (ADLS) Gen2 is a really exciting and interesting cloud storage option from Microsoft. It provides the scale and cost effectiveness of cloud object storage, along with additional features such as a hierarchical namespace, ACLs, and POSIX permissions. In 312e, we’ve added support for reading data from ADLS Gen2. With Microsoft starting to phase out Gen1 in favor of Gen2, we recommend that you try out this feature. We acknowledge and thank David Phillips for his help on this effort.
- The Apache Phoenix Connector allows you to read data from Apache HBase through Apache Phoenix (Community Contributed).
- We added the Google BigQuery Connector in 302e, and we mention it again given our full support for Presto on GCP.
Performance is a core strength of Presto and something we work on every day. These efforts improve the user experience because queries run faster and cost less. There are too many performance-related improvements to list in this blog, so here are a few notable ones that we introduce in our 312e release:
- We’ve added statistics support to most of the RDBMS connectors. With this feature, Presto uses statistics from data sources such as Oracle, SQL Server, Teradata, and PostgreSQL in its cost-based optimizer. The result? Executing federated queries is significantly more efficient.
- Optimized Presto ORC and Parquet readers for faster performance (Community Contributed).
- The LIMIT clause is now pushed down into JDBC-based connectors, reducing data transfer (Community Contributed).
- Support for late materialization. Delaying materialization can significantly reduce CPU usage, making queries faster. Ask Starburst Data directly for more information (firstname.lastname@example.org).
- Improved performance of reading data from S3 (Community Contributed).
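To make the LIMIT pushdown item above concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for a remote JDBC source. It is not Presto code; the `orders` table and the function names are invented for the illustration. It only shows the general idea: without pushdown, the engine pulls every row and trims locally, while with pushdown the source returns only the rows that are needed.

```python
import sqlite3


def make_remote_table(row_count: int) -> sqlite3.Connection:
    """Create an in-memory table that plays the role of the remote database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        ((i, i * 1.5) for i in range(row_count)),
    )
    return conn


def fetch_without_pushdown(conn: sqlite3.Connection, limit: int):
    """Transfer every row, then apply the limit on the engine side."""
    rows = conn.execute("SELECT id, total FROM orders").fetchall()
    return rows[:limit], len(rows)  # (result, rows transferred)


def fetch_with_pushdown(conn: sqlite3.Connection, limit: int):
    """Send the limit to the source so only the needed rows are transferred."""
    rows = conn.execute(
        "SELECT id, total FROM orders LIMIT ?", (limit,)
    ).fetchall()
    return rows, len(rows)  # (result, rows transferred)


if __name__ == "__main__":
    remote = make_remote_table(100_000)
    _, transferred_all = fetch_without_pushdown(remote, 10)
    _, transferred_limited = fetch_with_pushdown(remote, 10)
    print("rows transferred without pushdown:", transferred_all)
    print("rows transferred with pushdown:", transferred_limited)
```

Both functions return the same number of result rows; the difference is how much data crosses the connection to get there.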
As the provider of a secure enterprise Presto offering, Starburst always treats security as a major focus. We continuously watch for issues and fix any potential vulnerabilities. We also continue to extend the security features that enable you to run Presto in your secure environment while meeting compliance regulations.
One such feature we’ve added to most of the RDBMS connectors is user impersonation. This allows the Presto end user to execute RDBMS queries as a local database user. Previously, all queries executed as the service user. This improvement provides more transparency about who is executing queries, and can also help you meet security compliance requirements.
We continue to add SQL syntax for both ease of writing queries and performance. The following are some notable additions from the community that are available in 312e:
- Add LATERAL join support
- Additional Hive compatibility, including procedures to synchronize partition metadata in the Hive Metastore with the file system. This is commonly needed when using the Hive table format. Having this function allows you to do more with Presto, rather than having to use the Hive CLI for such functionality.
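The partition synchronization mentioned above boils down to comparing two sets: the partitions registered in the Hive Metastore and the partition directories present on the file system. The Python sketch below illustrates that comparison only. It is not the Presto implementation, and the function name and sample partitions are hypothetical. The `ADD`, `DROP`, and `FULL` modes mirror what we understand the open-source `system.sync_partition_metadata` procedure to accept; check the release notes for the exact signature.

```python
def plan_partition_sync(metastore_partitions, filesystem_partitions, mode="FULL"):
    """Work out which partitions to register or remove in the metastore.

    ADD  registers partitions found on the file system but not in the metastore.
    DROP removes metastore partitions whose directories no longer exist.
    FULL does both.
    """
    if mode not in {"ADD", "DROP", "FULL"}:
        raise ValueError(f"unsupported mode: {mode}")

    in_metastore = set(metastore_partitions)
    on_disk = set(filesystem_partitions)

    to_add = sorted(on_disk - in_metastore) if mode in {"ADD", "FULL"} else []
    to_drop = sorted(in_metastore - on_disk) if mode in {"DROP", "FULL"} else []
    return to_add, to_drop


if __name__ == "__main__":
    # Hypothetical partitions of a table partitioned by day.
    metastore = ["ds=2019-06-01", "ds=2019-06-02"]
    filesystem = ["ds=2019-06-02", "ds=2019-06-03"]
    to_add, to_drop = plan_partition_sync(metastore, filesystem, mode="FULL")
    print("register:", to_add)
    print("remove:", to_drop)
```

This is the kind of housekeeping that previously required a trip to the Hive CLI, for example after an external job writes new partition directories.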
Thank you for reading about this exciting new release and the features in 312e. We will publish follow-up blog posts that dive deeper into some of these items, such as the Kubernetes integration and the Teradata connector.
If you would like more detail on any of the content above, please read the Presto 312e release notes.