It’s been a very busy quarter for us here at Starburst. We have doubled in size. Our engineering team has been heads-down to their keyboards. Now they have finally come up for air. The result is one of our most robust releases ever! We are constantly adding features and functionality to our Starburst Enterprise Presto distribution. For our 3rd release in 2019, we have many new additions, from new connectors to increased security to new certifications. Let’s dive right into it.
Here is a list of new features in our Starburst 323e release:
- Power BI DirectQuery connector
- Snowflake parallel connector
- MapR connector
- IBM DB2 connector
- Teradata parallel direct connector
- Red Hat certification for OpenShift
- Encryption for security sensitive configurations, e.g. passwords in catalogs
- LDAP support for built-in system access control
- Hadoop 3 support, including Hive ORC ACID support
Let’s talk about some of these new features. We will cover them in more detail in a follow-up post.
Power BI - DirectQuery Connector
Microsoft Power BI is an increasingly popular, cloud-based business intelligence and analytics service. Until recently Presto supported Power BI via the import mode only. With the new DirectQuery connector, companies can use the power of Presto to query and federate data from many different systems without bringing data back to the Power BI client first.
Presto enables a tremendous performance boost for Power BI users, since large data sets and queries can be handled by Presto directly, rather than returning massive datasets to the Power BI client. This allows virtually limitless querying and exploring of data within an organization, beyond what is possible with Power BI alone.
The diagram below illustrates the Import feature in Power BI. In this mode, data is loaded into the Power BI cache. This works reasonably well for single data sources below 1GB. If you want to explore data sets over 1GB, or multiple data sets at once, import mode does not perform well. Working with Power BI and modern big data sets can be challenging.
Using the power of Starburst Enterprise Presto and our new DirectQuery connector, data sources appear to end users as just standard databases/schemas. You can explore unlimited data from multiple data sources as illustrated in the diagram below. This allows you to generate reports and dashboards across multiple data sets regardless of their location or type.
We are working with Microsoft to have our connector included as a standard connector. Please contact us for more information on how to include Presto in your Power BI installation.
Snowflake Parallel Connector
Snowflake is one of the fastest growing cloud data warehouses. Many companies have migrated their on-premises legacy warehouses to the cloud. There are often cases where this data needs to be joined with other data that lives in places like object stores, relational databases and NoSQL sources.
In response, Starburst has created a robust connector to Snowflake. Since Snowflake is a Massively Parallel Processing (MPP) database system, we created two different methods to connect: a standard connector for smaller result sets, and a distributed connector for high volumes of data. This provides greater flexibility for accessing Snowflake, depending on the use case. In both cases, Presto pushes down most predicates and column projections into Snowflake to minimize the amount of data sent over the network.
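To make the pushdown idea concrete, here is a minimal sketch in Python. It is a toy model and not the connector's implementation: the `ORDERS` table, `scan_remote` and `cells_transferred` are hypothetical names invented for illustration. It compares how many values cross the "network" when the filter and column selection run locally versus at the source.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

# Hypothetical in-memory stand-in for a remote table (not real Snowflake data).
ORDERS: List[Row] = [
    {"id": 1, "region": "EU", "amount": 120.0, "note": "a"},
    {"id": 2, "region": "US", "amount": 80.0, "note": "b"},
    {"id": 3, "region": "EU", "amount": 45.5, "note": "c"},
    {"id": 4, "region": "APAC", "amount": 300.0, "note": "d"},
]


def scan_remote(
    predicate: Optional[Callable[[Row], bool]] = None,
    columns: Optional[List[str]] = None,
) -> List[Row]:
    """Simulate the remote system: filter and project before returning rows."""
    rows = ORDERS if predicate is None else [r for r in ORDERS if predicate(r)]
    if columns is not None:
        rows = [{c: r[c] for c in columns} for r in rows]
    return rows


def cells_transferred(rows: List[Row]) -> int:
    """Count the individual values that would be sent over the network."""
    return sum(len(r) for r in rows)


def is_eu(row: Row) -> bool:
    return row["region"] == "EU"


# Without pushdown: fetch everything, then filter and project locally.
all_rows = scan_remote()
local_result = [{"id": r["id"], "amount": r["amount"]} for r in all_rows if is_eu(r)]

# With pushdown: the remote side applies the predicate and the projection.
pushed_result = scan_remote(predicate=is_eu, columns=["id", "amount"])

print(cells_transferred(all_rows), cells_transferred(pushed_result))
```

Both approaches return the same rows, but the pushed-down scan transfers only the matching rows and the requested columns.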
The diagram below illustrates how the distributed connector works. The query is executed on Snowflake in parallel and data is sent back to Presto in a parallel fashion.
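As a rough illustration of the parallel return path, the sketch below uses Python threads to pull several slices of a result at the same time. It is only an analogy under stated assumptions: `PARTITIONS`, `fetch_partition` and `fetch_all_parallel` are hypothetical names, and the real connector's mechanics are not described in this post.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical slices of one remote result set, one slice per parallel stream.
PARTITIONS: List[List[int]] = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def fetch_partition(index: int) -> List[int]:
    """Stand-in for one worker pulling one slice of the result."""
    return PARTITIONS[index]


def fetch_all_parallel(num_workers: int = 3) -> List[int]:
    """Fetch every slice concurrently and stitch the rows back together."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so row order is stable.
        chunks = pool.map(fetch_partition, range(len(PARTITIONS)))
        return [row for chunk in chunks for row in chunk]


rows = fetch_all_parallel()
print(rows)
```

In a real deployment each slice would be read by a different Presto worker rather than by a thread in one process.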
We’re excited about offering our customers direct connectivity to Snowflake using two different methods. This ensures that whatever the use case is, Starburst Enterprise Presto offers a method to access this quickly growing cloud data warehouse platform.
Learn more from our Starburst Presto Snowflake connector documentation.
IBM DB2 Connector
The IBM DB2 database has been a staple in the database world for many years. Many enterprises still rely on it for critical business functions. Starburst has built a connector for DB2 with features such as user impersonation and statistics support. User impersonation involves logging into DB2 using a service account, then “switching” over to the DB2 user that matches the Starburst Enterprise Presto user, ensuring maximum security. The connector is ready for production use cases and is used by many of our enterprise customers.
MapR Connector
Starburst is proud to announce general availability of our MapR connector. This connector allows companies to query data directly from MapR HDFS storage. Numerous customers asked us for this connector, mostly for access to their existing data in MapR and for future migration efforts.
The connector is similar to the Hive connector and supports secure and non-secure clusters, user impersonation and SASL authentication.
Red Hat Certification for OpenShift
Starburst is proud to announce that our Kubernetes offering has been certified on Red Hat’s OpenShift platform. Using Kubernetes, Starburst Enterprise Presto can be deployed almost anywhere, including on premises.
This allows fast, interactive query performance across a wide variety of data sources, including HDFS, Ceph, MySQL, SQL Server, PostgreSQL, Cassandra, MongoDB, Kafka, Teradata, and more.
With this certification, Starburst Enterprise Presto can be deployed on any OpenShift platform regardless of the location. This allows you to future-proof your architecture as infrastructure technology changes in the coming years.
The Starburst Presto Kubernetes offering can be found on Red Hat’s Ecosystem directory at:
Encryption for Security-Sensitive Configurations
Starburst Enterprise Presto’s configuration files can include sensitive information, such as passwords and usernames for data sources. In some organizations, this violates security policies, so we created a method to encrypt this data using Starburst Secrets. Secrets allow administrators to separate this sensitive data from the configuration files by storing it in a Java keystore file. As a result, the configuration files only contain encrypted values of this sensitive data.
As security becomes a priority in all organizations across the world, Starburst completes a fully secure Presto ecosystem with Starburst Secrets.
Find out how to use Starburst Secrets from our documentation.
LDAP Support for Built-in System Access Control
Starburst Enterprise Presto has numerous options for controlling access to all of the different connectors, and built-in system access control is one of them. This feature allows controlling access to catalogs, sessions and schemas in an easy-to-use way. Improving on this functionality, we’ve added LDAP authorization for users and groups. This allows using a file-based approach to managing access to Presto objects, as an alternative to a tool such as Apache Ranger.
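To give a feel for how file-based rules combined with group membership can work, here is a small sketch in Python. Everything in it is an assumption made for illustration: the rule shape in `CATALOG_RULES`, the `USER_GROUPS` table standing in for an LDAP lookup, and the first-match, default-deny evaluation are not taken from the Starburst file format or API.

```python
import re
from typing import Dict, List

# Hypothetical rules, evaluated top to bottom; the first match decides.
CATALOG_RULES: List[Dict[str, object]] = [
    {"user": "admin", "catalog": ".*", "allow": True},
    {"group": "finance", "catalog": "hive", "allow": True},
    {"catalog": "system", "allow": False},
]

# Hypothetical group membership, standing in for an LDAP directory lookup.
USER_GROUPS: Dict[str, List[str]] = {
    "alice": ["finance"],
    "bob": ["marketing"],
}


def can_access_catalog(user: str, catalog: str) -> bool:
    """Return the decision of the first matching rule; deny if none matches."""
    groups = USER_GROUPS.get(user, [])
    for rule in CATALOG_RULES:
        # A missing "user" or "catalog" field matches anything.
        if not re.fullmatch(str(rule.get("user", ".*")), user):
            continue
        group_pattern = rule.get("group")
        if group_pattern is not None and not any(
            re.fullmatch(str(group_pattern), g) for g in groups
        ):
            continue
        if not re.fullmatch(str(rule.get("catalog", ".*")), catalog):
            continue
        return bool(rule["allow"])
    return False


print(can_access_catalog("alice", "hive"), can_access_catalog("bob", "hive"))
```

Here `alice` reaches the `hive` catalog through her `finance` group, while `bob` matches no rule and is denied by default.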
More information on this feature can be found in our documentation.
Hadoop 3 Support
Hadoop version 3, with a large number of new features and improvements, was released earlier this year. We added full support for this version, most notably the ability to read Hive transactional tables, known as ACID tables.
Other new features for the connector include:
- Read and write for HDFS locations utilizing Erasure Coding
- Bucketing-unaware read support for Hive tables using new Hive bucketing ("bucketing v2")
- Compatibility with TIMESTAMP serialization format used by Hive in file readers and writers for ORC, TEXTFILE, non-binary RCFile
- Compatibility with the new Ranger 1.2.0, while maintaining compatibility with previous Ranger version 0.7.0
- Compatibility with Hive Metastore having "information_schema" and "sys"
- Support for Hive materialized views
Presto High Availability for On-Premises Deployments
As more and more companies rely on Starburst Enterprise Presto to run mission-critical analytical and reporting workloads, our on-premises customers have asked us for a robust high availability (HA) solution. With this new release, Starburst now offers our customers an HA solution that can be deployed using standard hardware.
The architecture involves an active Starburst Enterprise Presto coordinator and a standby coordinator, as illustrated below. Workers communicate with the active coordinator through a virtual IP address (VIP). In the event of a hardware failure, a load balancer routes traffic to the standby instance, which then takes over as the active instance.
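The failover behavior can be modeled in a few lines of Python. This is a toy model only: the `Coordinator` and `VirtualIp` classes and the coordinator names are hypothetical, and a real deployment relies on a load balancer with health checks rather than application code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coordinator:
    name: str
    healthy: bool = True


class VirtualIp:
    """Toy load balancer: route to the first healthy coordinator in order."""

    def __init__(self, coordinators: List[Coordinator]) -> None:
        self.coordinators = coordinators

    def route(self) -> Coordinator:
        for coordinator in self.coordinators:
            if coordinator.healthy:
                return coordinator
        raise RuntimeError("no healthy coordinator available")


active = Coordinator("coordinator-a")
standby = Coordinator("coordinator-b")
vip = VirtualIp([active, standby])

before = vip.route().name  # workers reach the active coordinator
active.healthy = False     # simulate a hardware failure
after = vip.route().name   # the same address now reaches the standby

print(before, after)
```

Workers always talk to the same address, so they need no reconfiguration when the standby takes over.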
This solution gives Starburst customers the peace of mind of a highly available Starburst Enterprise Presto cluster, without the need to install expensive third-party software or hardware. Contact us for more information.
Teradata Parallel Direct Connector
Our customers are constantly pushing us to add new features to Starburst Enterprise Presto connectors and to improve their performance. Our new parallel connector for Teradata is one example. It was driven by a large telecommunications company that has terabytes of data in Teradata and petabytes of data in Hadoop. Instead of constantly moving data to analyze it, they wanted to keep the data where it lives and query it in place.
This was difficult using our standard single-threaded Teradata connector. We developed a fully parallelized version that connects from the Starburst Enterprise Presto worker directly to the Teradata “AMP”. The diagram below illustrates how these connections are made: