Turbocharge your BI and Analytics Using Superset and Presto

Ahana Cloud for Presto is a SaaS managed service that speeds up leading business intelligence, data visualization, and SQL tools like Superset.

Interactive, ad hoc queries and lightning fast data visualizations for Superset

What is Apache Superset?

Apache Superset is a big data exploration and visualization platform that was created at an Airbnb hackathon in 2015. Since then, it has grown into an enterprise-level, open-source business intelligence (BI) system offering features found in commercial solutions like Looker and Tableau. It is highly customizable, can visualize and explore huge amounts of data, and is meant for users of all skill levels. Companies using Superset include Ubisoft, Dropbox, Yahoo, and American Express.

Superset lets users create visualizations and explore data either through a GUI or by writing SQL, which makes it easy to use for both casual and technical users. It allows near real-time analysis of huge amounts of data and supports geospatial analytics. Superset can connect to many different SQL-based databases to build charts, interactive dashboards, and geospatial visualizations, and to explore data using SQL or drag-and-drop, among other features. Supported databases include PostgreSQL, Oracle, Redshift, BigQuery, DB2, and Presto (which is technically a query engine rather than a database).

Anatomy of Apache Superset

Superset consists of a Python-based backend and a JavaScript frontend. The backend handles connecting to different databases and carrying out data processing, while the frontend handles data visualization and user interactions. The backend uses SQLAlchemy and database-specific connectors to connect to different databases, Pandas for data analysis, and Flask for the web server. Optional caching of visualizations is supported using software like Memcached and Redis. The frontend is built on a React/Redux architecture.
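To make the optional caching layer more concrete, here is a minimal sketch of the idea: run a chart's SQL once, then serve repeat requests from a cache keyed by the query text. It is not Superset's actual code. An in-memory SQLite database stands in for a SQLAlchemy connection, a plain dict stands in for Redis or Memcached, and the names run_query, cache_key, and the sales table are hypothetical.

```python
import hashlib
import sqlite3

# Stand-in for an external cache such as Redis or Memcached
# (assumption: a plain dict is enough to show the idea).
_cache = {}


def cache_key(sql):
    """Derive a stable cache key from the SQL text."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def run_query(conn, sql):
    """Return (rows, served_from_cache), caching the rows on first use."""
    key = cache_key(sql)
    if key in _cache:
        return _cache[key], True
    rows = conn.execute(sql).fetchall()
    _cache[key] = rows
    return rows, False


# Stand-in database: in-memory SQLite instead of a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 25), ("east", 5)],
)

chart_sql = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
first, hit_first = run_query(conn, chart_sql)     # executes against the database
second, hit_second = run_query(conn, chart_sql)   # served from the cache
```

The second call never touches the database, which is the effect a cache is meant to have on a dashboard that is viewed repeatedly.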

Superset can connect directly to each supported data store. Another option is to connect Superset to a distributed SQL query engine like Presto, which enables higher performance, more concurrent workloads, and instant, seamless access to multiple data sources. Paired with Presto, Superset becomes an even more powerful tool.

What is Presto?

Just like Superset, Presto was created for developing big data solutions. Presto can connect to more than just SQL-based data sources and is built for distributed, parallel querying. It exposes these data sources, even S3-based data lakes, as if they were SQL-compliant. Using Superset with Presto enables the creation of a highly decoupled and scalable BI solution.

Presto is an open source SQL query engine for data engineers and analysts to run interactive, ad hoc analytics on large amounts of data. Data platform teams are increasingly using Presto as the de facto SQL query engine to run analytics across data sources in-place. This means that Presto can query data where it is stored, without needing to move data into a separate analytics system. 
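As an illustration of querying data in place, the sketch below builds a single SQL statement that joins tables from two different Presto catalogs. Presto addresses tables as catalog.schema.table; the specific catalogs, schemas, tables, and columns here (hive.sales.orders, mysql.crm.customers, customer_id, country) are hypothetical, as are the helper names. The run_on_presto helper assumes the PyHive package and a reachable Presto coordinator, so it is defined but not called.

```python
def qualified(catalog, schema, table):
    """Build a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query():
    """Join a data-lake table with a relational table in one statement."""
    orders = qualified("hive", "sales", "orders")        # hypothetical table
    customers = qualified("mysql", "crm", "customers")   # hypothetical table
    return (
        "SELECT c.country, COUNT(*) AS order_count\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.country"
    )


def run_on_presto(sql, host="localhost", port=8080):
    """Execute SQL on a Presto coordinator (assumes PyHive is installed)."""
    from pyhive import presto  # imported lazily so the sketch loads without it

    cursor = presto.connect(host=host, port=port).cursor()
    cursor.execute(sql)
    return cursor.fetchall()


query = build_federated_query()
print(query)
```

Neither table has to be copied anywhere first; Presto reads each one through its own connector and performs the join itself.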

Faster Queries and Unified Access to more Data with Open Source Superset and Presto

Presto is one of the data sources that Superset supports. By combining them, developers and organizations can leverage the features of two open source, highly scalable systems to meet their BI needs. A Superset cluster can be used to carry out data visualization and exploration, while a Presto cluster can be used to connect to disparate data sources. This allows BI to be carried out on non-SQL data sources like HDFS and NoSQL databases.

Connecting them is usually done using PyHive, which provides a SQLAlchemy dialect for Presto. It is installed with the command pip install pyhive. One then gives Superset the SQLAlchemy URI of the Presto data source in the format presto://host:port/catalog/schema; for example, presto://host:port/hive/default points at the default schema of a Hive catalog. (The hive:// scheme that PyHive also provides is for connecting to Hive itself, not to Presto.) Custom BI solutions can be created using SQL that is passed to Presto, which handles the actual query processing. Query results are passed back to Superset for data analysis and visualization.
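A minimal sketch of assembling such a connection string, assuming the presto:// SQLAlchemy scheme that PyHive registers for Presto. The presto_uri helper, the host name presto.example.com, and the user name are hypothetical, and port 8080 is only a common default.

```python
from urllib.parse import quote


def presto_uri(host, port=8080, catalog="hive", schema=None, username=None):
    """Build a SQLAlchemy URI of the form presto://[user@]host:port/catalog[/schema]."""
    user = f"{quote(username, safe='')}@" if username else ""
    path = f"/{catalog}" + (f"/{schema}" if schema else "")
    return f"presto://{user}{host}:{port}{path}"


uri = presto_uri("presto.example.com", 8080, "hive", "default")
uri_with_user = presto_uri("presto.example.com", catalog="hive", username="analyst")
print(uri)
print(uri_with_user)

# With SQLAlchemy and PyHive installed, the same string can be tried outside
# Superset before it is pasted into the database connection form:
#   from sqlalchemy import create_engine
#   engine = create_engine(uri)
```

Checking the URI from a plain Python session first can make it easier to tell a network or catalog problem apart from a Superset configuration problem.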

If you want to get started with Presto, you can check out Ahana Cloud, a fully managed SaaS offering for Presto in the cloud. Ahana Cloud comes pre-integrated with Apache Superset.

With a few clicks you can add Superset to your Presto cluster.