What is Apache Ranger?
Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the data platform. It is an open-source authorization solution that provides access control and audit capabilities for big data platforms through centralized security administration.
Its open data governance model and plugin architecture enabled the extension of access control to other projects beyond the Hadoop ecosystem, and the platform is widely accepted among “major cloud vendors like AWS, Azure, GCP”.
With the help of the Apache Ranger console, admins can easily manage centralized, fine-grained access control policies, including file, folder, database, table and column-level policies across all clusters. These policies can be defined at user level, role level or group level.
Apache Service Integration
Apache Ranger uses plugin architecture in order to allow other services to integrate seamlessly with authorization controls.
Figure: Simple sequence diagram showing how the Apache Ranger plugin enforces authorization policies with Presto Server.
AR also supports centralized auditing of user access and administrative actions for comprehensive visibility of sensitive data usage through a centralized audit store that tracks all the access requests in real time and supports multiple audit stores including Elasticsearch and Solr.
Many companies are today looking to leverage the Open Data Lake Analytics stack, which is the open and flexible alternative to the data warehouse. In this stack, you have flexibility when it comes to your storage, compute, and security to get SQL on your data lake. With Ahana Cloud, the stack includes AWS S3, Presto, and in this case our AR integration.
Ahana Cloud for Presto and Apache Ranger
Ahana-managed Presto clusters can take advantage of Ranger Integration to enforce access control policies defined in Apache. Ahana Cloud for Presto enables you to get up and running with the Open Data Lake Analytics stack in 30 minutes. It’s SaaS for Presto and takes away all the complexities of tuning, management and more. Check out our on-demand webinar where we share how you can build an Open Data Lake Analytics stack we hosted with Dzone.
Drill is an open source SQL query engine which began life as a paper “Dremel: Interactive Analysis of Web-Scale Datasets”. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Trino is an apache 2.0 licensed, distributed SQL query engine, which was forked from the original Presto project whose Github repo was called PrestoDB