Turbocharge your Analytics with MongoDB And Presto

High-Level View Of MongoDB

MongoDB is a distributed NoSQL document database built to handle diverse data management requirements. Its design goals include an object-oriented data model, high availability, scalability, efficiency, and ACID (Atomicity, Consistency, Isolation, and Durability) guarantees. Its document model lets data be stored in its most natural form rather than being reshaped to fit the relational model, which makes users more productive. It supports both schemaless designs and enforced schemas, offering flexibility or data integrity and consistency enforcement as needed. Organizations using MongoDB include Google, SAP, Verizon, Intuit, Sega, Adobe, InVision, and EA Sports.

A Look At MongoDB Architecture

MongoDB stores data in documents in the Binary JSON (BSON) format. Logically related documents are grouped into collections, which can be indexed. The MongoDB servers that store data form shards, and each shard is deployed as a replica set. The members of a replica set hold copies of the same data, typically on three servers. Data is partitioned into chunks that are spread across the shards, which, combined with replication, provides high reliability and availability. During a network partition, consistency is preserved by making the affected part of the database unavailable for writes. Config servers hold the configuration data and metadata for the MongoDB cluster. MongoDB's query routers accept queries from clients, direct them to the correct shards, and return the results.
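To make the routing step concrete, here is a minimal sketch of how a router could pick a shard from chunk metadata. It is a toy model that assumes range-based chunks over an integer shard key; the chunk bounds, shard names, and the `route` function are hypothetical and are not MongoDB's actual implementation.

```python
from bisect import bisect_right

# Hypothetical chunk map of the kind a config server might hold.
# Chunk i covers shard-key values from CHUNK_LOWER_BOUNDS[i] (inclusive)
# up to the next lower bound (exclusive); the last chunk is open-ended.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_OWNERS = ["shard-a", "shard-b", "shard-a", "shard-c"]


def route(shard_key: int) -> str:
    """Return the name of the shard owning the chunk that contains shard_key."""
    if shard_key < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("shard key is below the lowest chunk bound")
    # bisect_right finds the first bound greater than the key;
    # the chunk containing the key is the one just before it.
    index = bisect_right(CHUNK_LOWER_BOUNDS, shard_key) - 1
    return CHUNK_OWNERS[index]
```

Note that one shard can own several chunks, which is what lets chunks be moved between shards to rebalance data without changing how clients issue queries.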

MongoDB Deployment

MongoDB is cross-platform and can be installed on all major operating systems. It can be installed manually, deployed on private or public clouds, or accessed via premium cloud offerings. The recommended practice in production is to run MongoDB instances on multiple nodes that together form a cluster.

What is Presto?

Presto is an open source distributed SQL query engine that provides scalable, high-throughput access to many different data stores, including MySQL, DB2, Oracle, Cassandra, Redis, S3, and MongoDB. This enables the creation of a virtualized data lake spanning all of that data. Combining Presto with MongoDB creates a highly scalable, cohesive, yet loosely coupled data management stack.
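Presto reaches each data store through a catalog, which is declared in a properties file. As a rough sketch, the helper below renders the minimal contents of a MongoDB catalog file (for example `etc/catalog/mongodb.properties`). The `connector.name` and `mongodb.seeds` properties come from the Presto MongoDB connector; the helper name and the host names are hypothetical.

```python
from typing import Sequence


def mongodb_catalog_properties(seeds: Sequence[str]) -> str:
    """Render the text of a Presto catalog file for the MongoDB connector.

    seeds is a list of "host:port" strings (placeholder values here)
    identifying MongoDB servers Presto should connect to.
    """
    if not seeds:
        raise ValueError("at least one host:port seed is required")
    lines = [
        "connector.name=mongodb",
        "mongodb.seeds=" + ",".join(seeds),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts; replace with the addresses of your own cluster.
    print(mongodb_catalog_properties(["mongo1.example.com:27017",
                                      "mongo2.example.com:27017"]))
```

With a catalog file named `mongodb.properties`, collections would then be addressed from SQL as `mongodb.<database>.<collection>`.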

Scalable Analytics With MongoDB and Presto

Combining MongoDB and Presto provides a highly scalable tech stack for developing distributed analytical applications. MongoDB is an enterprise distributed database that can store data as strictly as users need while ensuring high horizontal scalability, availability, resilience, and self-healing. Designers and developers can choose the data model that best serves them, trading flexibility against strictness in schema design and performance against transactional integrity in write operations. Different clusters can be created as needed to meet different performance and functional goals.

For example, writes can be unacknowledged, acknowledged, or replica-acknowledged, with weaker write enforcement yielding faster writes. Reads can be served from secondaries, from the primary when it is available (primary-preferred), or only from the primary, trading turnaround time against the risk of fetching stale data. This makes it a great storage layer for OLAP systems. Data can be persisted as safely, or read as quickly, as each application needs. Integration is achieved using the Presto MongoDB connector.
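These write and read trade-offs are usually chosen per client through connection options. The sketch below builds a MongoDB connection string with the standard `w` and `readPreference` URI options. The mapping from the three write levels named above to `w` values is an assumption made for illustration (in particular, "replica-acknowledged" is mapped to `majority`), and the function name and hosts are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode

# Assumed mapping from the write levels in the text to the `w` URI option.
WRITE_CONCERNS = {
    "unacknowledged": "0",               # fire and forget, fastest
    "acknowledged": "1",                 # confirmed by the primary
    "replica-acknowledged": "majority",  # confirmed by most replica set members
}

# Read preference modes accepted by the `readPreference` URI option.
READ_PREFERENCES = {
    "primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest",
}


def build_uri(hosts: Sequence[str], write_level: str, read_preference: str) -> str:
    """Build a mongodb:// URI carrying the chosen write and read settings."""
    if write_level not in WRITE_CONCERNS:
        raise ValueError(f"unknown write level: {write_level}")
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = urlencode({
        "w": WRITE_CONCERNS[write_level],
        "readPreference": read_preference,
    })
    return f"mongodb://{','.join(hosts)}/?{options}"
```

An analytics client that tolerates slightly stale data might combine `"secondary"` reads with any write level, while a client that must read its own writes would stay on `"primary"`.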

Can you insert a JSON document into MongoDB with Presto?

This question comes up quite a bit. In short, yes, you can. You run an INSERT statement from Presto against MongoDB. Because Presto exposes a MongoDB collection as a table, you insert the document as a row of that table. For example:

INSERT INTO orders VALUES(1, 'bad', 50.0, current_date);

That row is stored in MongoDB as a JSON-style (BSON) document, with each column becoming a field.
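To picture the resulting document, here is a small sketch that maps a row like the one in the INSERT statement to a JSON document. The column names are hypothetical, since the article does not give the schema of `orders`, and the date is a fixed placeholder standing in for `current_date`. MongoDB would also add an `_id` field if the document does not supply one.

```python
import json
from datetime import date

# Hypothetical column names for the `orders` table; the real schema may differ.
COLUMNS = ["orderkey", "orderstatus", "totalprice", "orderdate"]


def row_to_document(row):
    """Pair each value in the row with its column name, giving a document."""
    if len(row) != len(COLUMNS):
        raise ValueError("row length does not match the column list")
    return dict(zip(COLUMNS, row))


def _encode(value):
    """Serialize dates as ISO strings so the document can be shown as JSON."""
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_json(document):
    return json.dumps(document, default=_encode)


if __name__ == "__main__":
    # Placeholder date standing in for current_date.
    print(to_json(row_to_document((1, "bad", 50.0, date(2021, 5, 12)))))
```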

Getting started with Presto in the cloud

If you want to get started with Presto quickly in the cloud, try out Ahana Cloud for free. Ahana takes care of the deployment, management, adding/detaching data sources, etc. for you. It’s a managed service for Presto that makes it really easy to get started. You can try it free at https://ahana.io/sign-up