We’ve recently published an overview of several different kinds of NoSQL databases. This time we’ll dive deeper into one specific example, MongoDB, and see why many organizations have found it a compelling solution.
NoSQL databases emerged about 10 years ago to serve the needs of distributed applications running across servers in multiple locations — the servers that powered ecommerce and other activities on the World Wide Web.
NoSQL followed the Gartner Hype Cycle, initially climbing a wave of buzz to a Peak of Inflated Expectations, then falling to the Trough of Disillusionment. NoSQL databases are now on the Plateau of Productivity. They are mature enough that organizations feel comfortable with the state of various products and are willing to adopt them when the use case makes sense.
When to use NoSQL vs. SQL
NoSQL databases like MongoDB are a good choice when your data is document-centric and doesn’t fit well into the schema of a relational database, when you need to accommodate massive scale, when you are rapidly prototyping, and in a few other situations.
MongoDB is the most popular of the new breed of non-relational NoSQL databases. Specifically, it’s a document database, also called a document-oriented database or a document store.
- Documents hold semistructured data, usually represented in a format like JSON or XML, and each document is associated with a unique key.
- Key values are typically a path or a URI that can be used to retrieve the associated document from the database.
- The keys are indexed, making it efficient to retrieve the associated documents.
Document structures usually align with objects that developers are working with in code, which is a more flexible approach than the row-and-column table-oriented structure of a relational database. Developers can rework document (data) structures as their application requirements change over time. With this approach, data structures become like code — both are under developers’ control.
Document databases are popular in ecommerce and securities trading platforms, among other uses, because they scale out well across multiple servers to support high data volumes and traffic.
Popular document databases include Amazon DynamoDB, Couchbase, and Google Cloud Firestore, but the most widely used is MongoDB, so it’s the one we’ll focus on. Document databases vary in their implementations — let’s see exactly what makes MongoDB tick.
Zooming in on MongoDB
MongoDB stores data records as BSON documents. BSON is a binary representation of JSON documents, though it contains more data types than JSON.
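To make the "more data types than JSON" point concrete, here is a small sketch using only Python's standard `json` module. It shows that plain JSON has no native date type, which is one of the types BSON adds (along with binary data and others). The `reading` record and the `to_json` helper are hypothetical names made up for this illustration.

```python
import json
from datetime import datetime, timezone

# A hypothetical record with a timestamp field.
reading = {
    "sensor": "s-1",
    "taken_at": datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc),
}

def to_json(record):
    """Serialize to JSON, converting datetimes to ISO-8601 strings."""
    def fallback(value):
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"cannot serialize {type(value).__name__}")
    return json.dumps(record, default=fallback)

try:
    json.dumps(reading)
    json_native = True
except TypeError:
    # The standard json module rejects datetime values outright.
    json_native = False

# The workaround loses the type: the date becomes an ordinary string.
encoded = to_json(reading)
```

In JSON the timestamp survives only as a string that every reader has to parse by convention, whereas a format with a real date type can store and compare it as a date.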
Documents, in turn, are gathered into collections. If you’re already familiar with relational databases, you can think of a collection as equivalent to a table, but without a schema. Unlike the records in a relational table, documents within a collection can have different fields, though typically all documents in a collection have a similar or related purpose. Collections exist within a given database.
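As a rough illustration, you can picture a collection as a list of dictionaries in which each document carries its own set of fields. The sketch below is plain Python, not a MongoDB driver. The `customers` data and the `find_with_field` helper are invented for the example, although `_id` is the field MongoDB itself uses as each document's unique key.

```python
# A "collection" modeled as a list of dicts (illustrative only).
customers = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "phone": "555-0100", "vip": True},
    {"_id": 3, "name": "Linus"},
]

def find_with_field(collection, field):
    """Return the documents that contain the given field."""
    return [doc for doc in collection if field in doc]

# Documents share a purpose (customers) but not an identical shape.
reachable_by_email = find_with_field(customers, "email")
all_fields = sorted({field for doc in customers for field in doc})
```

A relational table would need a column, possibly full of NULLs, for every field that any row uses. Here each document simply omits what it doesn't need.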
Many businesses run MongoDB as a distributed database on multiple, often geographically dispersed, servers in a configuration called a cluster. Clusters allow a MongoDB database to scale horizontally with sharding, which partitions a collection’s data across servers and rebalances it automatically as the cluster grows. They also let applications replicate data across servers through a feature MongoDB calls replica sets, which provides high availability and improves the overall reliability of a MongoDB cluster.
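The core idea behind sharding, routing each document to one server based on a key, can be sketched in a few lines. This is a deliberately simplified hash-and-modulo scheme, not MongoDB's actual mechanism: MongoDB divides data into ranges of a shard key and runs a balancer to move those ranges between shards. The shard names and the `shard_for` function are hypothetical.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names

def shard_for(key, shards=SHARDS):
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# Distribute 1,000 hypothetical user IDs across the shards.
placement = {}
for user_id in range(1000):
    placement.setdefault(shard_for(user_id), []).append(user_id)
```

Because the same key always maps to the same shard, a lookup by key goes to a single server. A modulo scheme like this one would reshuffle most keys whenever a shard is added, which is one reason real systems use range-based or otherwise more stable placement.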
MongoDB supports multi-document ACID transactions, even across replica sets and sharded clusters. That means if a connection breaks before a transaction is complete, or if any command in the transaction fails, then the database rolls back all of the changes it made in the course of the transaction. ACID compliance is a key benefit of relational databases as well.
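The all-or-nothing behavior described above can be imitated in plain Python. The sketch below is a toy, not MongoDB's implementation: it snapshots an in-memory dictionary and restores it if any step fails. The `transfer` function, the `InsufficientFunds` exception, and the account data are all made up for the illustration.

```python
import copy

class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""

def transfer(accounts, source, target, amount):
    """Apply both balance updates or neither."""
    snapshot = copy.deepcopy(accounts)
    try:
        # First write succeeds...
        accounts[target]["balance"] += amount
        # ...but a later step may fail partway through.
        if accounts[source]["balance"] < amount:
            raise InsufficientFunds(source)
        accounts[source]["balance"] -= amount
    except Exception:
        # Roll back every change made so far, then re-raise.
        accounts.clear()
        accounts.update(snapshot)
        raise

accounts = {"alice": {"balance": 50}, "bob": {"balance": 10}}
transfer(accounts, "alice", "bob", 30)       # commits
try:
    transfer(accounts, "alice", "bob", 100)  # fails and rolls back
except InsufficientFunds:
    pass
```

After the failed second transfer, the credit that had already been applied to the target account is undone, so no money appears from nowhere.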
To get data into and out of databases, MongoDB uses MongoDB Query Language (MQL). It uses the same syntax as documents, making it easy for developers to work with, but it’s nowhere near as intuitive as SQL, the standard query language for relational databases — and many analysts would take issue with using “intuitive” and “SQL” in the same sentence.
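To see what "the same syntax as documents" means, note that an MQL filter is itself a document. The sketch below evaluates such a filter with a toy matcher written in plain Python. The operators `$gt`, `$lt`, and `$in` are real MQL operators, but the `matches` function is a simplified stand-in for illustration (it treats every nested dictionary as a set of operators), and the `orders` data is invented.

```python
# Toy implementations of three MQL comparison operators.
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}

def matches(doc, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"total": {"$gt": 50}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"status": "shipped"}
            return False
    return True

orders = [
    {"_id": 1, "status": "shipped", "total": 120},
    {"_id": 2, "status": "pending", "total": 40},
    {"_id": 3, "status": "shipped", "total": 15},
]

# Roughly: SELECT _id FROM orders WHERE status = 'shipped' AND total > 50
query = {"status": "shipped", "total": {"$gt": 50}}
result = [doc["_id"] for doc in orders if matches(doc, query)]
```

With a real driver such as PyMongo, the same filter dictionary would be passed to the collection's `find` method and evaluated by the database rather than in application code.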
NoSQL use cases
NoSQL databases in general and MongoDB in particular are well suited to certain use cases. If you’re debating the merits of SQL vs. NoSQL databases, see our previous post.
Massive-scale data
NoSQL databases are especially good at handling big data because they’re architected to scale well horizontally across multiple servers. MongoDB’s built-in support for sharding lets developers scale clusters just by adding machines, which in a cloud environment is quick and simple to do. It’s more cost-effective than the older approach, in which businesses had to provision their data centers with enough server storage and CPU resources for peak loads, leaving systems underutilized when loads decreased.
Caching and high availability
MongoDB’s ease of creating replicas (copies of the data kept on secondary servers) makes it a natural fit for applications that demand high availability. If a primary server goes down, the replica set elects a secondary server to take over as the primary. Meanwhile, data can be read from servers close to the users who need it, minimizing latency for data analysts who want the latest data to create business intelligence reports or data scientists creating machine learning models.
Rapid prototyping
NoSQL databases are a natural fit for businesses that are building out new products. Specs and capabilities often change during the development process, especially at the prototyping stage. When application developers use a relational database with a defined schema, it’s time-consuming to revise data structures and convert data. In a rapid development environment, you may have to do that over and over.
When you use a document database like MongoDB, by contrast, you have no rigid schema, just documents made up of key-value pairs, which gives you more flexibility.
Streaming feeds
Document databases are an efficient alternative to data warehouses for rapidly changing datasets. A data warehouse typically uses a columnar relational database in which all the data is structured according to a schema. That makes for a fast repository for analytics, but one that’s comparatively inefficient for frequent small updates, and inflexible when schemas change.
MongoDB has a feature called change streams, which deliver a real-time stream of the changes that occur in the database. If an application inserts, updates, or deletes data in a collection, MongoDB emits a change event describing the modification. That makes MongoDB a great back end for applications that manage streaming feeds, such as new comments on a message board, new readings from IoT (internet of things) sensors, or financial securities orders.
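The pattern is easier to picture with a small simulation. The sketch below is a toy in-memory collection that notifies listeners on every write. It is not MongoDB code, and the `WatchedCollection` class is hypothetical. The event fields it emits (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) are modeled on the shape of MongoDB's change events, but greatly simplified.

```python
class WatchedCollection:
    """Toy in-memory collection that emits one change event per write."""

    def __init__(self):
        self.docs = {}
        self.listeners = []

    def watch(self, callback):
        """Register a function to call with each change event."""
        self.listeners.append(callback)

    def _emit(self, event):
        for callback in self.listeners:
            callback(event)

    def insert(self, doc):
        self.docs[doc["_id"]] = dict(doc)
        self._emit({
            "operationType": "insert",
            "documentKey": {"_id": doc["_id"]},
            "fullDocument": dict(doc),
        })

    def update(self, _id, fields):
        self.docs[_id].update(fields)
        self._emit({
            "operationType": "update",
            "documentKey": {"_id": _id},
            "updateDescription": {"updatedFields": dict(fields)},
        })

    def delete(self, _id):
        del self.docs[_id]
        self._emit({"operationType": "delete", "documentKey": {"_id": _id}})

# A hypothetical message-board feed that collects every change.
events = []
comments = WatchedCollection()
comments.watch(events.append)
comments.insert({"_id": 1, "text": "First!"})
comments.update(1, {"text": "First! (edited)"})
comments.delete(1)
```

A downstream consumer only needs to react to the event stream. It never has to poll the collection to find out what changed. With a real driver such as PyMongo, the equivalent entry point is the collection's `watch` method.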
Content management and cataloging
Content-based applications are a special case of streaming feeds. Consider a retail product catalog. New products come and go regularly. Product inventories change as units are sold, and prices change too. Depending on the retailer’s needs, developers could create data models in the form of JSON structures to represent the way the company handles inventory and sales, with a flexible, dynamic structure that they can change easily.
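As one possible shape for such a model, the sketch below represents a single catalog entry as a nested, JSON-compatible structure and applies the kinds of changes mentioned above. Every field name, the SKU, and the `sell` helper are assumptions chosen for illustration, not a recommended schema.

```python
import json

# A hypothetical catalog entry with nested price and per-size inventory.
product = {
    "_id": "sku-1001",
    "name": "Trail running shoe",
    "price": {"amount": 89.0, "currency": "USD"},
    "variants": [
        {"size": 9, "in_stock": 4},
        {"size": 10, "in_stock": 0},
    ],
}

def sell(doc, size, quantity=1):
    """Decrement inventory for one size, refusing to go below zero."""
    for variant in doc["variants"]:
        if variant["size"] == size:
            if variant["in_stock"] < quantity:
                raise ValueError("not enough stock")
            variant["in_stock"] -= quantity
            return
    raise KeyError(size)

sell(product, 9)                   # a unit is sold
product["price"]["amount"] = 79.0  # the price changes
product["tags"] = ["sale"]         # a new field appears, no migration needed
as_json = json.dumps(product)
```

Keeping variants and price inside the product document means one read returns everything needed to render a catalog page, at the cost of updating nested values in place.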
Sailing into the future
As you can see, MongoDB is a great alternative for a variety of use cases. It’s mature enough to merit consideration by any organization with a new use case. And its increasing popularity means more organizations have valuable data stored in MongoDB databases.
That’s why Fivetran developed our connector for MongoDB. We can help you unlock the value of that stored data by replicating it to your company’s data warehouse. We’ve documented the whole process, but there’s no substitute for trying it yourself. Sign up today for a free trial.