What is NoSQL, and what does it bring to the table?
While SQL databases are the engines on which most businesses run, newer NoSQL databases have gained popularity for web-based applications that serve large numbers of user requests. Each type is well-suited for different use cases. What is the difference between SQL and NoSQL, and which is good for what? We’ll get to that, but let’s review and define our terms first.
Chances are you don’t need much of an introduction to SQL databases — more properly called relational databases. They were conceived and named in 1970 by IBM’s Ed Codd, and commercialized later in the decade. Today’s all-star relational database management systems (RDBMS) include Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and IBM Db2.
SQL stands for Structured Query Language, and is the language used to query these databases and add, remove, and modify data. Like relational databases, SQL was created at IBM. Today it’s both an ANSI and ISO standard. No relational database meets every facet of the standard and some provide proprietary extensions, but there’s enough overlap that someone familiar with one product’s SQL dialect should have no trouble adapting to another.
Relational databases organize data into tables of rows and columns, all defined via a schema that maps out how the entities relate to each other, what their data types are, and how they’re indexed. New records are added and retrieved by row. Relational databases promise to maintain the safety of data through the four key criteria of atomicity, consistency, isolation, and durability, or ACID.
SQL databases work well for online transaction processing (OLTP). In the 1980s they gobbled the market share of older hierarchical, network, and inverted list databases because they were conceptually easier to work with and offered good performance on less expensive hardware. They typically ran on a single primary server, with perhaps replicas running on satellite servers to provide redundancy and better performance.
But things changed. The ‘90s saw the rise of network computing, in which multiple hardware boxes could reside on a single company’s network. If they were database servers, their data had to be kept in sync. That became difficult with the rise of the World Wide Web and globally distributed networks, which couldn’t promise 100% availability.
To manage organizations’ data, databases had to be consistent, available, and tolerant of being partitioned off into domains that couldn’t communicate with each other in case of network or hardware failure. Unfortunately, it’s impossible for a distributed data store to offer all three of those capabilities simultaneously.
Computer scientists came up with various ways to cope. To scale horizontally, they developed databases that, in place of relational’s ACID guarantees, would be basically available, soft state, and eventually consistent, which they called BASE. These nonrelational or NoSQL databases rely on event-driven architecture to scale transactional databases across a global network.
Today’s list of top databases in the DB-Engines Ranking skews heavily toward relational databases, but includes NoSQL databases as well.
NoSQL databases tend to be faster to develop than relational databases, so they’re great for prototyping, where you might change your data structures during the development process, and you don’t want to rely on a database administrator (DBA) to change a schema for you. They’re a natural fit for applications that run across geographically distributed data centers and that require high availability, and they often underpin mobile applications that handle big data and require real-time response rates.
There are four main types of NoSQL databases, all suited to specific use cases.
To quote Amazon on AWS NoSQL, “The document model works well with use cases such as catalogs, user profiles, and content management systems where each document is unique and evolves over time.” Log storage is another good use case. MongoDB, Couchbase, and Firebase are among the top document-oriented databases.
With Alooma, schema changes required back-end maintenance, but Fivetran connectors automatically adapt to schema and API changes.Learn more
Among all the NoSQL databases, perhaps the biggest success story is MongoDB. As a document database, it handles data using JSON, so it can support applications that use both unstructured and structured data. That offers flexibility for integrating data into analytics. And because NoSQL databases are scalable, MongoDB NoSQL is a good tool for early-stage startups that need a low barrier to entry but may experience sudden growth in customers and transactions.
MongoDB may not be the best choice for a pure OLTP environment that demands ACID compliance (although MongoDB does now support multidocument ACID transactions, in which small collections of data can be stored together in a single document and processed with atomic document updates), but it offers a lot of flexibility.
Key-value stores are designed for data storage and retrieval, but instead of relying on defined schemas the way relational databases do, they associate a value (which can be data or a document or an image) with a key. Key-value databases are highly partitionable and scale well horizontally. They’re suitable for, among other things, storing data from the internet of things (IoT), where devices generate many different types of data that would be hard to manage in a highly structured relational database. Redis, memcached, and Amazon DynamoDB are among the most popular key-value databases.
Graph databases are essentially collections of relationships. Each node (which may represent a person, place, thing, or other piece of data) has a unique identifier, a set of relationships with other nodes (called edges), and characteristics represented as key-value pairs. With this general-purpose structure you can model a variety of scenarios that are primarily defined by relationships. Graph databases have names like Graphbase, Dgraph, and InfoGrid.
Wide-column stores, also known as extensive record stores, use rows and columns like relational databases, but unlike RDBMSes, data is stored by column, not by row. That makes them well-suited for online analytical processing (OLAP) systems that retrieve large amounts of data, and less suited to transactional systems that add and modify rows. Like other NoSQL databases, they can store large amounts of structured data without using schemas. Popular examples include Apache Cassandra, Apache Hbase, and Microsoft Azure Table.
Along with these four types, you can find half a dozen other NoSQL options, including object databases and multivalue databases, all of which serve to meet particular niche uses.
Meeting needs is, in fact, what NoSQL is all about. If you take away only one point from this post, it’s that organizations have a variety of software tools at their disposal that can meet specific needs. If you’re designing a new system, it’s worth researching all of your options to find the kind of software and the specific application that will work best for your use case.
As you can tell, NoSQL vs SQL isn’t a zero-sum decision. The two kinds of databases happily coexist, and it’s less a question of whether but when to use NoSQL vs. SQL. Chances are you have places where you could use both in your organization.
And chances are you’re going to want to run analytics using data from both kinds of sources. That’s where Fivetran comes in. In this post we’ve hyperlinked several SQL and NoSQL databases for which we offer connectors. No matter what kind of data you want to replicate, we have you covered. Sign up today and see how Fivetran can facilitate your data analytics.