In the data analytics world, OLTP systems generate source data and OLAP systems analyze it.
The term “online transaction processing” (OLTP) was coined back in the days before everything was computerized — thus the need to specify “online.” The “processing” part of the name, which seems kind of unnecessary for a computing system, was probably borrowed from “data processing,” an old term for information technology. The key part of the OLTP acronym is clearly the "T."
An OLTP system handles daily business: making sales, registering users, updating shipping addresses. A typical example is a retail shop — the business of selling items is literally transactional. Transactional behaviors are those that:
Involve small but frequent units of work
Require low latency and immediate syncing. If an item is removed from an online shopping cart, that removal must be recorded immediately. Efficient processing for timely updates is critical for OLTP systems.
Revolve primarily around individual and unique elements of data — a user record is added, an address is updated, an expired item is deleted
Involve adding, modifying and deleting data with equal or similar importance to reading it
Are concerned primarily with recording the current state of a subject
You might notice that all of this sounds like almost any modern web application. Indeed, any system that has a login probably encompasses an OLTP system.
Take this blog, for example. Authors can create, remove and edit articles. Analytics can show us whether an individual has visited a particular post, or how many pageviews a post has garnered. All of those operations involve transactions, and the systems that manage them are OLTP systems.
People who use OLTP systems are primarily concerned about the current state of their data. Where does our client live today? How much money is in that bank account right now?
The term OLTP refers to entire systems that perform a certain kind of work. In the case of OLTP, most of the “work” of applying business rules is handled and curated by an application that almost certainly uses database types that are designed for transactional performance (Postgres, MySQL, MongoDB, DynamoDB, Redis). While the application does the “processing,” the database provides storage for the system — but OLTP systems are application-centric.
Even back in the early days of computing, people wanted to draw analytics from their online data, and wrote systems to help them do that. As with everything in computing, these systems got more sophisticated until they became a whole computing category.
Ed Codd, the creator of the first relational database, dubbed this category “online analytical processing” (OLAP) in 1993. The abbreviation provided a catchy contrast with OLTP systems — though OLAP software had been in use for at least 20 years already under the name “decision support systems.”
OLAP systems focus on aggregating data to provide analysis. In OLAP systems:
Use cases, which are primarily reports and dashboards, return summaries across large volumes of data points by means of aggregation.
Data stores are optimized for reading data. An OLAP system can continuously provide reporting on its dataset, but loads new and updated data less frequently — anywhere from daily to hourly is common, depending on the organization’s reporting needs.
While OLTP systems update one or a few rows of data at a time, OLAP systems process queries spanning thousands or millions of rows at a time.
Modern OLAP systems use distributed computing so they can scale to handle complex analytical processing across large datasets — the buzzword used to be “big data.”
In the case of an OLTP, most of the processing is done within the application code and the database takes a secondary role.
By contrast, in the world of OLAP, a database system stands at the center. This kind of database has a special name — a data warehouse. In contrast to OLTP, which, as we said, is application-centric, OLAP is database-centric.
Databases used in OLTP systems are optimized for row-oriented operations, because those operations make up the bulk of use cases. By contrast, data warehouses used in OLAP systems are column-oriented, because data analysts use the systems to retrieve aggregated data from individual columns rather than rows.
Examples of OLAP systems include columnar data stores like Amazon Redshift, Snowflake and Google BigQuery, and distributed systems such as Hadoop and Spark.
While OLTP systems are great at keeping track of the current state of data, OLAP systems are concerned with historical data. They can produce insights into changes in data over time or across large categories because they look at large sets of records. Queries in OLAP systems, which are usually written in a common query language called SQL, can be quite complex, taking in data from multiple databases and tables.
In the past, OLAP systems ran on specialized server hardware that was optimized for query performance. With the advent of cloud computing, however, which can scale quickly to handle almost any load, virtually all new data warehouses implementations now run on cloud platforms.
OLAP systems comprise more than a data warehouse, however. They use a data pipeline like Fivetran to extract data from multiple sources and load it into the data warehouse, transformation tools like dbt to manipulate the data, and analytics tools like Tableau, Looker, Microsoft Power BI, and dozens of others to build the deliverables — the reports and dashboards that provide managers with insights from their data.
NB: People have also used OLAP to refer to offline analytical processing, in which analytics can be performed on local machines using tools like Microsoft Excel — which, we hasten to point out, is not a database. Offline processing might have made sense for small amounts of infrequently updated data in the days before networking was common, but in the context of today’s data analytics world, it’s a nonstarter.
Most database administrators and data engineers learn about transactional databases before they’re ever exposed to the concept of OLAP. Because we humans tend to stick with what we know, these data professionals often first think about transactional databases for analytics. That’s a beginner’s mistake. Row-oriented databases aren’t architected for efficient retrieval of columnar data.
For best results, don’t use a transactional database for analytics — use a cloud data warehouse with a columnar data store.
OLTP and OLAP systems are designed for different purposes. If your system performs frequent adds, updates, and deletes, it’s OLTP and you should use a row-oriented database. If you’re reporting on or summarizing large datasets, that’s OLAP, and you should use a column-oriented data warehouse.