Using data to make decisions has been a cornerstone of great businesses for generations. You will often read about the impressive, complex technology that companies like Facebook, Google and Netflix employ to collect, process and analyze data. But some of the best practitioners of data-driven decision making are consumer packaged goods (CPG), financial services and manufacturing companies.
If you're tackling data management at your company, you're probably better off following the lead of Ford or Costco than of Netflix. Longtime data practitioners have developed a set of best practices that apply to every company, and can support real-world decision making without requiring a staff of software engineers. There are three main principles for traditional data management and analytics.
1. Agree on an internal system of record
A complex business will inevitably have multiple copies of the same data stored in different systems. For example, sales, marketing and support departments all collect and store customer data, and some of it is redundant.
The key to avoiding an explosion of complexity, due to different "sources of truth" disagreeing, is to designate a single location as the "system of record" for each concept. For example, you might designate Salesforce as the system of record for the definition of a customer. If the data in Zendesk disagrees with Salesforce about whether someone is a customer, Salesforce is right and Zendesk is wrong. By making it clear where "home" is for each concept, you provide clear direction for your team to resolve discrepancies and fix bugs.
2. Consolidate data in a cloud data warehouse
Before you can use data to make decisions, you will need to consolidate that data in a single location. That location should be a SQL data warehouse. SQL data warehouses are not perfect, but they are the Swiss army knife of data. They excel at the most common task you will do: querying structured, tabular data. And, they are good enough at all the less-common workflows that you will be able to solve your daily data problems with a minimum of complexity.
3. Create a solid dimensional schema
For smaller companies with simple data problems, steps 1 and 2 are a sufficient foundation to answer the questions you will ask as you make data-driven decisions. But larger companies will have enough complexity in their data model that the data warehouse will begin to turn into a mess. There is no magic bullet to solve this problem, but the tried-and-true approach is to build a dimensional schema.
A dimensional schema is a simplified view of all your data. Any large, longstanding business will have many systems of record, each of which has many tables. The resulting data warehouse will have an intimidating level of complexity. You will mitigate this problem by designing a dimensional schema that has fewer tables but is still sufficient to support the questions you need to answer.
For example, analysts in several departments might be interested in analyzing your customers for a variety of reasons. Information about customers comes from several different sources and must be merged and cleaned before it is ready to analyze. You don't want multiple analysts to duplicate their work and come up with different representations of a customer. Instead, you want the merging and cleaning to be done once by a core team of data analysts, resulting in a common customer table or view that the rest of the analysts in the company can use as a starting point.
A dimensional schema is in some ways a work of fiction. By simplifying the data model of your systems of record, you are erasing some of the real world complexities of your business. There will be some questions that can no longer be answered by this simplified view. The key to designing a good dimensional schema is to erase as much complexity as possible, while still being able to support the queries you actually need. You should expect your dimensional schema to evolve as you change the questions you need to ask of the data, so it's important to retain the original data, in its original schema, so it will be available in the future.
While high-tech data engineering may be fashionable, 99% of the time it is not necessary. What most companies need is a solid data warehouse, and a leadership team that is ready to change their opinions in the face of new data.
A version of this blog post was originally published on Forbes Tech Council.