Contents
What is an enterprise data warehouse?
Enterprise data warehouse vs. data warehouse
Types of enterprise data warehouses
Enterprise data warehouse schemas
Enterprise data warehouse architecture
Benefits of an enterprise data warehouse
How to evaluate enterprise data warehouses
Shift from batch to real-time data warehousing
To be competitive, modern companies must be agile and make smart business decisions based on data, not hunches. Unfortunately, important business data is often housed in different departments and managed by disparate teams. This results in siloed thinking and prevents leaders from gaining a holistic view of the business.
Here’s what the disparity might look like: The sales team manages data related to the CRM, sales conversion and more. Support holds the keys to customer success metrics. Procurement knows the facts on supply chain management and so on.
The ultimate goal is to attain a single source of truth. For this, you need a solution that enables data-driven decisions and a quicker time to insight. Plus, it must integrate data from multiple sources for centralized analysis and business intelligence (BI). This is where an enterprise data warehouse (EDW) comes in.
What is an enterprise data warehouse?
A data warehouse is defined as a centralized data repository, sometimes called a database of databases, for reporting and analytical purposes.
An enterprise data warehouse (EDW) is a database of databases that houses data from all areas of a business.
EDWs store data from multiple departments, sources and applications to make centralized analytics available across an enterprise. This data typically comes from different systems, including on-premises sources such as production applications and physical records, cloud sources such as customer relationship management (CRM), enterprise resource planning (ERP) and other web-based applications.
The data housed within an EDW is one of a business’s most important assets. It contains critical information that captures a view of the entire business — one that does not exist anywhere else.
There are four components of an EDW:
- Load manager: extracts information from data sources and loads it into the data warehouse.
- Warehouse manager: performs operations that help manage data, such as data analysis, merging, the creation of views and indexes, etc.
- Query manager: performs operations associated with managing user queries. It schedules query execution and routes queries to their corresponding tables in the data warehouse.
- End-user tools: These are the tools that help users interact with an enterprise data warehouse. These can be OLAP, data mining, ESI, application development, query and data reporting tools.
Enterprise data warehouse vs. data warehouse
The difference between an enterprise data warehouse and a data warehouse lies in its size and scope. A normal data warehouse is designed to support a single department or team in an organization. As a result, it can cause information silos and make it hard for different departments to access and share information.
An enterprise data warehouse is designed to provide decision support services to all departments within an organization. It includes the development of a unified approach to organizing and representing data across departments, processes and teams. This helps departments share data easily, which can improve decision-making.
Types of enterprise data warehouses
There are three types of enterprise data warehouses.
On-premises data warehouse
An on-premises data warehouse is one where an organization is responsible for buying, deploying and maintaining all the software and hardware. This type of warehouse is beneficial for multiple reasons:
- It offers greater control to organizations, which can help with data security and privacy.
- There’s minimal latency due to the on-prem design.
Cloud data warehouse
A cloud data warehouse is a service that a vendor provides to businesses so they can meet their data warehousing requirements. This is done by paying to rent cloud resources from a vendor that has their own infrastructure of hardware and software to help you access a data warehouse online.
A cloud data warehouse is advantageous for the following reasons:
- You don’t have to invest a large amount of capital to buy and set up the entire data warehouse infrastructure. Your vendor’s team is constantly monitoring and upgrading their hardware.
- You can scale storage and compute capability of your warehouse up or down as your requirements change and your business grows.
Hybrid data warehouse
A hybrid data warehouse is one in which the organization retains control of some parts of the data warehouse components and outsources the other parts. This data warehouse is good for the following reasons:
- It allows you to keep control of sensitive data that shouldn’t be moved to the cloud.
- It’s easier to adopt a hybrid data warehouse because the organization doesn’t have to move all data to the cloud at once.
Enterprise data warehouse schemas
A schema in an EDW defines the organization of database entities (e.g., dimension tables, fact tables) and their logical association.
Star schema
In a star schema, there is one fact table in the middle that is surrounded by associated dimension tables. A fact table stores primary business information in the form of aggregated facts. For example, you can have a fact table with the name SALES that includes business information in the form of properties, such as product id or customer id.
Conversely, dimension tables store traits about the data in fact tables. For instance, a product dimension table can store information about the product attributes, such as category and items sold.
A star schema is good for data retrieval in reporting. It’s easy to use joins in queries in a star schema, which can improve query performance.
Snowflake schema
A snowflake schema borrows its basic design from the star schema with a slight difference: dimension tables in snowflake schemas can be associated with other dimension tables that aren’t linked to the fact table.
A snowflake schema improves data integrity by providing structured data. It also helps to consume less disk space for the same reason.
Galaxy schema
The galaxy schema contains multiple fact tables that are surrounded by multiple dimension tables. Two fact tables can share the same dimension table. A galaxy schema minimizes redundancy significantly since it involves more normalization than other schemas.
Enterprise data warehouse architecture
Historically, one- and two-tier architectures were used for data warehouses. Today, the three-tier architecture is the most popular approach for enterprise data warehouses. Here’s what it entails.
Bottom tier
The bottom tier consists of the data repository, which stores data in a relational database or multidimensional database. This data is collected from disparate sources. Before this data is stored, certain actions are performed on it as part of the extract transform load (ETL) process. This includes transforming data, such as cleaning it, deleting duplicates, changing data types and more.
Middle tier
This layer contains an online analytical processing (OLAP) server that works as a discovery and analysis system for business analysts. Charts, reports, and predictions are processed in this stage. There are two types of OLAP systems:
- Relational online analytical processing (ROLAP): Works with relational databases.
- Multidimensional online analytical processing (MOLAP): Works with multidimensional databases.
Top tier
This stage contains the front-end layer that acts as the user interface. It can be manipulated by command line or by a GUI (graphical user interface).
Benefits of an enterprise data warehouse
Without a centralized data warehouse, departments often work in data silos. Teams independently pull in and analyze data from various sources. However, this is an error-prone, unrepeatable approach as the analysis is subject to the biases of the particular data importer.
Instead, enterprises need a data storage solution that enables repeatable and accurate reporting within departments and across the organization for a 360-degree view. A single source of truth supports the following:
Immediate and easier access to actionable data
Data analysts (and the business teams they support) can quickly build insights from the full dataset. Since the EDW serves as a centralized repository, users can fetch and view all organizational data from a single interface instead of going through different systems for different information. This reduces the time spent on retrieving data, allowing business users to spend more time on analysis and use data-driven insights to improve decision-making.
Multi-department collaboration
EDWs and the company-wide data housed in them provide a big-picture view of the questions facing the enterprise. For example, in an ecommerce company, marketing might track how ads on social and earned media are driving website traffic upward.
Department-driven metrics have their place, but a bigger impact comes from joining data between departments. For instance, if sales and marketing share an EDW, both functions could fully track the customer’s online path to purchase and work together to optimize the buyer journey — from discovery to conversion.
In this case and countless others, EDWs provide a broader context and demonstrate the relationships between individual data points. This knowledge offers a better understanding of what the information means and how it can be put to use.
Consolidated and standardized data
EDWs are unique in that they store massive datasets from all over a business’s operations. When effective data transformation is applied, the data can be used for accurate comparison across the enterprise.
The standardized data store can also help make sense of seemingly random pieces of data flowing into the organization, saving valuable time by automatically and systematically aggregating the information.
Empowerment of less technical team members
When an EDW is combined with a data-driven culture, insights start to flow. Data professionals and decision-makers collaborate to streamline processes and impact revenue across the organization. An EDW benefits non-technical team members in all departments, including marketing, finance, HR, procurement and more. In these cases, teams can identify new, wider-ranging KPIs and gauge results, allowing key personnel to plan accordingly.
Adherence to compliance
EDWs help organizations audit and check data sources with a single source of truth, allowing them to find errors swiftly. This can help them comply with regulations like GDPR and CCPA. It can be especially effective for companies in the healthcare and finance space, where industry-based regulations can make it hard to monitor and track data.
How to evaluate enterprise data warehouses
Along with the main providers mentioned, there’s a slew of vendors offering centralized data storage in a cloud data warehouse, so starting your search might seem daunting. While each company has specific needs, here are some key selection criteria that can help guide your decision.
Select a warehouse that complements your ecosystem
A cloud data warehouse should align with your business model and fit with existing systems. As mentioned, Snowflake, AWS and Google Cloud all offer outstanding data warehouse options, as do Microsoft Azure and Databricks. Consider the rest of your infrastructure and existing data tool ecosystem to make certain your company’s data types and existing ecosystem are congruent with your EDW choice.
For example, conduct a compatibility assessment to confirm your cloud data warehouse provider will pair well with your data transformation, BI and data integration/ETL tools of choice.
Compare costs
Cloud data warehouse providers have different ways of calculating costs for compute and storage. Depending on your usage model, these cost configurations can have a drastic impact on your costs from month to month.
Perform a usage audit to check your model against your vendor choice to ensure costs align with your budget. To help you evaluate the costs of migrating platforms, see our post on data egress costs.
Also, be sure to consider the labor costs associated with data migration and the adoption of your new cloud data warehouse. Does your team have the internal resources to adopt the new data stack? Or will you need to engage with an external consultant? These questions are important to factor into your data warehouse decision.
Factor in the cost of scaling
As your company becomes more data mature, your data capacity and query volume and complexity will only rise. On top of the current data warehouse usage audit, build a projection of how these needs will grow over time. This exercise will help ensure you don’t spend on unneeded capacity. Armed with this information, ask your prospective vendors how the costs will change as your needs expand and flex.
Don’t forget security
As your business data usage scales, so does the number of data sources. Security concerns for transferring and storing all of this data increase as well.
Choose a cloud data warehouse that has features such as locking schemas, monitoring utilities, remote maintenance capabilities and similar functionality as baseline offerings.
Depending on your company’s use case, additional security components to shop around for include:
- Strong user authentication and authorization that will inhibit unauthorized access
- Data encryption
- Data erasure, which protects data with a complete override on physical storage to prevent malicious recovery
- Protection of data attributes via data shielding
Consider user access
In both regulated and unregulated organizations, ensuring proper data access is very important. When evaluating vendors, check for features such as read-only permissions, custom user groups with locked-down permissions, encrypted columns, and tools to enable consistent account auditing and clean-up. These access control processes vary widely among cloud data warehouse vendors.
Evaluate fault tolerance
Natural and manmade disasters are unavoidable, so your cloud data warehouse must have adequate fault tolerance. Ask questions, such as does your prospective cloud data warehouse provide adequate high availability and durability in the event of a catastrophic failure at a data center?
Specifically, find out if they provide proper backup as well as fault-tolerant and redundant hardware. These features and equipment will protect your data stack from loss due to power failures and potential equipment malfunctions.
Trust your peers
One last tip: Contact peers in your field — especially those who have similar data usage models as your company’s. Ask them their preferences on availability, ease of use and cost, and how they made their cloud data warehouse decision.
Shift from batch to real-time data warehousing
Traditionally, data in EDWs is loaded from source systems in batches hourly, daily or weekly. This approach worked for a long time until the expectations of customers changed. Now, end users want insights in real time, whether it’s a customer looking to get updates about the order they placed online or a sales employee who wants to identify any trend in the sales data from the last 15 minutes. That’s why companies are now moving toward real-time data warehousing.
Real-time data warehousing loads data continuously into the data warehouse and makes data-driven insights immediately accessible to end users. This ensures that users can get the latest information and make decisions accordingly. Learn more about it here. To experience the benefits of Fivetran for yourself, consider a free trial.