An enterprise data warehouse is critical to the long-term viability of your business.
To be competitive, modern companies must be agile and make smart business decisions based on data, not hunches. Unfortunately, important data is often spread across multiple departments and teams, creating siloed thinking and making it challenging for leaders to get a holistic view of the business.
In an applied example, the sales team manages data related to the CRM, sales conversion and more; support holds the keys to customer success and support metrics; procurement knows the facts on supply chain management; and on and on. Mission-critical data is housed in different departments and managed by disparate teams.
What’s needed is a solution that will enable data-driven decisions and quicker time to insight by integrating important data for centralized analysis and business intelligence.
The ultimate goal is a single source of truth.
The term data warehouse is not new. It is defined as a centralized data repository, sometimes called a database of databases, for reporting and analytical purposes. An enterprise data warehouse (EDW) is a database of databases that houses data from all areas of a business.
EDWs store data from multiple departments, sources and applications to make centralized analytics available across an enterprise. This data typically comes from different systems, including on-premises sources such as production applications and physical records, and cloud sources such as customer relationship management (CRM), enterprise resource planning (ERP) and other web-based applications.
The data housed within an EDW is one of a business’s most important assets, as it contains critical information that captures a view of the entire business that exists nowhere else.
Traditionally, data warehouses were hosted in on-premise data centers, but the advent of cloud computing has enabled “serverless,” cloud-based data warehouses where compute and storage resources can be independently scaled up and down as needed. Modern cloud data warehouses have become extremely accessible to organizations with modest resources. Examples include Databricks, Snowflake, Google BigQuery and AWS Redshift.
Why does your company need an EDW?
In companies that don’t have a centralized data warehouse, departments often work in data silos. Teams independently pull in data from various sources and try to analyze it in Excel. However, this approach is error-prone because of biases of that particular data importer and their bias on how they analyzed said data.
In some cases, we have a subset of data from a data warehouse called a data mart — a subset of data within the data warehouse that is made available to specific departments or teams. Reports and insights gained from these data marts are useful to track and measure progress for department-specific metrics, such as churn rate or volume of support tickets. These are important metrics to track, of course, but they’re limited in scope because the data repositories only store and manage data relevant to the scope and operation of that individual business group.
Business moves fast, and savvy enterprises will desire a data store and solution that enables repeatable and accurate reporting within departments and a view across the organization (such as a 360-degree customer view). In short, we need a single source of truth that enables:
1. Immediate access to actionable data
The data housed in enterprise data warehouses should be up to date, cleansed and enriched by data engineers to be relevant to the company’s needs. This access enables data analysts (and the business teams they support) to build insights off the holistic data set in an agile manner.
2. Multiple departments working toward a common goal
Enterprise data warehouses and the company-wide data housed in them provide a big-picture view of questions facing the enterprise. Let’s look at an example of an ecommerce company that sells goods online.
In this example, marketing might be tracking how ads on social media and earned media are driving website traffic upwards. It’s great that the department is metrics-driven, but what if they could coordinate with sales and CRM systems to track the customer buying journey and see how an ad led to a prospective customer coming to the site, browsing an online catalog and purchasing multiple items. With an EDW in place, sales and marketing can work together to optimize prospect attraction, discovery and ultimately conversion.
In this case and countless others, EDWs provide context and demonstrate the relationships between individual data points. This allows for a better understanding of what the information means, and how it can be put to use.
3. Consolidated and standardized data
Enterprise data warehouses are unique in that they store massive data sets from all over a business’s operations. When effective data transformation is employed, this data can be used for accurate comparison across the enterprise.
The standardized data store can also help to make sense of seemingly random pieces of data that are coming into the organization through various inputs, and it can save valuable time by aggregating that information automatically. Organizations are likely to be better positioned for future growth when their data is organized in such a systematic, automated fashion.
4. Empowering less technical team members
When an EDW is combined with a data-driven culture, insights start to flow as data professionals and decision-makers come together to build company-wide efforts to streamline processes and impact revenue. An EDW benefits non-technical team members in all departments, including marketing, finance, HR, supply chain and more. In these cases, teams can identify new, wider-ranging KPIs and gauge results, allowing key personnel to plan accordingly.
Now that the business need for an EDW has been established, it’s important to discuss the benefits of having a cloud-based EDW versus an on-premise, traditional solution.
Key features that cloud data warehouses offer include:
Speed and scalability
Lower total cost of ownership (TCO)
Cloud elasticity and integration capabilities
Better ability to enable self-service capabilities for business users
The three main providers of cloud data warehousing are: Snowflake, AWS Redshift and Google BigQuery.
Snowflake's platform delivers a fully elastic and highly flexible data warehouse that can collect, store, query and share data sets from a range of disparate sources, from structured data to JSON. With a fully integrated data lake, secure data sharing, data exchange and data application development workloads, Snowflake can easily scale up, down or out as needed to handle the constantly fluctuating data needs of the modern data enterprise, across different departments, business units, geographies and clouds.
BigQuery is similar to Snowflake, except there is no concept of a compute cluster, just a configurable number of "compute slots." BigQuery on demand is a pure serverless model, where the user submits queries one at a time and pays per query.
Redshift is a part of the AWS suite of services that is aimed at development teams. With Redshift, you can query and combine exabytes of structured and semi-structured data across your data warehouse, operational database and data lake using standard SQL. Redshift lets you easily save the results of your queries back to your S3 data lake using open formats, like Apache Parquet, so that you can do additional analytics from other analytics services like Amazon EMR, Amazon Athena and Amazon SageMaker.
For more detailed information, see our 2020 Cloud Data Warehouse Benchmark.
With many vendors offering centralized data storage in a cloud data warehouse, it might be daunting when you begin your search. While there are specifics that are unique to each company, here are some key selection criteria that can help guide your decision.
1. Select a cloud data warehouse that compliments your ecosystem
It’s important to choose a cloud data warehouse that fits in with your business model and your existing systems. Snowflake, AWS, Google Cloud, Microsoft and Databricks all offer outstanding data warehouse options. Consider the rest of your infrastructure and existing data tool ecosystem to make certain your company’s data types and existing ecosystem is congruent with your enterprise data warehouse choice.
For example, conduct a compatibility assessment to confirm that your cloud data warehouse provider will pair well with your data transformation, business intelligence and data integration/ETL tools of choice.
2. Cost comparison
Cloud data warehouse providers have different ways of calculating cost for compute and storage. Depending on your usage model, these cost configurations can have a drastic impact on your costs from month to month. Perform a usage audit and check your model against your vendor choice to ensure the costs are at a level you’re comfortable with. To assist you in evaluating costs of moving platforms, see our post on data egress costs that you should read and share amongst your data team.
Speaking of cost, be sure to consider the labor cost associated with data migration and adoption of your new cloud data warehouse. Does your team have the resources internally to adopt the new data stack, or will you need to engage with an external consultant? These questions are important to factor into your data warehouse decision.
As your business data usage scales, the number of data sources also trends upwards along with security factors for transferring and storing all this data. Choose a cloud data warehouse that has features such as locking schemas, monitoring utilities, remote maintenance capabilities, and similar functionality as baseline offerings.
Depending on your company’s use case, additional security components to shop around for include:
Strong user authentication and authorization that will inhibit unauthorized access
Data erasure, which protects data with a complete override on physical storage to prevent malicious recovery
Protection of data attributes via data shielding
4. Consider cost of scaling
Another factor to consider as your company becomes more data mature is that data capacity and query volume and complexity will only rise. It’s critical to audit your current data warehouse usage and build a projection of how these needs will grow over time. Armed with this, ask your prospective vendors how cost will be affected as your needs expand and flex. Be smart about your purchasing so you avoid spending money on capacity your team does not need.
5. User access
Whether or not your company operates under regulation from laws such as HIPAA or GDPR, ensuring proper data access is very important. When evaluating vendors, check for features such as slave read-only, custom user groups with locked-down permissions, encrypted columns, and tools to enable consistent account audit and clean up. These access control processes vary widely amongst cloud data warehouse vendors, so ensure your needs are met during your vendor selection process.
6. Fault tolerance
Natural disasters and man-made ones are unavoidable and as a result, your cloud data warehouse must have adequate fault tolerance. Ask questions such as: does your prospective cloud data warehouse provide adequate high availability and durability in the event of a catastrophic failure at a data center? Specifically, ask your vendors if they provide proper backup and fault-tolerant and redundant hardware that will protect your data stack from loss due to power failures and potential equipment malfunctions.
Armed with these features and considerations, cloud data warehouse selection should be less daunting. One last tip is to ask your peers in your field — especially ones who have similar data usage models to your company’s — and ask them their preferences on availability, ease-of-use and cost, and how they made their cloud data warehouse decision.
Fivetran, the leader in automated data integration, delivers ready-to-use connectors that automatically adapt as schemas and APIs change, ensuring consistent, reliable access to data. Fivetran improves the accuracy of data-driven decisions by continuously synchronizing data from source applications to any destination, allowing analysts to work with the freshest possible data.