Why do so many businesses struggle to establish successful analytics programs? A lack of data is not the problem. Data volumes — from hundreds of cloud applications to millions of IoT endpoints — are exploding across organizations and industries. The real challenge is getting access to timely, trusted data on a reliable basis so that data analysts can actually do their job — analyze data!
The problem of volume and variety
Data can originate from hundreds or thousands of sources across an organization, including:
- Digital activity recorded by software triggers, such as clicks on a website or app
- Transactions stored in a finance system
- Report data from disparate advertising platforms
The problem of gaining simple, reliable access to data is twofold.
First, SaaS applications continue to proliferate in organizations of all sizes. Most organizations use a wide range of apps to handle operations such as customer relationship management, billing and customer service. Every one of those apps is a potentially rich source of data.
Second, each one of these SaaS applications features its own unique web of APIs and data models — and those can change at a moment’s notice, or no notice at all.
Taken together, these trends create a daunting data integration challenge. Combine an exploding number of applications with their growing complexity, and it’s no wonder organizations fail to do data analytics well or to realize business value from their data.
Why the traditional ETL process falters
The predominant method of data integration is a process known as extract-transform-load, or ETL, which has been around for decades. The acronym ETL is often used colloquially to describe data integration activities in general. But strictly speaking, ETL evolved at a time when computing power, storage and bandwidth were scarce and expensive. Because the approach was built around those resource constraints, it looks increasingly antiquated in the cloud era.
In building an ETL pipeline, data engineers and analysts follow a complex workflow that involves defining database schemas, building a data pipeline, maintaining that data pipeline, and trying to derive insights while managing all of this.
This ends up being a resource-intensive, endless cycle, as each data pipeline runs on custom code designed for specific use cases. The code can become nonfunctional with little warning, leading to broken pipelines. And when the C-suite or line of business demands a new requirement, data engineers are faced with extensive data management challenges and code revisions.
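To make that brittleness concrete, here is a minimal, hypothetical sketch of such a hand-coded ETL job in Python. The API endpoint, field names and destination schema are illustrative assumptions rather than any real product’s interface; the point is that the transformation logic is wired to one specific version of the source.

```python
# Hypothetical hand-rolled ETL job: the destination schema and the
# transformations are fixed up front, so any change in the source can
# break the pipeline with little warning.
import sqlite3

import requests  # any HTTP client would do

DEST_DDL = """
CREATE TABLE IF NOT EXISTS orders (
    order_id   TEXT PRIMARY KEY,
    customer   TEXT,
    amount_usd REAL,
    created_at TEXT
)
"""

def extract():
    # Illustrative endpoint; a real SaaS API adds auth, pagination and rate limits.
    resp = requests.get("https://api.example-saas.invalid/v1/orders")
    resp.raise_for_status()
    return resp.json()["orders"]

def transform(raw_orders):
    # Transformation happens BEFORE load, against hard-coded field names.
    # If the source renames "total" or restructures the payload, this code
    # raises KeyError or silently drops data -- the broken-pipeline scenario.
    return [
        (o["id"], o["customer_name"], float(o["total"]), o["created"])
        for o in raw_orders
    ]

def load(rows):
    con = sqlite3.connect("warehouse.db")  # stand-in for a real warehouse
    con.execute(DEST_DDL)
    con.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    con.commit()

if __name__ == "__main__":
    load(transform(extract()))
```

Every new reporting requirement or upstream change means revisiting code like this, which is exactly the maintenance cycle described above.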
How ETL hinders data analysts
Any organization awash in data yet dependent on ETL will always struggle to access the right information at the right time. And yet, as we mentioned above, ETL remains the industry standard among established organizations. These organizations are shackling themselves to 1970s methodologies while modern, cloud-native businesses pull away.
A November 2021 survey of 300 data engineers at large organizations found that, on average, they spend 44% of their time building and maintaining data pipelines, costing each of their companies an estimated half a million dollars a year. Unsurprisingly, three-quarters of the respondents said they were frustrated by this waste of their time.
These findings are not unique. A June 2020 Dimensional Research survey of nearly 500 data professionals revealed similarly widespread problems: 86% of data analysts said they had to work with out-of-date data, and 90% said numerous data sources had been unreliable over the previous year.
The modern approach to data integration
Many modern businesses, including Square, Urban Outfitters and DocuSign, have adopted a different approach to data integration. This modern approach, known as automated ELT or automated data integration, makes data access as simple and reliable as electricity. Data analysts using automated ELT and a modern data stack can make timely, well-informed recommendations with little to no engineering burden.
ELT, or extract-load-transform, shifts the “transform” step to the end of the data pipeline. Now analysts can load data before transforming it, so they don’t have to determine beforehand exactly what insights they want to generate. Instead, the underlying source data is faithfully replicated to a data warehouse and becomes a single source of truth. Analysts can then perform transformations on the data without compromising the integrity of the warehoused data.
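As a rough sketch of the difference, the hypothetical example below (using SQLite and made-up order records as stand-ins for a real cloud warehouse and source) lands the raw data untouched and defers the transformation to a view defined inside the warehouse.

```python
# Minimal ELT sketch: land the source records as-is, then transform with SQL
# inside the warehouse. SQLite stands in for a cloud warehouse here and
# needs the JSON1 extension, which recent Python builds include.
import json
import sqlite3

raw_orders = [  # pretend these arrived from the extract-and-load step
    {"id": "o-1", "customer_name": "Acme", "total": "19.99", "created": "2021-11-02"},
    {"id": "o-2", "customer_name": "Globex", "total": "5.00", "created": "2021-11-03"},
]

con = sqlite3.connect(":memory:")

# Load: replicate the source faithfully, one JSON document per row.
con.execute("CREATE TABLE raw_orders (payload TEXT)")
con.executemany(
    "INSERT INTO raw_orders (payload) VALUES (?)",
    [(json.dumps(o),) for o in raw_orders],
)

# Transform: a view derived from the untouched raw table. Analysts can add
# or rewrite views later without re-extracting anything from the source.
con.execute("""
    CREATE VIEW orders AS
    SELECT json_extract(payload, '$.id')                  AS order_id,
           json_extract(payload, '$.customer_name')       AS customer,
           CAST(json_extract(payload, '$.total') AS REAL) AS amount_usd,
           json_extract(payload, '$.created')             AS created_at
    FROM raw_orders
""")

print(con.execute("SELECT customer, amount_usd FROM orders").fetchall())
```

Because the raw table is never modified, it remains the single source of truth; transformations can be changed or added at any time without touching the pipeline that feeds it.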
Compared to ETL, ELT takes full advantage of modern cloud-based enterprise data warehouses, which are column-oriented and feature architectures that separate compute from storage. Data warehouses are designed to run analytics queries extremely efficiently, and they allow organizations to store massive amounts of data and run queries over those data sets cost-effectively.
Automated ELT leverages prebuilt, zero-configuration data connectors that automatically detect and replicate schema and API changes, and lightly clean and normalize data.
These capabilities require the provider of automated ELT to have a deep knowledge of data sources, extensive data modeling and analytics expertise, and the engineering know-how to build robust software systems.
Fortunately, if you choose the right automated ELT vendor, you don’t need to worry about this.
Properly implemented, the modern data stack delivers continuous data integration and organization-wide data accessibility, with a minimum of manual intervention and bespoke code.
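As a rough illustration of one of those capabilities, automatic schema-change handling, the hypothetical sketch below widens the destination table whenever new fields appear in a source batch instead of failing the load. It is a simplified example of the general idea, not a description of how Fivetran or any other vendor actually implements its connectors.

```python
# Sketch of schema-drift handling in an automated connector: compare the
# columns arriving from the source with the destination table and widen
# the destination before loading, instead of breaking the pipeline.
import sqlite3

def sync(con, table, records):
    # The union of keys observed in this batch is the source's current schema.
    source_cols = sorted({key for rec in records for key in rec})

    # Create the destination table on the first sync.
    cols_ddl = ", ".join(f"{c} TEXT" for c in source_cols)
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols_ddl})")

    # Detect columns the source has added and replicate them downstream.
    dest_cols = {row[1] for row in con.execute(f"PRAGMA table_info({table})")}
    for col in source_cols:
        if col not in dest_cols:
            con.execute(f"ALTER TABLE {table} ADD COLUMN {col} TEXT")

    # Load, tolerating records that omit optional or newly added fields.
    placeholders = ", ".join("?" for _ in source_cols)
    con.executemany(
        f"INSERT INTO {table} ({', '.join(source_cols)}) VALUES ({placeholders})",
        [tuple(rec.get(c) for c in source_cols) for rec in records],
    )
    con.commit()

# The second batch arrives with an extra "discount" field; the destination
# gains the column automatically rather than the pipeline breaking.
con = sqlite3.connect(":memory:")
sync(con, "orders", [{"id": "o-1", "total": "19.99"}])
sync(con, "orders", [{"id": "o-2", "total": "5.00", "discount": "0.50"}])
print(con.execute("SELECT * FROM orders").fetchall())
```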
Oldcastle: A real-world analytics success story
Automated data integration and a modern data stack offer many benefits, from lowering engineering costs and enriching data to reducing time to insight and increasing adaptability to changing market conditions. The following case study illustrates how automated data integration helped one company save hundreds of thousands of dollars through cloud migration.
As an industry leader in building accessory products, Oldcastle Infrastructure fittingly took a “do it yourself” approach when it decided to migrate nearly 40 years’ worth of data to the cloud. The company started a warehousing project focused on gathering sales data from an on-premises ERP database and NetSuite, so it could have a single view of transactional, manufacturing and production data across both ERPs. Eight months into the project, Oldcastle realized it needed a new approach.
Specifically, the NetSuite API changed constantly, making it difficult for the Oldcastle BI developers to build and maintain the data pipeline. Rather than hire a full-time engineer to do that work, Nick Heigerick, IT Manager of BI at Oldcastle, found that an automated data integration platform like Fivetran could keep up with the API, extract all of the data and load it directly into the data warehouse.
As he explains:
> With Fivetran, I replicated all of our data in 10 business days — every table and every field — from both our on-prem and cloud ERPs, which saved us about $360,000 in initial setup and maintenance of our SQL Server and NetSuite connectors.
How to modernize your data stack
Implementing a modern data stack is the best way to establish a successful analytics program. You can set up and start testing a modern data stack in less than an hour, because many of the key tools are compatible with one another and offer rapid setup and free trials. Before you do, however, think through your organization’s needs and evaluate the offerings for each technology: data integration tool, cloud data warehouse and business intelligence platform.
Check out The Ultimate Guide to Data Integration for an in-depth look at how to set up your data integration strategy.