Many analytics programs struggle to assimilate data from numerous and unpredictable sources, but automated ELT offers a solution.
Why do so many businesses struggle to establish successful analytics programs? A lack of data is not the problem. Data volumes — from hundreds of cloud applications to millions of IoT endpoints — are exploding across organizations and industries. The real challenge is getting access to timely, trusted data on a reliable basis so that data analysts can actually do their job — analyze data!
Data can originate from hundreds or thousands of sources across an organization, including:
- Digital activity recorded by software triggers, such as clicks on a website or app
- Transactions stored in a finance system
- Report data from disparate advertising platforms
The problem of gaining simple, reliable access to data is twofold. As SaaS applications continue to proliferate and prove their value in organizations of all sizes, companies are implementing more applications to run their business. Most organizations use a wide range of apps to handle operations such as customer relationship management, billing and customer service.
Furthermore, each one of these distinct applications features a web of complex APIs and data models that can change at a moment’s notice — or no notice at all. All of these pose a daunting data integration challenge. Combine an exploding number of applications with the increasing complexity of apps, and it’s no wonder organizations are failing to do analytics well or realize any business value at all.
The predominant method of data integration is a process known as extract-transform-load, or ETL, which has been around for decades. The acronym ETL is often used colloquially to describe data integration activities in general. ETL evolved at a time when computing power, storage and bandwidth were scarce and expensive. Built in an era with such resource constraints, ETL is a limited, antiquated data integration tool in the cloud era, which is characterized by unlimited horsepower and huge cost-efficiencies.
In building an ETL pipeline, data engineers and analysts follow a workflow that typically includes the following steps:
1. Determine project scope – identify the bounds and business goals of the report
2. Define schemas – model the data and determine the necessary transformations
3. Build ETL – write the software, specifying the details of the API endpoints to call, how to normalize the data, and how to load it into the destination
4. Surface insights – generate reports that are digestible for key decision-makers
5. Address broken pipelines and report interruptions
6. Re-scope the project
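The "define schemas" and "build ETL" steps above can be sketched in a few lines of Python. Everything here is illustrative: the orders payload, field names and destination table are hypothetical, and sqlite3 stands in for a warehouse. Note that the transform runs before the load, so the schema and normalization logic are baked in up front, which is exactly why a source-side change breaks the pipeline:

```python
import sqlite3

def extract():
    # Stand-in for calling a source API endpoint (hypothetical payload).
    return [
        {"id": 1, "amount_cents": 1250, "currency": "USD"},
        {"id": 2, "amount_cents": 1000, "currency": "USD"},
    ]

def transform(rows):
    # Hard-coded normalization: cents -> dollars. A renamed field or a new
    # currency rule in the source breaks this step with little warning.
    return [(r["id"], r["amount_cents"] / 100) for r in rows]

def load(rows, conn):
    # The destination schema is fixed at build time.
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount_usd REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount_usd) FROM orders").fetchone())
# -> (2, 22.5)
```

Every assumption about the source lives in custom code like this, multiplied across every connector an organization maintains.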
This ends up being a resource-intensive, endless cycle, as each data pipeline runs on custom code designed for specific use cases. The code can become nonfunctional with little warning, leading to broken pipelines. And when the C-suite or line of business demands a new requirement, data engineers are faced with extensive code revisions.
Any organization awash in data yet dependent on ETL will always struggle to access the right information at the right time. And yet, as we mentioned above, ETL remains the industry standard among established organizations. Many businesses are using a 1970s technology while modern, cloud-native businesses pull away.
A June 2020 Dimensional Research survey of nearly 500 data professionals revealed multiple error-prone practices, inefficiencies and data latency issues related to outmoded integration technology:
- 62% of data analysts report having to wait on engineering resources numerous times each month
- 90% say numerous data sources were unreliable over the last 12 months
- 86% of analysts say they have to resort to using data that is out of date, with 41% using data two months old or older
Many modern businesses, including Square, Urban Outfitters and DocuSign, have adopted a different approach to data integration. This modern approach, known as "automated ELT" or "automated data integration," makes data access as simple and reliable as electricity. Data analysts using automated ELT and a modern data stack can make timely, well-informed recommendations with little to no engineering burden.
ELT, or “extract, load, transform,” shifts the “transform” step to the end of the data pipeline. Analysts can load data before transforming it, so they don’t have to determine beforehand exactly which insights they want to generate. The underlying source data is faithfully replicated to a data warehouse, where it becomes a “single source of truth.” Analysts can then perform transformations on the data without compromising the integrity of the warehoused data.
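The load-then-transform order can be sketched as follows, again with sqlite3 standing in for a cloud warehouse and with illustrative table and field names. Raw records are replicated verbatim, and the analysis-ready shape is derived later, in SQL, without touching the raw copy:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# LOAD: replicate source records verbatim -- no upfront modeling required.
conn.execute("CREATE TABLE raw_orders (payload TEXT)")
source = [
    {"id": 1, "amount_cents": 1250, "status": "paid"},
    {"id": 2, "amount_cents": 990, "status": "refunded"},
]
conn.executemany("INSERT INTO raw_orders VALUES (?)",
                 [(json.dumps(r),) for r in source])

# TRANSFORM: derive an analysis-ready view afterward, in the warehouse,
# leaving the raw "single source of truth" intact.
conn.execute("""
    CREATE VIEW paid_orders AS
    SELECT json_extract(payload, '$.id') AS id,
           json_extract(payload, '$.amount_cents') / 100.0 AS amount_usd
    FROM raw_orders
    WHERE json_extract(payload, '$.status') = 'paid'
""")
print(conn.execute("SELECT * FROM paid_orders").fetchall())  # -> [(1, 12.5)]
```

If the desired insight changes, only the view is rewritten; nothing is re-extracted from the source.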
ELT takes full advantage of the cost-efficiencies of modern cloud data warehouses, which are column-oriented and feature architectures that separate compute from storage. Designed to run analytics queries extremely efficiently, they allow organizations to store massive amounts of data and run queries over those data sets cost-effectively.
Automated ELT leverages prebuilt, zero-configuration data connectors that automatically detect and replicate schema and API changes, and lightly clean and normalize data. Building such connectors in-house requires deep knowledge of each data source, extensive data modeling and analytics expertise, and the engineering know-how to build robust software systems; automation lifts that burden off the data team.
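One piece of what such a connector automates can be illustrated in miniature: detecting a new field in incoming records and widening the destination schema instead of failing the pipeline. This is a simplified sketch with hypothetical data, not a real connector; production systems also handle type changes, deletes and API versioning:

```python
import sqlite3

def sync(records, conn, table="customers"):
    # Ensure the destination table exists, then read its current columns.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER)")
    known = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for rec in records:
        for field in rec.keys() - known:  # schema drift detected
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {field} TEXT")
            known.add(field)
        cols = ", ".join(rec)
        marks = ", ".join("?" * len(rec))
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                     list(rec.values()))

conn = sqlite3.connect(":memory:")
sync([{"id": 1, "email": "a@example.com"}], conn)                  # initial shape
sync([{"id": 2, "email": "b@example.com", "plan": "pro"}], conn)   # source added a field
print([row[1] for row in conn.execute("PRAGMA table_info(customers)")])
# -> ['id', 'email', 'plan']
```

The second sync would break a hand-coded pipeline with a fixed schema; here the new field is absorbed automatically.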
What we refer to as the “modern data stack” is predicated on ELT and replaces on-premise technologies with cloud-native SaaS technologies. The key tools in a modern data stack are:
- A fully managed data pipeline for extract and load
- A cloud data warehouse
- A business intelligence tool
Properly implemented, the modern data stack delivers continuous data integration and organization-wide data accessibility, with a minimum of manual intervention and bespoke code.
Automated data integration and a modern data stack offer many benefits, from lowering engineering costs and enriching data to reducing time to insight and increasing adaptability to changing market conditions. The following case study illustrates how a modern data stack helped one company save hundreds of thousands of dollars through cloud migration.
As an industry leader in building accessory products, Oldcastle Infrastructure fittingly took a “do it yourself” approach when it decided to migrate nearly 40 years’ worth of data to the cloud. The company started a warehousing project focused on gathering sales data from an on-premise ERP database and NetSuite so it could have a single view of transactional, manufacturing and production data across both ERPs. Eight months into the project, Oldcastle realized it needed a new approach. The challenging piece to the pipeline puzzle was the NetSuite connector.
Nick Heigerick, IT Manager of BI at Oldcastle, explains:
NetSuite is constantly changing its API, making conventional pipeline strategies difficult to maintain. I thought I would have to pay someone a lot of money to monitor what happens in the background, but Fivetran keeps up with the API, grabs all of the data, and automatically centralizes it into Snowflake. The first project never even attempted to grab the data. They just didn’t think it was possible. With Fivetran, I replicated all of our data in 10 business days — every table and every field — from both our on-prem and cloud ERPs, which saved us about $360,000 in initial setup and maintenance of our SQL Server and NetSuite connectors.
You can set up and start testing a modern data stack in less than an hour, because many of the key tools are compatible with one another and offer rapid setup and free trials. Before you do, however, think through your organization’s needs and evaluate the offerings for each technology: data integration tool, cloud data warehouse and business intelligence platform. In our blog post “Cloud-Based Data Analytics in Three Steps,” we walk you through this process, showing you how to select and test the tools that comprise a modern data stack.