Throughout my career as a solution architect (SA), I have seen the buzz around starting a new data project many times, with companies eagerly acquiring the latest technologies and hiring large teams, only to face the same hurdles over and over again. For a long time, my job involved designing and building pipelines to move data from sources into reports and models. I am something of a data integration groupie and have always looked ahead to see how the industry will change. What I have experienced firsthand is nothing short of a revolution in the data integration industry.
The prehistoric age of on-premise data integration
The early days of my career involved several large-scale data migration and analytical reporting projects, typically from sources to on-premise relational databases such as Teradata, Oracle and SQL Server. These projects were long and demanded constant supervision. I distinctly remember the team having to sign in in the middle of the night because the nightly batch had failed and the reports had to be in place the following morning! Manual coding and configuration were constant features (and irritants) of these long-running projects.
What we had was in every sense an engineering/IT-driven architecture, with entire sprints dedicated to schema changes and constant haggling with the security team to give us enough access and independence to do our job. It was impossible to pivot quickly. To meet our SLAs, we had to tune performance at a very low level and request more compute and storage hardware as needed. This traditional ETL was soon to get a boost with Hadoop becoming more prevalent in the industry...
A link in the evolutionary chain: Big data and IaaS
As “Big Data” became the buzzword, I worked with companies that were early adopters in the low-cost commodity hardware/parallel processing boom. The major vendors of the time included Hortonworks and Cloudera, who for a time successfully promoted the infrastructure-as-a-service (IaaS) model. This made the architecture more complex, with a mix of different technologies and services, but data delivery and transformation were faster.
The interdependence of the technologies made the stack difficult to maintain: if any one service went down, the pipelines failed. Our daily data syncs were accompanied by heavy peak-time traffic, and we would sometimes have to triage what data was synced, inevitably leading some teams to miss fresh data. The infrastructure wasn’t as elastic as we would have liked. The market also hadn’t really matured or settled on a standard, so MapReduce soon gave way to Spark, resulting in wide-scale changes to our design and code. This was still an engineering workflow, with sprints and daily standups, just with a little more grunt behind our code.
The SaaS revolution
The world I work in as an SA today, characterized by an explosion in SaaS apps and data sources as well as off-the-shelf, automated data pipelines, is markedly different. I have been at Fivetran for nearly a year now and I have already seen companies get more value from data, faster, than I saw across almost all my other projects combined. Companies today can go live with new data integrations in less time than it would have taken to change folder permissions back in the on-premise or IaaS age.
Back in the “Big Data” period, even cloud-based data warehouses with processing elasticity and MPP architecture needed custom coding, configuration, scheduling and other management. Modern fully managed data integration technologies like Fivetran, along with the new ELT architecture and cloud data warehouses, do away with all of these complications.
Instead of engineering sprints to produce new data models and accommodate schema changes, transformations are now the purview of analysts, written in SQL, which we know is here to stay. Tools like dbt, which bring version control, collaboration and other software development best practices to the work of analysts, exemplify the best of this new approach.
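To make the ELT idea concrete, here is a minimal sketch, not how Fivetran or dbt work internally. It assumes an in-memory SQLite database standing in for a cloud data warehouse, and the table names (raw_orders, customer_revenue), the sample rows and the function name are all hypothetical. Raw data is landed untouched first, and the transformation is then a plain SQL statement run inside the database, which is the part an analyst would own.

```python
import sqlite3

# Hypothetical raw rows, standing in for data an automated pipeline
# has already extracted from a source system.
RAW_ORDERS = [
    (1, "acme", "2023-01-05", 120.0),
    (2, "acme", "2023-01-09", 80.0),
    (3, "globex", "2023-01-07", 200.0),
]


def build_customer_revenue(conn):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    # "L" step: land the raw data unchanged (table name is an assumption).
    conn.execute(
        "CREATE TABLE raw_orders "
        "(id INTEGER, customer TEXT, ordered_on TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)

    # "T" step: the model is just a SQL query over the raw table.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer,
               COUNT(*)    AS order_count,
               SUM(amount) AS total_amount
        FROM raw_orders
        GROUP BY customer
        """
    )
    return conn.execute(
        "SELECT customer, order_count, total_amount "
        "FROM customer_revenue ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = build_customer_revenue(conn)
for row in rows:
    print(row)
```

Because the raw table is kept intact, changing the model means editing one SQL statement and rerunning it, rather than rebuilding the pipeline that delivered the data.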
Automated data pipelines mean no more sprints, standups, late-night calls, miscoordination, and cranky messages between teams. Instead, data teams can concentrate on how that data will look for the business and continue adding more sources to gain more insights. Keeping up with the latest SaaS tools by hand is impossible – you can probably think of at least a dozen important tools your organisation uses which you have not extracted data from yet. Fivetran’s simplicity allows you to bring that data in with minimal configuration and effort, and to focus on the critical asset of the business: data.
This simplicity is the culmination of progressive improvements in the data integration world. Taken together, they amount to a complete revolution in the reality of data integration. Life for data professionals is now easier, and more interesting, than ever before. It has also made my job a little easier. Just don’t tell my boss.