The unveiling of Fivetran Transformations presents us with an opportunity to review the ways in which Fivetran offers a new approach to data pipelines in general and transformation in particular. The Fivetran approach to data engineering is predicated on Extract-Load-Transform (ELT) rather than Extract-Transform-Load (ETL). This has major implications for the design of the entire data pipeline.
The old orchestration: Complex and handwritten
Traditional data orchestration is inextricably tied to ETL-based data pipelines. Although the exact details of implementation may vary, an ETL data pipeline featuring data orchestration might involve (in decreasing order of grunt work):
- Cron jobs and handwritten scripts
- Pipeline or scheduling tools such as Luigi or Airflow
- Traditional ETL tools such as Matillion
In each of these implementations, your team will perform the following steps:
- Analysts determine required data models
- Data engineers write extraction scripts for API endpoints
- Data engineers write scripts to transform data in accordance with data models
- Data engineers write scripts to load the data into a data warehouse
- Analysts build dashboards and reports
Note how heavy the process is on scripting. To organize this process, you will have to arrange discrete the units of work represented by your scripts into sequences. Some tools only feature scheduling on the basis of time intervals. More sophisticated workflow management tools feature directed acyclic graphs, in which tasks can be scheduled according to logical dependencies. Typically, these tasks perform the “T” part of ETL. The image below, from left to right, illustrates the extraction and transformation of data from several sources into a data model for reporting.
Source: Apache Airflow Documentation
You will need to configure cloud infrastructure, as well, to host and run your code and handle the data at various intermediate stages. At every level of sophistication, the common denominators to all traditional ETL approaches are extensive configuration, scripting and coding, and a huge number of moving parts. They are fundamentally tools for experts.
The new orchestration: Simple and automated
We have covered the difference between ETL and ELT on several occasions, though without discussing the technical details or tools involved in ETL. It bears repeating that, by incorporating transformation directly into the middle of the process, ETL features a level of complexity and brittleness that is increasingly unnecessary.
Fivetran has always allowed you to extract and load data straight from the source with high fidelity on a near-continuous basis. With Fivetran Transformations, you can now write, schedule, and version-control transformations for your data within the data warehouse environment as well. Since Fivetran Transformations are written in SQL, they are legible to a much wider audience than Python-based workflows and in some cases more performant, too.
Unlike the multitude of steps, tools and technologies characteristic of ETL, the flow for Fivetran is simple, effortless and can be entirely performed by analysts:
- Analysts set up a data warehouse within Fivetran
- Analysts set up Fivetran connectors and begin syncing
- Analysts write and schedule transformation scripts
- Analysts build dashboards and reports
The first two steps only require the user to supply credentials; there is no scripting until your analysts need transformations.
Cut through the knot
There will always be niche use-cases that require the use of more complicated and less accessible tools. Fivetran Transformations is predicated on the idea that for the vast majority of use cases, a complex web of orchestration with many stages and parts is wholly unnecessary.
To learn more about Fivetran Transformations, sign up for a personalized demo or get started today by signing up for a free trial.