Learn why ELT is better than ETL and how you can get started with it.
The following post is by Daniel Haville, founder and CEO of The Analytics Academy. Daniel authored the post as part of our Data Champions program. If you’re interested in contributing to the blog or learning more about the Data Champions program, please get in touch.
Historically, traditional data pipelines used ETL (extract, transform and load) processes to load data into a data warehouse. Transforming before loading forces analysts to predict the data models and insights they will use beforehand. This means that data engineers and analysts must often design and build complicated processes and workflows upfront in order to use data, and then redesign and rebuild them whenever reporting needs change.
The new wave of extracting, loading and transforming data (ELT) means analysts can use data in its original form, and transform it using cloud-based technologies to extract insights without limits. This new approach to data integration expedites the process of data ingestion and radically lowers the barrier to entry for comprehensive analytics.
Fivetran provides a fully automated data pipeline built to help companies automate extracting and loading data into their data warehouses in a cloud-agnostic manner. The Analytics Academy has been closely aligned with Fivetran for quite some time and in this article we will focus on some of the differences between Fivetran and ELT vs ETL-based tools and processes.
Traditional ETL processes force analysts to foresee every use of the data they have, before they even create reports. Because transformations are performed before loading, the ETL process requires engineering time to write bespoke code to extract data from each source and then clean, aggregate, join, and otherwise reshape the data into a model that is appropriate for reporting. These high upfront costs make the whole process extremely inflexible.
With the ELT approach, companies can load raw data in their warehouse and make it immediately accessible to analysts, removing the need to determine what insights are needed beforehand. Using this approach, companies can create a raw layer of data in the data warehouse that will act as a 'source of truth' for the company. This offers a great amount of flexibility to companies and analysts. Staging transformations after loading also grants a consistent “audit trail” of your data journey from source.
ELT is largely enabled by the shift toward cloud-based data warehouse technologies where storage and computation are far cheaper than in the past, so transforms can be undertaken within the data warehouse itself.
Modern businesses generate a huge amount of data. ELT gives you the opportunity to load your data into your warehouse without the upfront cost of predicting the insights that will be useful for your data consumers and stressing over what transformations you need to perform on your data.
Fivetran helps companies by providing a near-effortless solution to create pipelines from all your data sources without code, so analysts, data scientists and engineers can focus on what's important, rather than spending time and effort on building and maintaining pipelines. With Fivetran, companies can build data pipelines for a number of different data sources (known as "connectors") in less than five minutes.
How many times have you been in a meeting where you’ve been asked to answer some questions and back them up with data and the answer was: “We don’t have the data available but I will arrange with our data engineers to load them. I will get back to you as soon as I have the data but they have a long backlog and it can take a few weeks.”
At TAA, we have had the same experience many times in past projects. Imagine now in the same setup the answer could be: “We don’t have the data available in the platform but give me an hour and I will have the report for you!”
Let’s see how Fivetran helps us to achieve that with a simple example.
Fivetran works across most databases such as BigQuery, Redshift and more, but in our example today we will be using Snowflake.
Establish a connection between Fivetran and Snowflake. To do that we can access Fivetran via Partner Connect in our Snowflake instance and click on the Fivetran block. For those unfamiliar with Snowflake and Partner Connect, feel free to reach out to any of The Analytics Academy team and we would be happy to talk you through this.
Using this method, Snowflake automatically creates the database, warehouse, user and role that we need for Fivetran with literally one click.
Fivetran then also creates the destination connection with the Snowflake warehouse for us automatically.
In our example, we are storing our data in an S3 Bucket. Now that we have the connection between Fivetran and Snowflake, we will need to connect that data to our warehouse. To do this we need to create and configure a new connector within the Fivetran UI.
Click on the +connector in Fivetran and select the S3 source.
For our connector to pass the test and connect with the S3 bucket, we need to fill the required fields:
Destination schema: The name of the schema we want the table from the bucket to be under. CAUTION: This cannot be edited after the connector is created
Destination table: The name of the table. CAUTION: This cannot be edited after the connector is created
Bucket: The link to our S3 bucket
Public: For the purposes of this article we are going to use a public bucket. In case that your bucket is not public you can follow the setup instructions here.
Data sources other than the S3 bucket will require slightly different fields, but in all cases you will be asked for a set of credentials.
After we complete all these fields we need to “Save & Test.” If the test is successful, we can then start the pipeline and ingest data for the first time in the destination table in Snowflake.
The initial sync might take a few minutes to a few days, depending on the size of your data set. Once the sync is complete, the data is ready for usage in your Snowflake environment and can be served up to your reporting layer. Your users are now empowered to dive in, generate insights and bring value to your customers and business. Enjoy!
We’ve seen how we can go from an S3 bucket to a fully realised data source within a centralised cloud data warehouse in a matter of minutes. There may be situations in which transforms are needed before loading, but for all other cases we can now remove these steps with modern data architecture and a new ingestion method.
Please get in touch with anyone on The Analytics Academy team for more information on ELT/ETL practices, Fivetran or cloud data warehouses like Snowflake.