A Beginner’s Guide to ELT Data Pipelines
Historically, traditional data pipelines used ETL (extract, transform and load) processes to load data into a data warehouse. Transforming before loading forces analysts to predict the data models and insights they will use beforehand. This means that data engineers and analysts must often design and build complicated processes and workflows upfront in order to use data, and then redesign and rebuild them whenever reporting needs change.
The new wave of extracting, loading and transforming data (ELT) means analysts can use data in its original form, and transform it using cloud-based technologies to extract insights without limits. This new approach to data integration expedites the process of data ingestion and radically lowers the barrier to entry for comprehensive analytics.
Fivetran provides a fully automated data pipeline built to help companies automate extracting and loading data into their data warehouses in a cloud-agnostic manner. The Analytics Academy has been closely aligned with Fivetran for quite some time and in this article we will focus on some of the differences between Fivetran and ELT vs ETL-based tools and processes.
Engineer-Centered ETL vs. Analyst-Centered ELTTraditional ETL processes force analysts to foresee every use of the data they have, before they even create reports. Because transformations are performed before loading, the ETL process requires engineering time to write bespoke code to extract data from each source and then clean, aggregate, join, and otherwise reshape the data into a model that is appropriate for reporting. These high upfront costs make the whole process extremely inflexible.
With the ELT approach, companies can load raw data in their warehouse and make it immediately accessible to analysts, removing the need to determine what insights are needed beforehand. Using this approach, companies can create a raw layer of data in the data warehouse that will act as a 'source of truth' for the company. This offers a great amount of flexibility to companies and analysts. Staging transformations after loading also grants a consistent “audit trail” of your data journey from source.
ELT is largely enabled by the shift toward cloud-based data warehouse technologies where storage and computation are far cheaper than in the past, so transforms can be undertaken within the data warehouse itself.
Modern businesses generate a huge amount of data. ELT gives you the opportunity to load your data into your warehouse without the upfront cost of predicting the insights that will be useful for your data consumers and stressing over what transformations you need to perform on your data.
Fivetran and ELTFivetran helps companies by providing a near-effortless solution to create pipelines from all your data sources without code, so analysts, data scientists and engineers can focus on what's important, rather than spending time and effort on building and maintaining pipelines. With Fivetran, companies can build data pipelines for a number of different data sources (known as "connectors") in less than five minutes.
How many times have you been in a meeting where you’ve been asked to answer some questions and back them up with data and the answer was: “We don’t have the data available but I will arrange with our data engineers to load them. I will get back to you as soon as I have the data but they have a long backlog and it can take a few weeks.”
At TAA, we have had the same experience many times in past projects. Imagine now in the same setup the answer could be: “We don’t have the data available in the platform but give me an hour and I will have the report for you!”
Let’s see how Fivetran helps us to achieve that with a simple example.
Moving Data From an S3 Bucket to Snowflake in Three StepsFivetran works across most databases such as BigQuery, Redshift and more, but in our example today we will be using Snowflake.
Establish a connection between Fivetran and Snowflake. To do that we can access Fivetran via Partner Connect in our Snowflake instance and click on the Fivetran block. For those unfamiliar with Snowflake and Partner Connect, feel free to reach out to any of The Analytics Academy team and we would be happy to talk you through this.
Step 1 – Connect Fivetran and Snowflake
Using this method, Snowflake automatically creates the database, warehouse, user and role that we need for Fivetran with literally one click.
Fivetran then also creates the destination connection with the Snowflake warehouse for us automatically.
Step 2 – Connect a data sourceIn our example, we are storing our data in an S3 Bucket. Now that we have the connection between Fivetran and Snowflake, we will need to connect that data to our warehouse. To do this we need to create and configure a new connector within the Fivetran UI.
Click on the +connector in Fivetran and select the S3 source.
For our connector to pass the test and connect with the S3 bucket, we need to fill the required fields:
- Destination schema: The name of the schema we want the table from the bucket to be under. CAUTION: This cannot be edited after the connector is created
- Destination table: The name of the table. CAUTION: This cannot be edited after the connector is created
- Bucket: The link to our S3 bucket
- Public: For the purposes of this article we are going to use a public bucket. In case that your bucket is not public you can follow the setup instructions here.
After we complete all these fields we need to “Save & Test.” If the test is successful, we can then start the pipeline and ingest data for the first time in the destination table in Snowflake.
The initial sync might take a few minutes to a few days, depending on the size of your data set. Once the sync is complete, the data is ready for usage in your Snowflake environment and can be served up to your reporting layer. Your users are now empowered to dive in, generate insights and bring value to your customers and business. Enjoy!
Step 3 – Wait a little
The Value of the Modern Data StackWe’ve seen how we can go from an S3 bucket to a fully realised data source within a centralised cloud data warehouse in a matter of minutes. There may be situations in which transforms are needed before loading, but for all other cases we can now remove these steps with modern data architecture and a new ingestion method.
Please get in touch with anyone on The Analytics Academy team for more information on ELT/ETL practices, Fivetran or cloud data warehouses like Snowflake.