Learn how dbt adds data modeling and transformation to the modern data stack.
The data analytics stack has several layers: extract, load, transform, analyze. Businesses have multiple software alternatives they can use to let citizen analysts extract and load data and run analytics, but the transformation phase still requires specialized expertise. To transform data, you need a data engineer or database administrator who knows how relational databases work — but that transformation work doesn’t have to be a manual process that a DBA creates anew with every project.
dbt (“data build tool,” all lowercase letters) is a data transformation and modeling tool from a company called dbt Labs, formerly Fishtown Analytics. The dbt data transformation tool, which is SQL-based, lets data professionals build data models iteratively and automate data transformation. Users model their data using SQL SELECT statements, create relationships and dependencies between models, and then materialize those models as tables and views in a data warehouse. From there it’s a simple task to turn the models into business intelligence.
As it applies to data analytics, data modeling is the process of imposing order on data from diverse sources. That means transforming tables from different schemas or even databases into a single view or table that describes a business system or process. Once you’ve modeled the data, it’s much easier to derive value from it, either in the form of reports or dashboards or as a basis for predictive or prescriptive analytics.
A dbt project is just a directory of SQL and YAML files that dbt uses to transform your data. The YAML file contains project configuration information. Each model contains a single SQL SELECT statement that transforms data in some way.
SQL has long been the language of choice for analysts developing transformations and performing analysis. Therefore, the decision for dbt to have a model file be a SQL SELECT statement means analysts don’t need to learn a new language or tool and can freely transform data the same way they have previously. Using SQL makes for easy collaboration and a better understanding of your transformations.
dbt Packages are standalone dbt projects, which take advantage of the projects’ modularity to make modules that tackle a specific problem area available for future use, much like a code library. These packages contain pre-built SQL, drastically reducing the necessity to hand-code transformations. Fivetran develops and maintains open-source dbt Packages which model data from single schema and multi-schema data sources. These Fivetran dbt Packages leverage data modeling principles and code modularity to effectively transform data to be used for exploratory, predictive, or prescriptive analytics. For more information on the Fivetran dbt packages and how to use them, you can refer to the dbt hub.
The YAML files are not only used to configure your dbt project, but are also used to curate a data dictionary of your source and transformed data as well as apply data integrity tests. If used effectively, the final result of your dbt project will have a definition of each raw table from the source and transformed table from your dimensional schema, as well as a description of each field within those tables. Further, you can apply tests to certain fields (ie. a primary key field should always be unique) and test your data to ensure there are no data integrity issues.
Models are modular, so you can reproduce a transformation across multiple projects. Any developer can tell you how valuable code reuse is in terms of saving time and effort. A project may also include test, macro, and documentation files.
The dbt platform comprises two products. When most people refer to dbt, they’re talking about dbt Core. It takes a dbt project and a command and creates tables or views in your warehouse. dbt Core is free and released under an Apache License as open source software.
The other product, dbt Cloud, provides a web-based IDE to help teams develop dbt projects and a scheduler. Some dbt Cloud features are free, while other features, for collaboration and enterprise use, have a cost to use them.
Is dbt an ETL (extract, transform, load) tool? No, dbt is a data transformation tool. But dbt works with Fivetran, so you can use dbt as part of your end-to-end ELT processes.
Last year, we began integrating dbt transformations with Fivetran. The feature, called Fivetran Transformations, integrates dbt Core directly into Fivetran, so customers can orchestrate the cleaning, testing, transformation, modeling, and documentation of data sets with the same tool they use to extract and load the data. We created a setup guide to make it easy to start taking advantage of Fivetran Transformations.
Data analytics is a hot business segment — witness the buzz around companies such as Snowflake and Databricks. dbt Labs has been growing like crazy since it was founded in 2016 by colleagues who first worked together at a company called RJMetrics. In June of 2021, the company raised nearly $150 million in a Series C.
The company was a first mover among data transformation products, but it’s not the only one out there. A younger company called Dataform was a direct competitor until it was snapped up by Google Cloud late last year. Similarly, analytics software Looker offers data modeling through its LookML language; Looker was acquired by Google Cloud too in 2019. Other products do some of what dbt does, but few if any offer the range of capabilities of dbt in a package that works in just the way data engineers like to work.
You can try dbt on your own, but it’s more powerful as part of the complete data pipeline that Fivetran provides. In addition to the core Fivetran data integration capabilities, Fivetran also provides Fivetran Transformations powered by dbt which allows you to connect a dbt project to your Fivetran account and orchestrate your end-to-end ELT pipelines all from within Fivetran. Stay tuned for more information on updates and enhancements to Fivetran Transformations as more features will be added in the future.
Consider the case of Ritual, where the data team used dbt in conjunction with Fivetran to replace scattered Python scripts and LookML files. Ritual used the automated testing and data freshness capabilities of dbt to drive a 68% increase in new feature development.
Sign up for a free Fivetran trial, and see how easy it is to connect to more than 150 data sources, extract data, load it into a data warehouse or data lake, transform it with Fivetran Transformations, and use it to enhance your business.