Working with data usually involves the same basic steps: collect it, store it, and use it for reports. But one step can slow teams down: turning raw data into a clean, analysis-ready format. For example:
- Sales data might record the same customer’s name in three different ways.
- A product catalog may list prices in different currencies without conversion.
These inconsistencies arise because data comes from a variety of systems, such as CRM or ERP platforms, and using it as-is makes downstream analysis challenging.
That’s where the data build tool (dbt) comes in to simplify the transformation step.
Let's examine dbt: what it is, why it matters for data modeling, and where it fits into the modern data stack.
What is dbt?
Data build tool, more commonly known as dbt, is a SQL-based tool that transforms raw data into analytics-ready models inside your data warehouse.
It’s open-source and developed by dbt Labs, formerly Fishtown Analytics.
Some things dbt makes possible include:
- Reusing the same transformation logic across projects instead of starting over each time.
- Documenting data models so everyone understands what a table represents.
- Testing data automatically to catch errors (like mismatched IDs) before they appear in dashboards.
Why dbt matters for modeling
Companies collect information from many different systems, such as sales, marketing, finance, and product, and each stores it in its own format.
Modeling pulls scattered data sources together into a single view, making it easier to create reports and dashboards. Here are the main steps.
Modeling data with SQL
The first step in data modeling is deciding what kind of table or view you prefer for analysis. For example, you might want:
- A single customer table that combines details from your sales system and your support system.
- An orders table where dates follow the same format and currencies are standardized.
- A marketing performance table that merges campaign data from different platforms.
dbt lets you build these models directly inside data warehouses using SQL SELECT statements. A SELECT statement is a command that tells the database which pieces of data you want to pull and how you want to arrange them.
For example:
- You might select only the necessary columns (like customer name and total spend).
- You can filter the rows (like orders placed in the last 30 days).
- You can join tables together (like linking customer data with order history).
In dbt, these SELECT statements become the building blocks for your data models. Instead of being temporary queries, they’re stored, versioned, and reusable so your whole team can rely on them.
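As a minimal sketch, a dbt model is just a SELECT statement saved as a .sql file. The model, table, and column names below are hypothetical, and the exact date-filter syntax varies slightly by warehouse:

```sql
-- models/marts/customer_orders.sql (hypothetical names)
-- Select only the columns we need, join customers to their recent orders,
-- and aggregate spend per customer.
select
    c.customer_id,
    c.customer_name,
    sum(o.order_total) as total_spend
from {{ ref('stg_customers') }} as c
join {{ ref('stg_orders') }} as o
    on o.customer_id = c.customer_id
where o.order_date >= current_date - interval '30 days'
group by
    c.customer_id,
    c.customer_name
```

The {{ ref('…') }} function is how dbt links models together: instead of hard-coding table names, you reference other models, and dbt figures out the dependency order in which to build them.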
Structure in dbt Projects
In dbt, “structure” means how project files and folders are organized. dbt provides a ready-made layout: at the top level, there are folders for models, tests, snapshots, seeds (small static datasets), and macros (reusable code), plus a main configuration file called dbt_project.yml.
Inside the models folder, teams often split work into layers such as:
- Staging tables for cleaning and preparing raw data
- Intermediate for combining and shaping data
- Data marts for final business-ready tables and dashboards
This setup keeps the project clear and modular, so different people can work together without getting lost.
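On disk, that layout typically looks something like this (the layer folder names are a common convention rather than a requirement):

```
my_dbt_project/
├── dbt_project.yml      # main configuration file
├── models/
│   ├── staging/         # cleaning and preparing raw data
│   ├── intermediate/    # combining and shaping data
│   └── marts/           # final business-ready tables
├── tests/               # custom data tests
├── snapshots/           # track changes to records over time
├── seeds/               # small static datasets (CSV files)
└── macros/              # reusable SQL code
```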
Because dbt projects usually connect with Git, teams can review updates, roll back mistakes, and automate deployments. Git is the most common way to manage dbt projects because it handles version control and teamwork well. But it isn’t the only choice.
Some teams use GitLab, Bitbucket, or Gogs, which all offer similar features. Also, if you’re working alone or just starting, you could store a dbt project in a shared drive, a local folder, or another code management tool.
Adding context with YAML
SQL defines what your data should look like. But you also need to add rules, expectations, and descriptions. This is where YAML (a recursive acronym for “YAML Ain’t Markup Language”) files come in.
In dbt, YAML is used to:
- Define tests (for example, checking if values are unique or not null)
- Document columns and models so others know what each field represents
- Configure how models should be materialized (table, view, incremental, etc.)
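Put together, a small properties file might look like this. The model and column names are hypothetical; the unique and not_null tests shown are built into dbt:

```yaml
# models/marts/schema.yml (hypothetical names)
version: 2

models:
  - name: customer_orders
    description: "One row per customer with total spend."
    config:
      materialized: table
    columns:
      - name: customer_id
        description: "Unique identifier for the customer."
        tests:
          - unique
          - not_null
```

Running dbt test then checks every customer_id value against those rules before bad data reaches a dashboard.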
Reusing work with dbt packages
Once your team has mastered SQL models and YAML testing, the next step is reusability.
dbt packages are pre-built transformations you can plug into your own project.
You can add a package to your project and immediately use its models, macros, and tests just like using a code library in programming.
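For example, to install dbt_utils, a widely used package from dbt Labs, you declare it in a packages.yml file at the project root and then run dbt deps (the version range below is illustrative):

```yaml
# packages.yml, at the root of your dbt project
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

Once installed, the package’s macros are available inside your models, for example {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_id']) }}.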
Where dbt fits in the modern data stack
A modern data stack is a collection of cloud-based tools that handle the full data journey, and dbt has a very specific role to play in this setup. Unlike ETL tools, dbt doesn’t extract or load data; instead, it focuses only on the transformation step.
dbt’s core transformation functionality is the same everywhere, but its impact in practice depends on how your team divides data roles and responsibilities.
For the other steps, you can pair dbt with ELT platforms like Fivetran to automate data movement into your warehouse.
Decision guide: Is dbt right for my team?
You can use this workflow to see if dbt is the correct fit.
- Question 1: Is your organization using an ELT architecture (i.e., loading raw data into a cloud data warehouse — like Snowflake, BigQuery, or Redshift — before performing data transformations)?
- Yes → Go to Question 2 (dbt is purpose-built for in-warehouse data transformations.)
- No → Stop here. (dbt is not designed for pre-load data manipulation or orchestration.) For upstream data transformations, consider ETL platforms such as:
- Informatica
- Talend
- Matillion
- Question 2: Does your team have SQL proficiency?
- Yes → Go to Question 3 (dbt is SQL-centric and enables analysts and engineers to collaborate on transformation logic.)
- No → Stop here. (dbt adoption may require upskilling.) Instead, consider:
- Structured SQL training programs
- Hybrid development models with engineering support
- Temporarily using low-code tools while building SQL capacity
- Question 3: Are testing, documentation, and version control part of your data governance strategy?
- Yes → Go to Question 4.
- No → Stop here. Consider whether simpler scripting or visual tools can meet current governance needs.
- Question 4: Does your team require modular, reusable transformation logic?
- Yes → dbt is likely a good fit ✅ Its model-based structure and support for macros and packages are ideal for maintaining modular, reusable workflows.
- No → Stop here. Consider whether simpler SQL scripts or BI tool transformations are sufficient for the scope of work.
Scale your transformations with Fivetran’s integrated dbt Core
dbt makes modern data transformation collaborative and straightforward. Paired with Fivetran, it becomes part of a complete ELT pipeline, from extraction to analytics-ready models. With automated data pipelines, prebuilt dbt packages, and integrated scheduling, your team can spend less time managing code and more time delivering insights.
And with Fivetran’s dbt Core integration, teams can choose from over 700 pre-built connectors to extract data, load it into a data warehouse or other destination, and automate transformations.
[CTA_MODULE]