AWS Cloud Cost dbt Package (Docs)
What does this dbt package do?
This package models AWS Cloud Cost data from Fivetran's AWS Cost Report connector. It uses data in the format described by the AWS Cost & Usage Report.
The main focus of the package is to transform the core object tables into analytics-ready models, including:
- Materializes AWS Cloud Cost staging tables which leverage data in the format described by the AWS Cost & Usage Report. These staging tables clean, test, and prepare your AWS Cloud Cost data from Fivetran's connector for analysis by doing the following:
- Names columns for consistency across all packages and for easier analysis:
- Column names are shortened for convenience and to avoid redundancy.
- Takes the latest export of each account's billing month report.
- Names columns for consistency across all packages and for easier analysis:
- Creates analytics-ready end models for monitoring and investigating cost & usage of different AWS services across your organizations.
- Generates a comprehensive data dictionary of both your AWS Cloud Cost source and modeled data through the dbt docs site.
This package does not apply freshness tests.
The following table provides a detailed list of all tables materialized within this package by default.
TIP: See more details about these tables in the package's dbt docs site.
Table | Description |
---|---|
aws_cloud_cost__daily_overview | Daily aggregation of the Standard Cost & Usage Report (2.0) exported from AWS. Includes slew of commonly analyzed dimensions related to billing, pricing, line item buckets, and products. Contains both high-level cost and usage metrics, along with all metrics related to reservations and savings plans. Also includes financial reporting fieds relating to invoices and billing periods. |
aws_cloud_cost__daily_product_report | Daily view of each account's use of and associated costs from individual AWS products for each billing period. Built off of the daily overview model. |
aws_cloud_cost__daily_instance_report | Daily view of each account's use of and associated costs from different Amazon Elastic Compute Cloud (EC2) instances for each billing period. Built off of the daily overview model. |
How do I use the dbt package?
DISCLAIMER: This package transforms source data of potentially very high volumes. Please be aware of the size of your dataset(s) and take this into consideration when configuring the frequency with which you will orchestrate the package models. See Step 4 for tools to mitigate compute and storage costs.
Step 1: Prerequisites
To use this dbt package, you must have the following:
- At least one Fivetran AWS Cloud Cost connector syncing data into your destination.
- A BigQuery, Snowflake, Redshift, Databricks, or PostgreSQL destination.
Databricks dispatch configuration
If you are using a Databricks destination with this package, you must add the following (or a variation of the following) dispatch configuration within your dbt_project.yml
. This is required in order for the package to accurately search for macros within the dbt-labs/spark_utils
then the dbt-labs/dbt_utils
packages respectively.
dispatch:
- macro_namespace: dbt_utils
search_order: ['spark_utils', 'dbt_utils']
Step 2: Install the package
Include the following AWS Cloud Cost package version in your packages.yml
file:
TIP: Check dbt Hub for the latest installation instructions or read the dbt docs for more information on installing packages.
packages:
- package: fivetran/aws_cloud_cost
version: [">=0.1.0", "<0.2.0"] # we recommend using ranges to capture non-breaking changes automatically
Step 3: Define database, schema, and table name variables
Option A: Single connector
By default, this package assumes your AWS Cost & Usage Report data lives in the following location:
- Your
target.database
- A schema called
aws_cloud_cost
- A table called
aws_cloud_cost_report
In the very likely case that your AWS Cloud Cost source data lives someplace else (for example, if your AWS Cloud Cost schema is named aws_cloud_cost_fivetran
or your table is called aws_billing
), add the following configuration to your root dbt_project.yml
file:
# dbt_project.yml
vars:
aws_cloud_cost_database: your_database_name # default: target.database
aws_cloud_cost_schema: your_schema_name # default: aws_cloud_cost
aws_cloud_cost_report_identifier: your_table_name # default: aws_cloud_cost_report
Option B: Union multiple connectors
If you have multiple AWS Cloud Cost connectors in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. The package will union all of the data together and pass the unioned table into the transformations. You will be able to see which source it came from (the database.schema.table
, NOT the source name
) in the source_relation
column of each model.
To use this functionality, you will need to configure the aws_cloud_cost_sources
dictionary-list in your root dbt_project.yml
file. For each source, provide the appropriate database
, schema
, and table
for each dataset:
# dbt_project.yml
vars:
aws_cloud_cost_sources:
- database: source_databse_name # default: target.database
schema: source_schema_name
table: source_table_name
- database: 'my-db-example'
schema: aws_cost_schema_example
table: report_table_example
# include as many sources as you'd like
Recommended: Incorporate unioned sources into DAG
NOTE: The native
aws_cloud_cost
source connection set up in the package will not function when the union-feature is utilized. Although the package will run correctly and the data will be correctly transformed, you will not observe the sources linked to the package models in the Directed Acyclic Graph (DAG).
To properly incorporate all of your AWS Cloud Cost connectors into your project's DAG:
- For each source provided to the
aws_cloud_cost_sources
variable, you must now add a uniquename
attribute. This can be any name, so long as it is unique and matches thesource.name
you define in the following step.
# dbt_project.yml
vars:
aws_cloud_cost_sources:
- database: source_databse_name
schema: source_schema_name
table: source_table_name
name: unique_source_name # NOW REQUIRED - can choose any name so long as it is unique
- database: 'my-db-example'
schema: aws_cost_schema_example
table: report_table_example
name: my_aws_cost_report_source
# include as many sources as you'd like
- Define each source provided to the
aws_cloud_cost_sources
variable in a.yml
file in your root project'smodels/
pathway. Utilize the following template for thesource
-level configurations, and, most importantly, copy and paste the table and column-level definitions:
Expand for source template
# a .yml file in your root project
sources:
- name: <name> # Must map onto name in var(aws_cloud_cost_sources) you added in the previous step
schema: <schema_name> # Must map onto schema in var(aws_cloud_cost_sources)
database: <database_name> # Must map onto database in var(aws_cloud_cost_sources)
loader: fivetran
loaded_at_field: _fivetran_synced
tables:
- name: <table_name_as_it_appears_in_warehouse> # Must map onto table in var(aws_cloud_cost_sources)
description: '{{ doc("aws_cloud_cost_report") }}' # Your projecy will inherit docs blocks defined by this package
columns: &aws_report_columns # Can use columns: *aws_report yaml anchor in subsequent sources
- name: _file
description: '{{ doc("_file") }}'
- name: _line
description: '{{ doc("_line") }}'
- name: _fivetran_synced
description: '{{ doc("_fivetran_synced") }}'
- name: _modified
description: '{{ doc("_modified") }}'
- name: bill_bill_type
description: '{{ doc("bill_bill_type") }}'
- name: bill_billing_entity
description: '{{ doc("bill_billing_entity") }}'
- name: bill_billing_period_end_date
description: '{{ doc("bill_billing_period_end_date") }}'
- name: bill_billing_period_start_date
description: '{{ doc("bill_billing_period_start_date") }}'
- name: bill_invoice_id
description: '{{ doc("bill_invoice_id") }}'
- name: bill_invoicing_entity
description: '{{ doc("bill_invoicing_entity") }}'
- name: bill_payer_account_id
description: '{{ doc("bill_payer_account_id") }}'
- name: identity_line_item_id
description: '{{ doc("identity_line_item_id") }}'
- name: identity_time_interval
description: '{{ doc("identity_time_interval") }}'
- name: line_item_availability_zone
description: '{{ doc("line_item_availability_zone") }}'
- name: line_item_blended_cost
description: '{{ doc("line_item_blended_cost") }}'
- name: line_item_blended_rate
description: '{{ doc("line_item_blended_rate") }}'
- name: line_item_currency_code
description: '{{ doc("line_item_currency_code") }}'
- name: line_item_legal_entity
description: '{{ doc("line_item_legal_entity") }}'
- name: line_item_line_item_description
description: '{{ doc("line_item_line_item_description") }}'
- name: line_item_line_item_type
description: '{{ doc("line_item_line_item_type") }}'
- name: line_item_normalization_factor
description: '{{ doc("line_item_normalization_factor") }}'
- name: line_item_normalized_usage_amount
description: '{{ doc("line_item_normalized_usage_amount") }}'
- name: line_item_operation
description: '{{ doc("line_item_operation") }}'
- name: line_item_product_code
description: '{{ doc("line_item_product_code") }}'
- name: line_item_resource_id
description: '{{ doc("line_item_resource_id") }}'
- name: line_item_tax_type
description: '{{ doc("line_item_tax_type") }}'
- name: line_item_unblended_cost
description: '{{ doc("line_item_unblended_cost") }}'
- name: line_item_unblended_rate
description: '{{ doc("line_item_unblended_rate") }}'
- name: line_item_usage_account_id
description: '{{ doc("line_item_usage_account_id") }}'
- name: line_item_usage_amount
description: '{{ doc("line_item_usage_amount") }}'
- name: line_item_usage_end_date
description: '{{ doc("line_item_usage_end_date") }}'
- name: line_item_usage_start_date
description: '{{ doc("line_item_usage_start_date") }}'
- name: line_item_usage_type
description: '{{ doc("line_item_usage_type") }}'
- name: pricing_currency
description: '{{ doc("pricing_currency") }}'
- name: pricing_lease_contract_length
description: '{{ doc("pricing_lease_contract_length") }}'
- name: pricing_offering_class
description: '{{ doc("pricing_offering_class") }}'
- name: pricing_public_on_demand_cost
description: '{{ doc("pricing_public_on_demand_cost") }}'
- name: pricing_public_on_demand_rate
description: '{{ doc("pricing_public_on_demand_rate") }}'
- name: pricing_purchase_option
description: '{{ doc("pricing_purchase_option") }}'
- name: pricing_rate_code
description: '{{ doc("pricing_rate_code") }}'
- name: pricing_rate_id
description: '{{ doc("pricing_rate_id") }}'
- name: pricing_term
description: '{{ doc("pricing_term") }}'
- name: pricing_unit
description: '{{ doc("pricing_unit") }}'
- name: product_fee_code
description: '{{ doc("product_fee_code") }}'
- name: product_fee_description
description: '{{ doc("product_fee_description") }}'
- name: product_from_location
description: '{{ doc("product_from_location") }}'
- name: product_from_location_type
description: '{{ doc("product_from_location_type") }}'
- name: product_from_region_code
description: '{{ doc("product_from_region_code") }}'
- name: product_instance_family
description: '{{ doc("product_instance_family") }}'
- name: product_instance_type
description: '{{ doc("product_instance_type") }}'
- name: product_location
description: '{{ doc("product_location") }}'
- name: product_location_type
description: '{{ doc("product_location_type") }}'
- name: product_operation
description: '{{ doc("product_operation") }}'
- name: product_pricing_unit
description: '{{ doc("product_pricing_unit") }}'
- name: product_product_family
description: '{{ doc("product_product_family") }}'
- name: product_product_name
description: '{{ doc("product_product_name") }}'
- name: product_region_code
description: '{{ doc("product_region_code") }}'
- name: product_servicecode
description: '{{ doc("product_servicecode") }}'
- name: product_sku
description: '{{ doc("product_sku") }}'
- name: product_to_location
description: '{{ doc("product_to_location") }}'
- name: product_to_location_type
description: '{{ doc("product_to_location_type") }}'
- name: product_to_region_code
description: '{{ doc("product_to_region_code") }}'
- name: product_usagetype
description: '{{ doc("product_usagetype") }}'
- name: reservation_amortized_upfront_fee_for_billing_period
description: '{{ doc("reservation_amortized_upfront_fee_for_billing_period") }}'
- name: reservation_end_time
description: '{{ doc("reservation_end_time") }}'
- name: reservation_modification_status
description: '{{ doc("reservation_modification_status") }}'
- name: reservation_normalized_units_per_reservation
description: '{{ doc("reservation_normalized_units_per_reservation") }}'
- name: reservation_number_of_reservations
description: '{{ doc("reservation_number_of_reservations") }}'
- name: reservation_reservation_arn
description: '{{ doc("reservation_reservation_arn") }}'
- name: reservation_start_time
description: '{{ doc("reservation_start_time") }}'
- name: reservation_subscription_id
description: '{{ doc("reservation_subscription_id") }}'
- name: reservation_total_reserved_normalized_units
description: '{{ doc("reservation_total_reserved_normalized_units") }}'
- name: reservation_total_reserved_units
description: '{{ doc("reservation_total_reserved_units") }}'
- name: reservation_units_per_reservation
description: '{{ doc("reservation_units_per_reservation") }}'
- name: reservation_unused_amortized_upfront_fee_for_billing_period
description: '{{ doc("reservation_unused_amortized_upfront_fee_for_billing_period") }}'
- name: reservation_unused_normalized_unit_quantity
description: '{{ doc("reservation_unused_normalized_unit_quantity") }}'
- name: reservation_unused_quantity
description: '{{ doc("reservation_unused_quantity") }}'
- name: reservation_unused_recurring_fee
description: '{{ doc("reservation_unused_recurring_fee") }}'
- name: reservation_upfront_value
description: '{{ doc("reservation_upfront_value") }}'
- name: savings_plan_end_time
description: '{{ doc("savings_plan_end_time") }}'
- name: savings_plan_offering_type
description: '{{ doc("savings_plan_offering_type") }}'
- name: savings_plan_payment_option
description: '{{ doc("savings_plan_payment_option") }}'
- name: savings_plan_purchase_term
description: '{{ doc("savings_plan_purchase_term") }}'
- name: savings_plan_region
description: '{{ doc("savings_plan_region") }}'
- name: savings_plan_savings_plan_arn
description: '{{ doc("savings_plan_savings_plan_arn") }}'
- name: savings_plan_savings_plan_effective_cost
description: '{{ doc("savings_plan_savings_plan_effective_cost") }}'
- name: savings_plan_savings_plan_rate
description: '{{ doc("savings_plan_savings_plan_rate") }}'
- name: savings_plan_start_time
description: '{{ doc("savings_plan_start_time") }}'
- name: bill_payer_account_name
description: '{{ doc("bill_payer_account_name") }}'
- name: product
description: '{{ doc("product") }}'
- name: discount
description: '{{ doc("discount") }}'
- name: resource_tags
description: '{{ doc("resource_tags") }}'
- name: cost_category
description: '{{ doc("cost_category") }}'
- name: line_item_usage_account_name
description: '{{ doc("line_item_usage_account_name") }}'
- name: reservation_reservation_a_r_n
description: '{{ doc("reservation_reservation_a_r_n") }}'
- name: reservation_recurring_fee_for_usage
description: '{{ doc("reservation_recurring_fee_for_usage") }}'
- name: savings_plan_recurring_commitment_for_billing_period
description: '{{ doc("savings_plan_recurring_commitment_for_billing_period") }}'
- name: savings_plan_used_commitment
description: '{{ doc("savings_plan_used_commitment") }}'
- name: reservation_amortized_upfront_cost_for_usage
description: '{{ doc("reservation_amortized_upfront_cost_for_usage") }}'
- name: reservation_effective_cost
description: '{{ doc("reservation_effective_cost") }}'
- name: savings_plan_amortized_upfront_commitment_for_billing_period
description: '{{ doc("savings_plan_amortized_upfront_commitment_for_billing_period") }}'
- name: savings_plan_total_commitment_to_date
description: '{{ doc("savings_plan_total_commitment_to_date") }}'
- Set the
has_defined_sources
variable (scoped to theaws_cloud_cost
package) toTrue
, like such:
# dbt_project.yml
vars:
aws_cloud_cost:
has_defined_sources: true
(Optional) Step 4: Additional configurations
Limit Date Range
Although the package transforms the latest version of each report, your AWS Cost & Usage Report data may still be quite large. In order to avoid unnecessary compute and storage costs, we have added a minimum (INCLUSIVE) start date variable that can be used to limit the data's date range.
By default, the package will look at data as far back as you have it. To adjust this, configure the following variable in your dbt_project.yml
to be the first date you want included:
# dbt_project.yml
vars:
aws_cloud_cost_start_date: 'YYYY-MM-DD' # default value: '1970-01-01'
Note for BigQuery users: This filter applies a full table scan and may therefore not actually mitigate your compute costs. We have applied partitions to each end model, but you may want to consider applying pre-package transformations (partitions, filters, etc.) to streamline the amount of data processed by the package.
Please create an issue if you'd like to see support for incremental materializations.
Passing Through Additional Fields
This package includes all source columns defined in the macros folder. You can add more columns to the aws_cloud_cost__daily_overview
model using the aws_cloud_cost_report_pass_through_columns
variable. This variable allows for custom or otherwise not included fields to be included, aliased (alias
), and casted (transform_sql
) if desired (but not required). Datatype casting is configured via a sql snippet within the transform_sql
key. You may add the desired sql while omitting the as field_name
at the end and your custom pass-though fields will be casted accordingly.
Use the below format for declaring extra fields to include:
# dbt_project.yml
vars:
aws_cloud_cost_report_pass_through_columns: # will be included in aws_cloud_cost__daily_overview model
- name: "that_field"
alias: "renamed_to_this_field"
transform_sql: "cast(renamed_to_this_field as string)"
- name: "this_field"
- name: "old_name"
alias: "new_name"
These fields will be included in the aws_cloud_cost__daily_overview
model and part of its composite hashed unique_key
.
Please create an issue if you'd like to see passthrough column support for the
aws_cloud_cost__daily_product_report
oraws_cloud_cost__daily_instance_report
models.
Changing the Build Schema
By default this package will build the AWS Cloud Cost staging models within a schema titled (<target_schema> + _stg_aws_cloud_cost
) and the AWS Cloud Cost final models within a schema titled (<target_schema> + _aws_cloud_cost
) in your target database. If this is not where you would like your modeled AWS Cloud Cost data to be written to, add the following configuration to your dbt_project.yml
file:
# dbt_project.yml
models:
aws_cloud_cost:
+schema: my_new_schema_name # leave blank for just the target_schema
staging:
+schema: my_new_schema_name # leave blank for just the target_schema
(Optional) Step 5: Orchestrate your models with Fivetran Transformations for dbt Core™
Expand for details
Fivetran offers the ability for you to orchestrate your dbt project through Fivetran Transformations for dbt Core™. Learn how to set up your project for orchestration through Fivetran in our Transformations for dbt Core setup guides.
Does this package have dependencies?
This dbt package is dependent on the following dbt packages. These dependencies are installed by default within this package. For more information on the following packages, refer to the dbt hub site.
IMPORTANT: If you have any of these dependent packages in your own
packages.yml
file, we highly recommend that you remove them from your rootpackages.yml
to avoid package version conflicts.
packages:
- package: fivetran/fivetran_utils
version: [">=0.4.0", "<0.5.0"]
- package: dbt-labs/dbt_utils
version: [">=1.0.0", "<2.0.0"]
How is this package maintained and can I contribute?
Package Maintenance
The Fivetran team maintaining this package only maintains the latest version of the package. We highly recommend you stay consistent with the latest version of the package and refer to the CHANGELOG and release notes for more information on changes across versions.
Contributions
A small team of analytics engineers at Fivetran develops these dbt packages. However, the packages are made better by community contributions.
We highly encourage and welcome contributions to this package. Check out this dbt Discourse article on the best workflow for contributing to a package.
Are there any resources available?
- If you have questions or want to reach out for help, see the GitHub Issue section to find the right avenue of support for you.
- If you would like to provide feedback to the dbt package team at Fivetran or would like to request a new dbt package, fill out our Feedback Form.