AWS Cloud Cost dbt Package (Docs)
What does this dbt package do?
This package models AWS Cloud Cost data from Fivetran's AWS Cost Report connector. It uses data in the format described by the AWS Cost & Usage Report.
The main focus of the package is to transform the core object tables into analytics-ready models, including:
- Materializes AWS Cloud Cost staging tables which leverage data in the format described by the AWS Cost & Usage Report. These staging tables clean, test, and prepare your AWS Cloud Cost data from Fivetran's connector for analysis by doing the following:
- Names columns for consistency across all packages and for easier analysis:
- Column names are shortened for convenience and to avoid redundancy.
- Takes the latest export of each account's billing month report.
- Names columns for consistency across all packages and for easier analysis:
- Creates analytics-ready end models for monitoring and investigating cost & usage of different AWS services across your organizations.
- Generates a comprehensive data dictionary of both your AWS Cloud Cost source and modeled data through the dbt docs site.
This package does not apply freshness tests.
The following table provides a detailed list of all tables materialized within this package by default.
TIP: See more details about these tables in the package's dbt docs site.
| Table | Description |
|---|---|
| aws_cloud_cost__daily_overview | Daily aggregation of the Standard Cost & Usage Report (2.0) exported from AWS. Includes slew of commonly analyzed dimensions related to billing, pricing, line item buckets, and products. Contains both high-level cost and usage metrics, along with all metrics related to reservations and savings plans. Also includes financial reporting fieds relating to invoices and billing periods. |
| aws_cloud_cost__daily_product_report | Daily view of each account's use of and associated costs from individual AWS products for each billing period. Built off of the daily overview model. |
| aws_cloud_cost__daily_instance_report | Daily view of each account's use of and associated costs from different Amazon Elastic Compute Cloud (EC2) instances for each billing period. Built off of the daily overview model. |
Materialized Models
Each Quickstart transformation job run materializes 5 models if all components of this data model are enabled. This count includes all staging, intermediate, and final models materialized as view, table, or incremental.
How do I use the dbt package?
DISCLAIMER: This package transforms source data of potentially very high volumes. Please be aware of the size of your dataset(s) and take this into consideration when configuring the frequency with which you will orchestrate the package models. See Step 4 for tools to mitigate compute and storage costs.
Step 1: Prerequisites
To use this dbt package, you must have the following:
- At least one Fivetran AWS Cloud Cost connection syncing data into your destination.
- A BigQuery, Snowflake, Redshift, Databricks, or PostgreSQL destination.
Databricks dispatch configuration
If you are using a Databricks destination with this package, you must add the following (or a variation of the following) dispatch configuration within your dbt_project.yml. This is required in order for the package to accurately search for macros within the dbt-labs/spark_utils then the dbt-labs/dbt_utils packages respectively.
dispatch: - macro_namespace: dbt_utils search_order: ['spark_utils', 'dbt_utils']
Step 2: Install the package
Include the following AWS Cloud Cost package version in your packages.yml file:
TIP: Check dbt Hub for the latest installation instructions or read the dbt docs for more information on installing packages.
packages: - package: fivetran/aws_cloud_cost version: [">=0.1.0", "<0.2.0"] # we recommend using ranges to capture non-breaking changes automatically
Step 3: Define database, schema, and table name variables
Option A: Single connection
By default, this package assumes your AWS Cost & Usage Report data lives in the following location:
- Your
target.database - A schema called
aws_cloud_cost - A table called
aws_cloud_cost_report
In the very likely case that your AWS Cloud Cost source data lives someplace else (for example, if your AWS Cloud Cost schema is named aws_cloud_cost_fivetran or your table is called aws_billing), add the following configuration to your root dbt_project.yml file:
# dbt_project.yml vars: aws_cloud_cost_database: your_database_name # default: target.database aws_cloud_cost_schema: your_schema_name # default: aws_cloud_cost aws_cloud_cost_report_identifier: your_table_name # default: aws_cloud_cost_report
Option B: Union multiple connections
If you have multiple AWS Cloud Cost connections in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. The package will union all of the data together and pass the unioned table into the transformations. You will be able to see which source it came from (the database.schema.table, NOT the source name) in the source_relation column of each model.
To use this functionality, you will need to configure the aws_cloud_cost_sources dictionary-list in your root dbt_project.yml file. For each source, provide the appropriate database, schema, and table for each dataset:
# dbt_project.yml vars: aws_cloud_cost_sources: - database: source_databse_name # default: target.database schema: source_schema_name table: source_table_name - database: 'my-db-example' schema: aws_cost_schema_example table: report_table_example # include as many sources as you'd like
Recommended: Incorporate unioned sources into DAG
NOTE: The native
aws_cloud_costsource connection set up in the package will not function when the union-feature is utilized. Although the package will run correctly and the data will be correctly transformed, you will not observe the sources linked to the package models in the Directed Acyclic Graph (DAG).
To properly incorporate all of your AWS Cloud Cost connections into your project's DAG:
- For each source provided to the
aws_cloud_cost_sourcesvariable, you must now add a uniquenameattribute. This can be any name, so long as it is unique and matches thesource.nameyou define in the following step.
# dbt_project.yml vars: aws_cloud_cost_sources: - database: source_databse_name schema: source_schema_name table: source_table_name name: unique_source_name # NOW REQUIRED - can choose any name so long as it is unique - database: 'my-db-example' schema: aws_cost_schema_example table: report_table_example name: my_aws_cost_report_source # include as many sources as you'd like
- Define each source provided to the
aws_cloud_cost_sourcesvariable in a.ymlfile in your root project'smodels/pathway. Utilize the following template for thesource-level configurations, and, most importantly, copy and paste the table and column-level definitions:
Expand for source template
# a .yml file in your root project sources: - name:# Must map onto name in var(aws_cloud_cost_sources) you added in the previous step schema: # Must map onto schema in var(aws_cloud_cost_sources) database: # Must map onto database in var(aws_cloud_cost_sources) loader: fivetran loaded_at_field: _fivetran_synced tables: - name: # Must map onto table in var(aws_cloud_cost_sources) description: '{{ doc("aws_cloud_cost_report") }}' # Your projecy will inherit docs blocks defined by this package columns: &aws_report_columns # Can use columns: *aws_report yaml anchor in subsequent sources - name: _file description: '{{ doc("_file") }}' - name: _line description: '{{ doc("_line") }}' - name: _fivetran_synced description: '{{ doc("_fivetran_synced") }}' - name: _modified description: '{{ doc("_modified") }}' - name: bill_bill_type description: '{{ doc("bill_bill_type") }}' - name: bill_billing_entity description: '{{ doc("bill_billing_entity") }}' - name: bill_billing_period_end_date description: '{{ doc("bill_billing_period_end_date") }}' - name: bill_billing_period_start_date description: '{{ doc("bill_billing_period_start_date") }}' - name: bill_invoice_id description: '{{ doc("bill_invoice_id") }}' - name: bill_invoicing_entity description: '{{ doc("bill_invoicing_entity") }}' - name: bill_payer_account_id description: '{{ doc("bill_payer_account_id") }}' - name: identity_line_item_id description: '{{ doc("identity_line_item_id") }}' - name: identity_time_interval description: '{{ doc("identity_time_interval") }}' - name: line_item_availability_zone description: '{{ doc("line_item_availability_zone") }}' - name: line_item_blended_cost description: '{{ doc("line_item_blended_cost") }}' - name: line_item_blended_rate description: '{{ doc("line_item_blended_rate") }}' - name: line_item_currency_code description: '{{ doc("line_item_currency_code") }}' - name: line_item_legal_entity description: '{{ doc("line_item_legal_entity") }}' - name: line_item_line_item_description description: '{{ doc("line_item_line_item_description") }}' - name: line_item_line_item_type description: '{{ doc("line_item_line_item_type") }}' - name: line_item_normalization_factor description: '{{ doc("line_item_normalization_factor") }}' - name: line_item_normalized_usage_amount description: '{{ doc("line_item_normalized_usage_amount") }}' - name: line_item_operation description: '{{ doc("line_item_operation") }}' - name: line_item_product_code description: '{{ doc("line_item_product_code") }}' - name: line_item_resource_id description: '{{ doc("line_item_resource_id") }}' - name: line_item_tax_type description: '{{ doc("line_item_tax_type") }}' - name: line_item_unblended_cost description: '{{ doc("line_item_unblended_cost") }}' - name: line_item_unblended_rate description: '{{ doc("line_item_unblended_rate") }}' - name: line_item_usage_account_id description: '{{ doc("line_item_usage_account_id") }}' - name: line_item_usage_amount description: '{{ doc("line_item_usage_amount") }}' - name: line_item_usage_end_date description: '{{ doc("line_item_usage_end_date") }}' - name: line_item_usage_start_date description: '{{ doc("line_item_usage_start_date") }}' - name: line_item_usage_type description: '{{ doc("line_item_usage_type") }}' - name: pricing_currency description: '{{ doc("pricing_currency") }}' - name: pricing_lease_contract_length description: '{{ doc("pricing_lease_contract_length") }}' - name: pricing_offering_class description: '{{ doc("pricing_offering_class") }}' - name: pricing_public_on_demand_cost description: '{{ doc("pricing_public_on_demand_cost") }}' - name: pricing_public_on_demand_rate description: '{{ doc("pricing_public_on_demand_rate") }}' - name: pricing_purchase_option description: '{{ doc("pricing_purchase_option") }}' - name: pricing_rate_code description: '{{ doc("pricing_rate_code") }}' - name: pricing_rate_id description: '{{ doc("pricing_rate_id") }}' - name: pricing_term description: '{{ doc("pricing_term") }}' - name: pricing_unit description: '{{ doc("pricing_unit") }}' - name: product_fee_code description: '{{ doc("product_fee_code") }}' - name: product_fee_description description: '{{ doc("product_fee_description") }}' - name: product_from_location description: '{{ doc("product_from_location") }}' - name: product_from_location_type description: '{{ doc("product_from_location_type") }}' - name: product_from_region_code description: '{{ doc("product_from_region_code") }}' - name: product_instance_family description: '{{ doc("product_instance_family") }}' - name: product_instance_type description: '{{ doc("product_instance_type") }}' - name: product_location description: '{{ doc("product_location") }}' - name: product_location_type description: '{{ doc("product_location_type") }}' - name: product_operation description: '{{ doc("product_operation") }}' - name: product_pricing_unit description: '{{ doc("product_pricing_unit") }}' - name: product_product_family description: '{{ doc("product_product_family") }}' - name: product_product_name description: '{{ doc("product_product_name") }}' - name: product_region_code description: '{{ doc("product_region_code") }}' - name: product_servicecode description: '{{ doc("product_servicecode") }}' - name: product_sku description: '{{ doc("product_sku") }}' - name: product_to_location description: '{{ doc("product_to_location") }}' - name: product_to_location_type description: '{{ doc("product_to_location_type") }}' - name: product_to_region_code description: '{{ doc("product_to_region_code") }}' - name: product_usagetype description: '{{ doc("product_usagetype") }}' - name: reservation_amortized_upfront_fee_for_billing_period description: '{{ doc("reservation_amortized_upfront_fee_for_billing_period") }}' - name: reservation_end_time description: '{{ doc("reservation_end_time") }}' - name: reservation_modification_status description: '{{ doc("reservation_modification_status") }}' - name: reservation_normalized_units_per_reservation description: '{{ doc("reservation_normalized_units_per_reservation") }}' - name: reservation_number_of_reservations description: '{{ doc("reservation_number_of_reservations") }}' - name: reservation_reservation_arn description: '{{ doc("reservation_reservation_arn") }}' - name: reservation_start_time description: '{{ doc("reservation_start_time") }}' - name: reservation_subscription_id description: '{{ doc("reservation_subscription_id") }}' - name: reservation_total_reserved_normalized_units description: '{{ doc("reservation_total_reserved_normalized_units") }}' - name: reservation_total_reserved_units description: '{{ doc("reservation_total_reserved_units") }}' - name: reservation_units_per_reservation description: '{{ doc("reservation_units_per_reservation") }}' - name: reservation_unused_amortized_upfront_fee_for_billing_period description: '{{ doc("reservation_unused_amortized_upfront_fee_for_billing_period") }}' - name: reservation_unused_normalized_unit_quantity description: '{{ doc("reservation_unused_normalized_unit_quantity") }}' - name: reservation_unused_quantity description: '{{ doc("reservation_unused_quantity") }}' - name: reservation_unused_recurring_fee description: '{{ doc("reservation_unused_recurring_fee") }}' - name: reservation_upfront_value description: '{{ doc("reservation_upfront_value") }}' - name: savings_plan_end_time description: '{{ doc("savings_plan_end_time") }}' - name: savings_plan_offering_type description: '{{ doc("savings_plan_offering_type") }}' - name: savings_plan_payment_option description: '{{ doc("savings_plan_payment_option") }}' - name: savings_plan_purchase_term description: '{{ doc("savings_plan_purchase_term") }}' - name: savings_plan_region description: '{{ doc("savings_plan_region") }}' - name: savings_plan_savings_plan_arn description: '{{ doc("savings_plan_savings_plan_arn") }}' - name: savings_plan_savings_plan_effective_cost description: '{{ doc("savings_plan_savings_plan_effective_cost") }}' - name: savings_plan_savings_plan_rate description: '{{ doc("savings_plan_savings_plan_rate") }}' - name: savings_plan_start_time description: '{{ doc("savings_plan_start_time") }}' - name: bill_payer_account_name description: '{{ doc("bill_payer_account_name") }}' - name: product description: '{{ doc("product") }}' - name: discount description: '{{ doc("discount") }}' - name: resource_tags description: '{{ doc("resource_tags") }}' - name: cost_category description: '{{ doc("cost_category") }}' - name: line_item_usage_account_name description: '{{ doc("line_item_usage_account_name") }}' - name: reservation_reservation_a_r_n description: '{{ doc("reservation_reservation_a_r_n") }}' - name: reservation_recurring_fee_for_usage description: '{{ doc("reservation_recurring_fee_for_usage") }}' - name: savings_plan_recurring_commitment_for_billing_period description: '{{ doc("savings_plan_recurring_commitment_for_billing_period") }}' - name: savings_plan_used_commitment description: '{{ doc("savings_plan_used_commitment") }}' - name: reservation_amortized_upfront_cost_for_usage description: '{{ doc("reservation_amortized_upfront_cost_for_usage") }}' - name: reservation_effective_cost description: '{{ doc("reservation_effective_cost") }}' - name: savings_plan_amortized_upfront_commitment_for_billing_period description: '{{ doc("savings_plan_amortized_upfront_commitment_for_billing_period") }}' - name: savings_plan_total_commitment_to_date description: '{{ doc("savings_plan_total_commitment_to_date") }}'
- Set the
has_defined_sourcesvariable (scoped to theaws_cloud_costpackage) toTrue, like such:
# dbt_project.yml vars: aws_cloud_cost: has_defined_sources: true
(Optional) Step 4: Additional configurations
Limit Date Range
Although the package transforms the latest version of each report, your AWS Cost & Usage Report data may still be quite large. In order to avoid unnecessary compute and storage costs, we have added a minimum (INCLUSIVE) start date variable that can be used to limit the data's date range.
By default, the package will look at data as far back as you have it. To adjust this, configure the following variable in your dbt_project.yml to be the first date you want included:
# dbt_project.yml vars: aws_cloud_cost_start_date: 'YYYY-MM-DD' # default value: '1970-01-01'
Note for BigQuery users: This filter applies a full table scan and may therefore not actually mitigate your compute costs. We have applied partitions to each end model, but you may want to consider applying pre-package transformations (partitions, filters, etc.) to streamline the amount of data processed by the package.
Please create an issue if you'd like to see support for incremental materializations.
Passing Through Additional Fields
This package includes all source columns defined in the macros folder. You can add more columns to the aws_cloud_cost__daily_overview model using the aws_cloud_cost_report_pass_through_columns variable. This variable allows for custom or otherwise not included fields to be included, aliased (alias), and casted (transform_sql) if desired (but not required). Datatype casting is configured via a sql snippet within the transform_sql key. You may add the desired sql while omitting the as field_name at the end and your custom pass-though fields will be casted accordingly.
Use the below format for declaring extra fields to include:
# dbt_project.yml vars: aws_cloud_cost_report_pass_through_columns: # will be included in aws_cloud_cost__daily_overview model - name: "that_field" alias: "renamed_to_this_field" transform_sql: "cast(renamed_to_this_field as string)" - name: "this_field" - name: "old_name" alias: "new_name"
These fields will be included in the aws_cloud_cost__daily_overview model and part of its composite hashed unique_key.
Please create an issue if you'd like to see passthrough column support for the
aws_cloud_cost__daily_product_reportoraws_cloud_cost__daily_instance_reportmodels.
Changing the Build Schema
By default this package will build the AWS Cloud Cost staging models within a schema titled (<target_schema> + _stg_aws_cloud_cost) and the AWS Cloud Cost final models within a schema titled (<target_schema> + _aws_cloud_cost) in your target database. If this is not where you would like your modeled AWS Cloud Cost data to be written to, add the following configuration to your dbt_project.yml file:
# dbt_project.yml models: aws_cloud_cost: +schema: my_new_schema_name # leave blank for just the target_schema staging: +schema: my_new_schema_name # leave blank for just the target_schema
(Optional) Step 5: Orchestrate your models with Fivetran Transformations for dbt Core™
Expand for details
Fivetran offers the ability for you to orchestrate your dbt project through Fivetran Transformations for dbt Core™. Learn how to set up your project for orchestration through Fivetran in our Transformations for dbt Core setup guides.
Does this package have dependencies?
This dbt package is dependent on the following dbt packages. These dependencies are installed by default within this package. For more information on the following packages, refer to the dbt hub site.
IMPORTANT: If you have any of these dependent packages in your own
packages.ymlfile, we highly recommend that you remove them from your rootpackages.ymlto avoid package version conflicts.
packages: - package: fivetran/fivetran_utils version: [">=0.4.0", "<0.5.0"] - package: dbt-labs/dbt_utils version: [">=1.0.0", "<2.0.0"]
How is this package maintained and can I contribute?
Package Maintenance
The Fivetran team maintaining this package only maintains the latest version of the package. We highly recommend you stay consistent with the latest version of the package and refer to the CHANGELOG and release notes for more information on changes across versions.
Contributions
A small team of analytics engineers at Fivetran develops these dbt packages. However, the packages are made better by community contributions.
We highly encourage and welcome contributions to this package. Check out this dbt Discourse article on the best workflow for contributing to a package.
Are there any resources available?
- If you have questions or want to reach out for help, see the GitHub Issue section to find the right avenue of support for you.
- If you would like to provide feedback to the dbt package team at Fivetran or would like to request a new dbt package, fill out our Feedback Form.