For an overview of how dbt powers advanced transformations, and information about our other dbt packages, take a look at this recent blog post.
dbt package for GitHub
Our dbt package for GitHub helps you better track the state of issues, pull requests, and their related assignments so that you can increase velocity for codebase updates. The package builds on the Fivetran GitHub connector, which ingests all of the data passed through the GitHub API. The package then joins these disparate tables for:
- Enriching GitHub issues with their assignees and time to completion
- Time metrics attached to pull requests to track life cycles from creation, to review, to merge
- A weekly, monthly, and quarterly overview of your opened and closed issues and pull requests
The outputs of this modeling and transformation package can help organizations address common engineering challenges, such as:
- Whether there’s a disproportionate issue to assignee ratio
- Establishing a potential “cliff” timeline for a pull request to fall through the cracks
- High-level pull request completion tracking by week, month, and quarter
- Determining the average time taken in each stage of a pull request to forecast an issue completion timeline
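To make the last point concrete, here is a minimal sketch of averaging the time spent in each stage of a pull request. It is not the package's implementation: the records and the field names (`created_at`, `first_review_at`, `merged_at`) are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical pull request records; the field names are illustrative
# assumptions, not the columns produced by the dbt package.
pull_requests = [
    {"id": 1, "created_at": "2021-03-01T09:00:00",
     "first_review_at": "2021-03-01T15:00:00",
     "merged_at": "2021-03-02T09:00:00"},
    {"id": 2, "created_at": "2021-03-03T10:00:00",
     "first_review_at": "2021-03-04T10:00:00",
     "merged_at": "2021-03-04T16:00:00"},
]


def hours_between(start, end):
    """Return the elapsed hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


def average_stage_hours(prs):
    """Average the hours spent waiting for review and from review to merge."""
    return {
        "creation_to_review": mean(
            hours_between(p["created_at"], p["first_review_at"]) for p in prs
        ),
        "review_to_merge": mean(
            hours_between(p["first_review_at"], p["merged_at"]) for p in prs
        ),
    }


stage_averages = average_stage_hours(pull_requests)
print(stage_averages)
```

Averages like these could feed a rough forecast: a newly opened pull request would be expected to merge after about the sum of the two stage averages.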
Challenges of the GitHub API
The GitHub API splits contextual information about issues and pull requests, such as assignees and history, across various endpoints. This makes it easier to target the exact information you're looking for in an API request, but harder to join the data for analytics.
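The joining problem can be illustrated with a small sketch. The two lists below stand in for data that arrives separately, one with issues and one with assignees; their shapes and field names (`issue_id`, `login`) are assumptions made for this example rather than the actual API or connector schema.

```python
from collections import defaultdict

# Hypothetical records shaped like two separately retrieved datasets.
issues = [
    {"issue_id": 101, "title": "Fix login redirect"},
    {"issue_id": 102, "title": "Update README"},
    {"issue_id": 103, "title": "Flaky test in CI"},
]
issue_assignees = [
    {"issue_id": 101, "login": "alice"},
    {"issue_id": 101, "login": "bob"},
    {"issue_id": 103, "login": "alice"},
]


def enrich_issues(issues, issue_assignees):
    """Attach the list of assignee logins to each issue (a left join)."""
    assignees_by_issue = defaultdict(list)
    for row in issue_assignees:
        assignees_by_issue[row["issue_id"]].append(row["login"])
    # Issues without any assignee are kept and get an empty list.
    return [
        {**issue, "assignees": assignees_by_issue.get(issue["issue_id"], [])}
        for issue in issues
    ]


enriched = enrich_issues(issues, issue_assignees)
for issue in enriched:
    print(issue["issue_id"], issue["assignees"])
```

Every additional piece of context, such as labels or review history, needs a similar join, which is the kind of work a modeling layer takes off your hands.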
How Fivetran helps
Our native GitHub connector automatically brings in data about issues, pull requests and their corresponding contextual information in a pre-defined format (see Fivetran’s documentation for the GitHub schema) that makes it easy to start querying your data right away. By continuing to replicate your GitHub data into your centralized data warehouse at a frequency that you dictate, and using the provided dbt package, you’ll be able to better track and optimize your development team’s efficiency. Use GitHub as a standalone source or combine this data with common project tracking software, such as Jira or Asana, to provide your organization insight into the complete software delivery process.
Next steps
Get the dbt package for GitHub: This package handles advanced modeling, i.e., data transformations, dependencies, and target table creation. Its primary outputs are described below; intermediate models are used to create these output models:
- github_issues: Each record represents a GitHub issue, enriched with data about its assignees, milestones, and time comparisons
- github_pull_requests: Each record represents a GitHub pull request, enriched with data about its repository, reviewers, and durations between review requests, merges, and reviews
- github_metrics: Each record represents enriched metrics about PRs and issues that were created and closed during day, week, month, or quarter periods
Note: this dbt package depends on the dbt source package for GitHub. The source package downloads automatically when you download the dbt package for GitHub. It lightly cleanses the data, defines tables and columns, and tests your source data.
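As a rough picture of what a period-based metrics output contains, the sketch below counts issues opened and closed per week. The input records and the `created_on` and `closed_on` field names are hypothetical, and the logic is a simplified illustration rather than the package's actual model.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical issue records; closed_on is None while an issue is open.
issues = [
    {"created_on": date(2021, 3, 1), "closed_on": date(2021, 3, 4)},
    {"created_on": date(2021, 3, 3), "closed_on": date(2021, 3, 10)},
    {"created_on": date(2021, 3, 9), "closed_on": None},
]


def week_start(day):
    """Return the Monday of the week that contains the given date."""
    return day - timedelta(days=day.weekday())


def weekly_counts(issues):
    """Count issues opened and closed in each week, ordered by week."""
    opened = Counter(week_start(i["created_on"]) for i in issues)
    closed = Counter(
        week_start(i["closed_on"]) for i in issues if i["closed_on"]
    )
    weeks = sorted(set(opened) | set(closed))
    return [
        {"week": w, "opened": opened[w], "closed": closed[w]} for w in weeks
    ]


weekly = weekly_counts(issues)
for row in weekly:
    print(row["week"].isoformat(), row["opened"], row["closed"])
```

Swapping `week_start` for a function that truncates to the month or quarter would give the other period views.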