Data from GitHub can help your engineers quantify how they are doing.
If you are a small but growing tech company, chances are good that you use GitHub for collaboration and version control. Fivetran offers a GitHub connector to help you quickly gather data about issues, pull requests and commits. We also offer a dbt package to transform your data into more tractable, analytics-ready models.
See the most current version of the ERD here.
GitHub data can help you track the following:
Types of engineering work
Productivity and velocity
Bug resolution
To glean insights from issues, pull requests, and commits, you will first need to identify the right metrics. This may require transforming the normalized schema provided through the GitHub connector. Then, you will need to represent those data models on dashboards or visualizations.
One of the fundamental tradeoffs your engineering team must contend with is splitting time between maintaining the stability of the product and creating new features.
On one hand, a product that does not evolve new features will eventually be supplanted by competitors that do. On the other hand, a product that fails to perform as promised will lose its user base and hurt your brand.
You will have to look outside of GitHub to see whether customers immediately prefer better performance or new features, but GitHub data can tell you how you’re apportioning your resources.
If you aren’t in the habit of labeling your issues by category, you should do so. Group your issues by label and determine the breakdown of your work.
Tables and fields
Github.issue - id, created_at
Github.issue_label - issue_id, label
Arrange this data by time to observe any trends and compare them with other events, e.g. customer feedback.
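As a minimal sketch of this breakdown, the snippet below joins issues to their labels and counts labeled issues per month. The sample rows are hypothetical and only mimic the fields listed above; in practice you would run the equivalent query against your warehouse. The function name is an illustrative choice, not part of the connector or the dbt package.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample rows shaped like the issue and issue_label tables.
issues = [
    {"id": 1, "created_at": "2021-01-05T10:00:00"},
    {"id": 2, "created_at": "2021-01-20T09:30:00"},
    {"id": 3, "created_at": "2021-02-02T14:15:00"},
]
issue_labels = [
    {"issue_id": 1, "label": "bug"},
    {"issue_id": 2, "label": "feature"},
    {"issue_id": 3, "label": "bug"},
]


def label_counts_by_month(issues, issue_labels):
    """Count labeled issues per (YYYY-MM, label), joining on the issue id."""
    created = {row["id"]: datetime.fromisoformat(row["created_at"]) for row in issues}
    counts = defaultdict(Counter)
    for row in issue_labels:
        if row["issue_id"] in created:
            month = created[row["issue_id"]].strftime("%Y-%m")
            counts[month][row["label"]] += 1
    return {month: dict(counter) for month, counter in counts.items()}


breakdown = label_counts_by_month(issues, issue_labels)
for month in sorted(breakdown):
    print(month, breakdown[month])
```

An issue with several labels is counted once per label here, so the monthly totals can exceed the number of issues.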
You can estimate effort by attributing story points or a similar metric to each issue. Group by label to see which categories get the most attention.
Tables and fields
Github.issue - created_at, closed_at
Github.issue_assignees - issue_id, user_id
Github.issue_label - issue_id, label
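If you do not record story points, one rough stand-in is the time an issue stayed open. The sketch below averages days from created_at to closed_at per label. Treating elapsed time as a proxy for effort is an assumption, and the sample rows and function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; closed_at is None while an issue is still open.
issues = [
    {"id": 1, "created_at": "2021-01-01T00:00:00", "closed_at": "2021-01-03T00:00:00"},
    {"id": 2, "created_at": "2021-01-01T00:00:00", "closed_at": "2021-01-05T00:00:00"},
    {"id": 3, "created_at": "2021-01-02T00:00:00", "closed_at": None},
]
issue_labels = [
    {"issue_id": 1, "label": "bug"},
    {"issue_id": 2, "label": "bug"},
    {"issue_id": 3, "label": "feature"},
]


def mean_days_to_close_by_label(issues, issue_labels):
    """Average days between created_at and closed_at per label (closed issues only)."""
    days_open = {}
    for row in issues:
        if row["closed_at"] is None:
            continue  # open issues have no elapsed time yet
        opened = datetime.fromisoformat(row["created_at"])
        closed = datetime.fromisoformat(row["closed_at"])
        days_open[row["id"]] = (closed - opened).total_seconds() / 86400

    per_label = defaultdict(list)
    for row in issue_labels:
        if row["issue_id"] in days_open:
            per_label[row["label"]].append(days_open[row["issue_id"]])
    return {label: sum(days) / len(days) for label, days in per_label.items()}


print(mean_days_to_close_by_label(issues, issue_labels))
```

Labels whose issues are all still open do not appear in the result, which is worth keeping in mind when a category looks suspiciously absent.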
Are there any teams or engineers who seem to specialize in a particular category? Group by team or assignee and look at both raw numbers and percentages by label to find out.
Tables and fields
Github.issue_label - issue_id, label
Github.issue_assignees - issue_id, user_id (can join this to github.user to get user.login)
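A sketch of the per-assignee view might look like the following. It reports both the raw count and the share of each label within an assignee's labeled issues. The sample rows are hypothetical, and the `id` column on the user table is an assumption; only `login` is named above.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows shaped like issue_label, issue_assignees and user.
issue_labels = [
    {"issue_id": 1, "label": "bug"},
    {"issue_id": 2, "label": "bug"},
    {"issue_id": 3, "label": "feature"},
    {"issue_id": 4, "label": "feature"},
]
issue_assignees = [
    {"issue_id": 1, "user_id": 10},
    {"issue_id": 2, "user_id": 10},
    {"issue_id": 3, "user_id": 10},
    {"issue_id": 4, "user_id": 11},
]
users = [{"id": 10, "login": "alice"}, {"id": 11, "login": "bob"}]  # "id" is assumed


def label_share_by_assignee(issue_labels, issue_assignees, users):
    """Return {login: {label: (count, share of that assignee's labeled issues)}}."""
    login = {user["id"]: user["login"] for user in users}
    labels_for_issue = defaultdict(list)
    for row in issue_labels:
        labels_for_issue[row["issue_id"]].append(row["label"])

    counts = defaultdict(Counter)
    for row in issue_assignees:
        # Fall back to the raw user_id when no matching login exists.
        name = login.get(row["user_id"], str(row["user_id"]))
        for label in labels_for_issue.get(row["issue_id"], []):
            counts[name][label] += 1

    shares = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        shares[name] = {label: (n, n / total) for label, n in counter.items()}
    return shares


for name, labels in label_share_by_assignee(issue_labels, issue_assignees, users).items():
    print(name, labels)
```

Looking at counts and shares together helps separate a true specialist from someone who simply handles very few issues.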
Compare how long each pull request has been open with whether it was eventually closed. There is a good chance you will notice that, beyond some interval of time, the chances of a pull request ever being closed drop precipitously.
Tables and fields used
Github.pull_request - id
Github.pull_request_review - pull_request_id, submitted_at
Github.issue - created_at, closed_at (all pull requests are in the issue table with pull_request = TRUE)
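One rough way to look for that interval is to ask, for several age thresholds, what share of the pull requests that were still open after that many days were eventually closed. The sketch below does this with hypothetical sample rows and a hypothetical `as_of` cutoff date. It treats any non-null closed_at as closed and does not distinguish merged from abandoned pull requests, which would need a field not listed above.

```python
from datetime import datetime

# Hypothetical rows from the issue table; pull requests have pull_request = True.
issues = [
    {"id": 1, "pull_request": True, "created_at": "2021-01-01T00:00:00", "closed_at": "2021-01-02T00:00:00"},
    {"id": 2, "pull_request": True, "created_at": "2021-01-01T00:00:00", "closed_at": "2021-01-04T00:00:00"},
    {"id": 3, "pull_request": True, "created_at": "2021-01-01T00:00:00", "closed_at": "2021-01-21T00:00:00"},
    {"id": 4, "pull_request": True, "created_at": "2021-01-01T00:00:00", "closed_at": None},
    {"id": 5, "pull_request": True, "created_at": "2021-01-01T00:00:00", "closed_at": None},
    {"id": 6, "pull_request": False, "created_at": "2021-01-01T00:00:00", "closed_at": None},
]


def close_rate_after(issues, thresholds_days, as_of):
    """For each threshold, the share of PRs open longer than it that were ever closed."""
    observed = []  # (days open until close or until as_of, was_closed)
    for row in issues:
        if not row["pull_request"]:
            continue  # skip ordinary issues
        opened = datetime.fromisoformat(row["created_at"])
        was_closed = row["closed_at"] is not None
        end = datetime.fromisoformat(row["closed_at"]) if was_closed else as_of
        observed.append(((end - opened).total_seconds() / 86400, was_closed))

    rates = {}
    for threshold in thresholds_days:
        survivors = [closed for days, closed in observed if days > threshold]
        # None signals that no pull request stayed open this long.
        rates[threshold] = sum(survivors) / len(survivors) if survivors else None
    return rates


print(close_rate_after(issues, [0, 7, 30], as_of=datetime(2021, 3, 1)))
```

Recently opened pull requests have had little time to close, so this simple ratio understates the close rate for them; restricting the input to pull requests opened well before the cutoff is one way to soften that.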
Certain engineers are better at reviewing pull requests than others. It might be worth identifying and assigning the most efficient reviewers.
Bin the data by reviewer and determine the average elapsed time to review.
Tables and fields used
Github.pull_request_review - user_id (this is the reviewer), pull_request_id, submitted_at
Github.requested_reviewer_history - pull_request_id, requested_id (this is the reviewer's user_id), created_at
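The binning described above could be sketched as follows: for each reviewer and pull request, measure the hours from the earliest review request to the first review submitted after it, then average per reviewer. The sample rows and the function name are hypothetical, and using the earliest request as the starting point is an assumption about how you want to count re-requests.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows shaped like requested_reviewer_history and pull_request_review.
requests = [
    {"pull_request_id": 100, "requested_id": 10, "created_at": "2021-01-01T09:00:00"},
    {"pull_request_id": 101, "requested_id": 10, "created_at": "2021-01-02T09:00:00"},
    {"pull_request_id": 100, "requested_id": 11, "created_at": "2021-01-01T09:00:00"},
]
reviews = [
    {"pull_request_id": 100, "user_id": 10, "submitted_at": "2021-01-01T11:00:00"},
    {"pull_request_id": 100, "user_id": 10, "submitted_at": "2021-01-01T18:00:00"},
    {"pull_request_id": 101, "user_id": 10, "submitted_at": "2021-01-02T15:00:00"},
    {"pull_request_id": 100, "user_id": 11, "submitted_at": "2021-01-02T09:00:00"},
]


def mean_review_hours(requests, reviews):
    """Average hours from review request to first submitted review, per reviewer."""
    requested = {}  # earliest request per (pull request, reviewer)
    for row in requests:
        key = (row["pull_request_id"], row["requested_id"])
        ts = datetime.fromisoformat(row["created_at"])
        if key not in requested or ts < requested[key]:
            requested[key] = ts

    first_review = {}  # earliest review at or after the request
    for row in reviews:
        key = (row["pull_request_id"], row["user_id"])
        if key not in requested:
            continue  # unrequested reviews have no starting point
        ts = datetime.fromisoformat(row["submitted_at"])
        if ts < requested[key]:
            continue
        if key not in first_review or ts < first_review[key]:
            first_review[key] = ts

    per_reviewer = defaultdict(list)
    for (pr_id, reviewer), ts in first_review.items():
        hours = (ts - requested[(pr_id, reviewer)]).total_seconds() / 3600
        per_reviewer[reviewer].append(hours)
    return {reviewer: sum(h) / len(h) for reviewer, h in per_reviewer.items()}


print(mean_review_hours(requests, reviews))
```

Requests that never received a review are left out of the average, so a reviewer who ignores requests can still look fast; counting unanswered requests alongside the average gives a fuller picture.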
As you can see, GitHub data offers many trackable metrics, and you can enrich it further by blending it with other data sources. For instance, you can gauge product health by finding which features of your product are responsible for the most bug fixes, then validate your findings against data from Zendesk or another customer support data source.
In addition to the data already available through the Fivetran ERD, make sure to explore the data from the dbt package. It includes tables designed to streamline your analytics efforts.
Good luck!