3 Kinds of Metrics GitHub Data Can Answer

If you are a small but growing tech company, chances are good that you use GitHub for collaboration and version control. Fivetran offers a GitHub connector to help you quickly gather data about issues, pull requests and commits. We also offer a dbt package to transform your data into more tractable, analytics-ready models.

See the most current version of the ERD here.

GitHub data can help you track the following:

Types of engineering work
Productivity and velocity
Bug resolution

To glean insights from issues, pull requests, and commits, you will first need to identify the right metrics. This may require transforming the normalized schema provided through the GitHub connector. Then, you will need to represent those data models on dashboards or visualizations.

1. Types of engineering work

One of the fundamental tradeoffs your engineering team must contend with is splitting time between maintaining the stability of the product and creating new features.

On one hand, a product that does not evolve new features will eventually be supplanted by competitors that do. On the other hand, a product that fails to perform as promised will lose its user base and hurt your brand.

The questions to ask and metrics to answer them

You will have to look outside of GitHub to see whether customers immediately prefer better performance or new features, but GitHub data can tell you how you’re apportioning your resources.

1. How many issues are opened for bugs, code hygiene and new features?

If you aren’t in the habit of labeling your issues by category, you should do so. Group your issues by label and determine the breakdown of your work.

Tables and fields

Github.issue - id, created_at
Github.issue_label - issue_id, label

Arrange this data by time to observe any trends and compare them with other events, e.g. customer feedback.
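The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the rows from the issue and issue_label tables have already been loaded as dictionaries keyed by the field names listed above. The function name, the label values ("bug", "feature") and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def issues_by_label_and_month(issues, issue_labels):
    """Count opened issues per (month, label).

    issues: rows from the issue table, each with 'id' and 'created_at'.
    issue_labels: rows from the issue_label table, each with 'issue_id' and 'label'.
    """
    created = {row["id"]: row["created_at"] for row in issues}
    counts = Counter()
    for row in issue_labels:
        opened = created.get(row["issue_id"])
        if opened is None:
            continue  # label row with no matching issue
        counts[(opened.strftime("%Y-%m"), row["label"])] += 1
    return counts


# Hypothetical sample rows, for illustration only.
issues = [
    {"id": 1, "created_at": datetime(2021, 1, 5)},
    {"id": 2, "created_at": datetime(2021, 1, 20)},
    {"id": 3, "created_at": datetime(2021, 2, 2)},
]
issue_labels = [
    {"issue_id": 1, "label": "bug"},
    {"issue_id": 2, "label": "feature"},
    {"issue_id": 3, "label": "bug"},
]

for (month, label), n in sorted(issues_by_label_and_month(issues, issue_labels).items()):
    print(month, label, n)
```

Note that an issue carrying several labels is counted once per label, so the totals can exceed the number of issues.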

2. How much work do you dedicate to each category?

You can estimate effort by attributing story points or a similar metric to each issue. Group by label to see which categories get the most attention.

Tables and fields

Github.issue - created_at, closed_at
Github.issue_assignees - issue_id, user_id
Github.issue_label - issue_id, label
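The fields listed above do not include story points, so the sketch below uses a stand-in: the number of days each closed issue stayed open, summed per label. That proxy is an assumption made for illustration, not the method this article prescribes, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def days_open_by_label(issues, issue_labels):
    """Return {label: (closed_issue_count, total_days_open)}.

    Days open is only a rough proxy for effort (an assumption of this sketch).
    """
    by_id = {row["id"]: row for row in issues}
    totals = defaultdict(lambda: [0, 0.0])  # label -> [count, days]
    for row in issue_labels:
        issue = by_id.get(row["issue_id"])
        if issue is None or issue["closed_at"] is None:
            continue  # skip unknown or still-open issues
        elapsed = issue["closed_at"] - issue["created_at"]
        totals[row["label"]][0] += 1
        totals[row["label"]][1] += elapsed.total_seconds() / 86400
    return {label: (n, days) for label, (n, days) in totals.items()}


# Hypothetical sample rows, for illustration only.
effort_issues = [
    {"id": 1, "created_at": datetime(2021, 1, 1), "closed_at": datetime(2021, 1, 4)},
    {"id": 2, "created_at": datetime(2021, 1, 2), "closed_at": datetime(2021, 1, 12)},
    {"id": 3, "created_at": datetime(2021, 1, 3), "closed_at": None},
    {"id": 4, "created_at": datetime(2021, 1, 10), "closed_at": datetime(2021, 1, 11)},
]
effort_labels = [
    {"issue_id": 1, "label": "bug"},
    {"issue_id": 2, "label": "feature"},
    {"issue_id": 3, "label": "bug"},
    {"issue_id": 4, "label": "bug"},
]

for label, (n, days) in sorted(days_open_by_label(effort_issues, effort_labels).items()):
    print(f"{label}: {n} closed issues, {days:.1f} days open in total")
```

If you do record story points, or want to weight each issue by its number of assignees from the issue_assignees table, the same loop can accumulate that value in place of days open.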

3. Do some of your engineers specialize in certain categories?

Group by team or assignee and look at both raw numbers and percentages by label to see whether any team or engineer concentrates on a particular category.

Tables and fields

Github.issue_label - issue_id, label
Github.issue_assignees - issue_id, user_id (can join this to github.user to get user.login)
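One way to get both the raw numbers and the percentages is sketched below. It assumes rows from issue_assignees and issue_label are available as dictionaries with the fields listed above. The function name, user ids and label values are hypothetical.

```python
from collections import Counter, defaultdict


def label_share_by_assignee(issue_assignees, issue_labels):
    """Return {user_id: {label: (count, share of that user's label assignments)}}."""
    labels_for_issue = defaultdict(list)
    for row in issue_labels:
        labels_for_issue[row["issue_id"]].append(row["label"])

    counts = defaultdict(Counter)
    for row in issue_assignees:
        for label in labels_for_issue.get(row["issue_id"], []):
            counts[row["user_id"]][label] += 1

    result = {}
    for user_id, per_label in counts.items():
        total = sum(per_label.values())
        result[user_id] = {label: (n, n / total) for label, n in per_label.items()}
    return result


# Hypothetical sample rows, for illustration only.
assignee_rows = [
    {"issue_id": 1, "user_id": 10},
    {"issue_id": 2, "user_id": 10},
    {"issue_id": 3, "user_id": 10},
    {"issue_id": 4, "user_id": 20},
]
label_rows = [
    {"issue_id": 1, "label": "bug"},
    {"issue_id": 2, "label": "bug"},
    {"issue_id": 3, "label": "feature"},
    {"issue_id": 4, "label": "feature"},
]

for user_id, per_label in sorted(label_share_by_assignee(assignee_rows, label_rows).items()):
    for label, (n, share) in sorted(per_label.items()):
        print(user_id, label, n, f"{share:.0%}")
```

The share is computed over label assignments rather than distinct issues, so an issue with two labels contributes to both. Joining user_id to github.user would let you print user.login instead of the numeric id.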

2. Does the time elapsed influence the chances of the pull request being closed?

There is a good chance you will notice that, beyond some interval of time, the chances of a pull request ever being successfully closed drop precipitously.

Tables and fields used

Github.pull_request - id
Github.pull_request_review - pull_request_id, submitted_at
Github.issue - created_at, closed_at (all pull requests are in the issue table with pull_request = TRUE)
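To look for that drop-off, you can bucket pull requests by how long they waited and compare close rates across buckets. The sketch below takes "time elapsed" to mean the time from opening a pull request to its first review, which is an assumption. It also assumes each pull request row has already been joined to its created_at and closed_at from the issue table. The function name, bucket boundaries and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def close_rate_by_wait(pull_requests, reviews, bucket_days=(1, 3, 7)):
    """Return {bucket_label: (pr_count, closed_fraction)}.

    pull_requests: rows with 'id', 'created_at', 'closed_at' (None if still open).
    reviews: rows with 'pull_request_id' and 'submitted_at'.
    """
    # Earliest review timestamp per pull request.
    first_review = {}
    for row in reviews:
        pr_id, ts = row["pull_request_id"], row["submitted_at"]
        if pr_id not in first_review or ts < first_review[pr_id]:
            first_review[pr_id] = ts

    def bucket(days):
        for upper in bucket_days:
            if days <= upper:
                return f"<= {upper}d"
        return f"> {bucket_days[-1]}d"

    tally = defaultdict(lambda: [0, 0])  # bucket -> [total, closed]
    for pr in pull_requests:
        if pr["id"] not in first_review:
            continue  # never reviewed; handle separately if needed
        wait = (first_review[pr["id"]] - pr["created_at"]).total_seconds() / 86400
        entry = tally[bucket(wait)]
        entry[0] += 1
        entry[1] += pr["closed_at"] is not None
    return {b: (total, closed / total) for b, (total, closed) in tally.items()}


# Hypothetical sample rows, for illustration only.
prs = [
    {"id": 1, "created_at": datetime(2021, 1, 1), "closed_at": datetime(2021, 1, 2)},
    {"id": 2, "created_at": datetime(2021, 1, 1), "closed_at": datetime(2021, 1, 5)},
    {"id": 3, "created_at": datetime(2021, 1, 1), "closed_at": None},
    {"id": 4, "created_at": datetime(2021, 1, 2), "closed_at": datetime(2021, 1, 20)},
    {"id": 5, "created_at": datetime(2021, 1, 1), "closed_at": None},
]
pr_reviews = [
    {"pull_request_id": 1, "submitted_at": datetime(2021, 1, 1, 12)},
    {"pull_request_id": 2, "submitted_at": datetime(2021, 1, 3)},
    {"pull_request_id": 3, "submitted_at": datetime(2021, 1, 11)},
    {"pull_request_id": 4, "submitted_at": datetime(2021, 1, 14)},
]

for label, (total, rate) in close_rate_by_wait(prs, pr_reviews).items():
    print(f"{label}: {total} PRs, {rate:.0%} closed")
```

Pull requests with no review at all are skipped here. In practice you would likely report them as their own bucket.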

3. Do different engineers review and close pull requests at different rates?

Some engineers review pull requests faster than others. It might be worth identifying the most efficient reviewers and routing reviews to them.

Bin the data by reviewer and determine the average elapsed time to review.

Tables and fields used

Github.pull_request_review - user_id (this is the reviewer), pull_request_id, submitted_at
Github.requested_reviewer_history - pull_request_id, requested_id (this is the reviewer user_id), requested_reviewer_history.created_at
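A rough version of this binning is sketched below. It measures, for each reviewer, the average time from the first review request on a pull request to that reviewer's first review submitted at or after the request. This pairing rule is a simplifying assumption, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def avg_review_hours(requests, reviews):
    """Return {reviewer_id: average hours from review request to first review}.

    requests: rows from requested_reviewer_history with 'pull_request_id',
              'requested_id' and 'created_at'.
    reviews:  rows from pull_request_review with 'pull_request_id',
              'user_id' and 'submitted_at'.
    """
    # Earliest request per (pull request, reviewer).
    first_request = {}
    for row in requests:
        key = (row["pull_request_id"], row["requested_id"])
        if key not in first_request or row["created_at"] < first_request[key]:
            first_request[key] = row["created_at"]

    # All review timestamps per (pull request, reviewer).
    reviews_by_key = defaultdict(list)
    for row in reviews:
        key = (row["pull_request_id"], row["user_id"])
        reviews_by_key[key].append(row["submitted_at"])

    hours = defaultdict(list)
    for key, requested_at in first_request.items():
        later = [ts for ts in reviews_by_key.get(key, []) if ts >= requested_at]
        if not later:
            continue  # requested but never reviewed
        elapsed = min(later) - requested_at
        hours[key[1]].append(elapsed.total_seconds() / 3600)
    return {reviewer: sum(v) / len(v) for reviewer, v in hours.items()}


# Hypothetical sample rows, for illustration only.
request_rows = [
    {"pull_request_id": 1, "requested_id": 10, "created_at": datetime(2021, 1, 1, 9)},
    {"pull_request_id": 2, "requested_id": 10, "created_at": datetime(2021, 1, 2, 9)},
    {"pull_request_id": 1, "requested_id": 20, "created_at": datetime(2021, 1, 1, 9)},
    {"pull_request_id": 3, "requested_id": 20, "created_at": datetime(2021, 1, 3, 9)},
]
review_rows = [
    {"pull_request_id": 1, "user_id": 10, "submitted_at": datetime(2021, 1, 1, 11)},
    {"pull_request_id": 2, "user_id": 10, "submitted_at": datetime(2021, 1, 2, 13)},
    {"pull_request_id": 1, "user_id": 20, "submitted_at": datetime(2021, 1, 2, 9)},
]

for reviewer, avg in sorted(avg_review_hours(request_rows, review_rows).items()):
    print(reviewer, f"{avg:.1f} hours")
```

Averages are sensitive to a few very slow reviews, so a median may be a fairer comparison once you have enough data per reviewer.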

Next steps

As you can see, GitHub data offers many trackable data points, and you can enrich it further by blending it with other data sources. For instance, you can measure product health by finding which features of your product are responsible for the most bug fixes, then validate your findings by examining data from Zendesk or another customer support data source.

In addition to the data already available through the Fivetran ERD, make sure to explore the data from the dbt package. It includes tables designed to streamline your analytics efforts.

Good luck!
