These data management best practices apply to both large enterprises and fast-growing startups.
I started my career as an options trader at J.P. Morgan and then moved on to build their first modern data stack. Currently, I am a principal analytics leader at Fivetran, an up-and-coming market leader in data integration. Along the way, I’ve had to learn how to build complex information systems the hard way. Today, I want to share some best practices I’ve picked up that will help you succeed.
The modern data stack is a set of cloud-native data tools centered around automation, lower costs, and ease of use for end users throughout the data management lifecycle. Here is my framework for building a modern data stack:
Data warehouse – First, set up a cloud-based data warehouse. Different data warehouses will offer different scalability, pricing models, dialects of SQL, and other features. Notable examples of cloud-native data warehouses include BigQuery, Snowflake and Redshift.
Business intelligence – Then, connect a cloud-native BI tool with your data warehouse. Different BI tools offer varying levels of visualization power, user-friendliness, collaboration, and other features. Notable examples of BI tools include Looker, Tableau, Qlik, and Mode.
Data pipeline – You will need a tool to extract data from your applications and operational systems and load it into the central data warehouse. Different pipeline vendors take different approaches to ease of use, configurability, security, and customer service. Examples include Fivetran, Stitch, Xplenty, and Matillion.
Data transformation – Finally, you will need tools to transform data into models for reporting and predictive modeling. Many data pipelines include transformation tools, but a good standalone example is dbt.
Most vendors offer free trials. You can also consult industry publications like Gartner for more detailed comparisons.
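To make the four layers concrete, here is a minimal sketch of how they fit together. It is a toy under stated assumptions, not how any of the vendors above work: Python's built-in `sqlite3` stands in for the cloud warehouse, a hard-coded list stands in for the pipeline's extract step, and the table names (`raw_invoices`, `revenue_by_customer`) and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows "extracted" from an operational system (e.g. a billing app).
RAW_INVOICES = [
    ("acme", "2023-01-15", 1200.0),
    ("acme", "2023-02-15", 1200.0),
    ("globex", "2023-01-20", 300.0),
]


def load(conn, rows):
    """Pipeline step: land the raw rows in the warehouse unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_invoices "
        "(customer TEXT, invoice_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_invoices VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transformation step: build a reporting model with SQL in the warehouse."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute(
        "CREATE TABLE revenue_by_customer AS "
        "SELECT customer, SUM(amount) AS total_revenue "
        "FROM raw_invoices GROUP BY customer"
    )


def report(conn):
    """BI step: query the modeled table, as a dashboard would."""
    return conn.execute(
        "SELECT customer, total_revenue FROM revenue_by_customer "
        "ORDER BY customer"
    ).fetchall()


# sqlite3 in-memory database stands in for the cloud data warehouse.
conn = sqlite3.connect(":memory:")
load(conn, RAW_INVOICES)
transform(conn)
print(report(conn))
```

The point of the ordering is that raw data lands first and is modeled afterwards inside the warehouse, so the reporting layer only ever reads the transformed tables.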
One hard problem we’ve experienced at Fivetran as we grow is “how to hire good talent fast.” Having hired over a dozen talented data professionals last year, I want to share a proven framework that will help you level up:
First Screen: test for advanced technical skills, particularly advanced SQL. You can use testing software such as HackerRank to increase screening efficiency.
Second Screen: test for execution skill. If this is the first hire, have the candidates write a 30-60-90 day plan and evaluate their business strategy. If this is not the first hire, have them work through existing code featuring business logic. This is a practical test that mimics their typical duties on the job.
Next, screen for analytical skill: Ask your candidates to present insights and visualizations from a complex dataset. Alternatively, use a timeboxed case study involving a relevant use case to evaluate their analytical skill. This will test your candidate’s ability to analyze data and articulate their ideas.
Superb cultural fit is a must: We have consistently improved talent density by pursuing cultural fit as alignment with company values. Prioritize new hires whose traits and abilities complement your team.
When in doubt, make reference calls. Investing an hour upfront can prevent you from making a costly hiring mistake.
Last but not least, we lean heavily on networking and referrals to improve hiring speed and quality. Rallying your employees for referrals and pairing this with referral incentives goes a long way. Data communities such as dbt, Locally Optimistic, and Outer Join are also excellent sources for recruiting.
Your first six months are crucial to setting the stage for your company’s analytics efforts. Here is a framework that will help you build a solid foundation for early success.
Design a centralized data team: A centralized team with a “hub-and-spoke” structure is a superior model for the majority of companies, because it helps align strategy and execution. The analytics team (hub) should report directly to the CEO or a technical executive, and pods (spokes) that specialize in particular business domains should be functionally aligned with their respective departments. This model has worked well both at J.P. Morgan, where the team must support businesses on a massive scale, and at Fivetran, where the company needs to scale aggressively.
Work with peer teams: First, identify other teams that are already using analytics, and how. Build alliances by helping them automate their data integration and avoid duplicate work. Second, determine the team’s scope, particularly identifying tasks that are out of scope to improve focus and execution.
Align foundational metrics with the leadership: The CEO should ensure the BI layer is an integral part of business strategy because “what gets measured gets managed.” Here is a simple framework of important early KPIs for a SaaS company:
Annual recurring revenue (ARR)
Net revenue retention (NRR)
Unit economics: e.g. customer acquisition cost, sales efficiency
Sales and Marketing
Customer growth and churn rate
Month over month revenue growth
Marketing qualified lead and conversion metrics
Daily, weekly, monthly active users
Net promoter score
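A few of these KPIs can be sketched in a handful of lines. The example below is illustrative only: the function names, customer names, and figures are hypothetical, and the definitions are common simplifications (ARR as 12 times current monthly recurring revenue, NRR as end-of-period revenue from the starting cohort divided by that cohort's starting revenue). Your finance team's definitions may differ, so align on them with leadership before building dashboards.

```python
def arr(mrr_by_customer):
    """Annual recurring revenue, approximated as 12 x current MRR."""
    return 12 * sum(mrr_by_customer.values())


def net_revenue_retention(start_mrr, end_mrr):
    """End-of-period MRR from the starting cohort / their starting MRR.

    Customers acquired during the period are excluded; expansion,
    contraction, and churn of existing customers are all included.
    """
    starting = sum(start_mrr.values())
    retained = sum(end_mrr.get(customer, 0.0) for customer in start_mrr)
    return retained / starting


def customer_churn_rate(start_mrr, end_mrr):
    """Share of starting customers with no revenue at period end."""
    lost = [c for c in start_mrr if end_mrr.get(c, 0.0) == 0.0]
    return len(lost) / len(start_mrr)


def mom_growth(previous, current):
    """Month-over-month growth as a fraction of the previous month."""
    return (current - previous) / previous


# Hypothetical MRR per customer at the start and end of a month.
start = {"acme": 100.0, "globex": 50.0, "initech": 50.0}
end = {"acme": 150.0, "globex": 50.0, "hooli": 40.0}

print("ARR:", arr(end))
print("NRR:", net_revenue_retention(start, end))
print("Churn rate:", customer_churn_rate(start, end))
print("MoM growth:", mom_growth(sum(start.values()), sum(end.values())))
```

In this toy data, expansion at one customer offsets the loss of another, so NRR is flat even though a third of the starting customers churned. That is one reason to track the retention and churn metrics side by side.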
Traditionally, a data team is simply viewed as a support team, an engineering team, or increasingly a product team. However, I believe that a data team is a combination of the three.
Build with a product lens: Erik Jones, Director of Analytics at New Relic, summarizes this aspect brilliantly: “A successful analytics team must also be good at gathering requirements, defining scope, managing expectations, marketing and roll-out, training end-users, and ultimately driving adoption of what is being built.” The data team should center around enabling self-service. I recommend using usage adoption and NPS as North Star metrics for the team.
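As a rough illustration of those two North Star metrics, the sketch below computes NPS using the standard definition (percentage of promoters scoring 9 or 10 minus percentage of detractors scoring 0 through 6) and a simple adoption rate. The adoption definition (active users divided by users with access) and the sample numbers are assumptions for the example; pick whatever definition fits your BI tool's usage logs.

```python
def net_promoter_score(scores):
    """NPS on a 0-10 survey: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def adoption_rate(active_users, users_with_access):
    """Assumed definition: share of users with access who were active."""
    return active_users / users_with_access


# Hypothetical survey responses from internal users of the data platform.
survey = [10, 9, 8, 7, 6, 3, 10, 9, 9, 5]

print("NPS:", net_promoter_score(survey))
print("Adoption:", adoption_rate(120, 400))
```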
Operate with engineering principles: A successful analytics team should also invest at least 25% of its resources in building an easily navigable and scalable data infrastructure. Applying these engineering practices will improve operational efficiency: user request logs, bi-weekly sprints, code review and quality assurance (QA) processes, continuous automation, and extensive documentation.
Serve with a customer-centric mentality: The analytics team should build in a technical customer success function that provides onboarding and ongoing support, responds to issues escalated by users, works with partner teams to resolve production issues, and develops end-user training materials. These functions can be assigned to dedicated technical customer success roles as the team scales.
Approach your data modernization efforts with curiosity and patience. In order to make impactful decisions, the key is to understand the organization’s business model, constraints and incentives.
Enterprise data is complex and chaotic. Since J.P. Morgan has gone through many mergers and acquisitions over the past century, it has had to integrate many different systems. This complexity is amplified through the lifecycle of data management: analysis and reconciliation work, system and data integration, data monitoring and governance, standardization across businesses, complex permissioning, and foundational performance issues involved in moving, transforming, and interacting with petabytes of data.
Use systems thinking to discover the right problem to solve. At J.P. Morgan, I stumbled upon success while managing data for the financing business, which handles over $1 trillion worth of capital. Salespeople, traders, and others made data requests ceaselessly. However, most of these requests were time-consuming to execute, involved small, local optimizations, and had only limited impact. I discovered that strategically placing trades on a macro level to match assets and liabilities could potentially free up hundreds of millions of dollars' worth of capital, increasing the firm’s competitive edge. Such actions were beyond the mandate of individual teams, because no single team could access all of the data.
Our first task was to break down silos and centralize the data. We then explored machine learning algorithms to analyze how to allocate capital to clients to maximize ROI. Because it was extremely complex to automate the production of any system-level metric in a large enterprise, both providing real-time visibility into the macro metric and optimizing for it on a global scale proved immensely valuable.
As a data professional, I enjoy designing elegant information systems and believe that the right use of data can help humanity achieve a higher level of consciousness and effect change. While many people may feel triumphant about the significant technological advances in data management so far, I believe this is just the beginning of a data revolution.
A version of this blog post was previously published on Towards Data Science.