Meet the data leader whose team is redefining impact for over 11,000 analysts.
“To build all of the data sources and models would have taken us two to three months. But finding that Fivetran has a connector and bringing those 200 tables all down into Snowflake and having them available — we were able to accommodate that in four days.” — Richie Bachala, Global Data Engineering Leader
Fivetran cuts data ingestion into Snowflake from an estimated two to three months of build time to four days.
Modern “smart pipeline” tools enable agile workflows and rapidly increase speed-to-market for the 155-year-old paint and coating manufacturing giant.
Snowflake enables data elasticity, giving the data engineering team the ability to scale up and down rapidly based on team needs.
Fivetran’s automatic schema drift handling saves valuable engineering time, allowing the team to focus on driving business impact.
Business Intelligence Platform: Tableau
Cloud Platform: Microsoft Azure
For over a decade, Richie Bachala has focused on solving the biggest challenges for Sherwin-Williams with smart, company-wide solutions.
“Our primary responsibility at Sherwin-Williams is to provide data-as-a-service that is easy to consume, secure and elastic. We're using Snowflake and Azure in the cloud and various tools that go along with it,” Richie shared with Kelly Kohlleffel, Hashmap’s VP of Go-to-Market. Hashmap is a consulting services business in the data cloud space that helps clients across multiple industries with their biggest data challenges.
Richie’s journey at Sherwin-Williams started in late 2011. As a Global Data Integration Developer, Richie was responsible for middleware implementations for Latin American ERP data sources, ensuring that the localized transactional data of newly acquired companies was reconciled securely with corporate headquarters on a daily basis. Richie was ultimately responsible for the integration and reconciliation of over $210M of transactional data, integrating more than 200 siloed stores in Mexico, Chile, Peru and Ecuador.
“That's how I started my Sherwin-Williams journey, and it’s been a phenomenal ride. Data gave me the opportunity to travel most of Latin America and meet some amazing people.”
Today, Richie leads the Global Data Engineering team for the Global Supply Chain division at Sherwin-Williams’ headquarters in Cleveland, Ohio. Since moving to the U.S., Richie has focused on the data and analytics space: developing the company’s traditional BI tools, shepherding Sherwin-Williams from on-premises infrastructure to the cloud, and driving widespread adoption of the Modern Data Stack, including Fivetran and Snowflake. His team is now based in five countries, serving the Sherwin-Williams user base in APAC, EMEA, LATAM and North America and supporting more than 11,000 internal data consumers with their data and analytics needs.
For Richie’s global analytics function, ultimate success looks a little different from other data teams.
“To be honest, it's really not about data or analytics,” says Richie. “You know, it's not even about insights — it's about impact. It’s not about what tools we are using, how frequently the real-time datasets are getting refreshed. If you're not making an impact then all of these activities are second to nothing, right? So, how do you define impact?”
When it comes to team impact, Richie groups the team’s functions into three major categories, or pillars. These north stars guide the team’s operations:
Building data pipelines: The data movement piece, covering the wrangling, ingestion and transformation of the enterprise’s source data, whether structured, semi-structured or unstructured. Because of the large number of sources and use cases, Richie’s team uses both in-house pipeline tooling and Fivetran. “Fivetran is a great example of a tool that helps us manage SaaS data ingestion; it really fits well from a SaaS perspective.”
Building and managing data models: Building enterprise data warehouses, data modelling and a semantic layer that makes data more accessible, readable and shareable for their internal user base, allowing users to build reports and dashboards.
Making analytics and reporting accessible with cloud migration: The cloud migration journey, or modernization, future-proofs the enterprise's data stack. Richie’s team needed to move away from a traditional brick-and-mortar approach to a modern data stack living in the cloud. “We need to invest more time into our journey into cloud architecture and building pipelines that are sustainable — instead of wasting time on maintenance and administration.”
With a rapidly expanding internal user-base, Richie and his team face growing demands for data, putting pressure on the 155-year-old organization’s immense and complex infrastructure.
“Everybody wants their data today and right now. And how we make that faster and quicker is what we are constantly looking at solutions for,” Richie shares. “We want to keep the engineers working on more business-based solutions than just maintenance of existing pipelines, which is the business that we don't want to be in. There is no way we will be able to manage that level of a user base if we are not consistently looking at managing data models and, sort of, managing ETL.”
To demonstrate the point, Richie uses the example of schema drift, a term coined by Gartner to define the unpredictable changes to schema brought on by source system changes and updates. Platforms such as Salesforce are constantly releasing new features, which can introduce schema changes that break brittle ETL pipelines and cost the data engineering team dozens of hours of maintenance time — time better spent generating business impact. With a large number of data users to consider, Richie needed a scalable approach.
“Data drift can happen anytime after this initial acquisition phase,” Kelly adds. “And things can break. They lose integrity after that acquisition process. It speaks to data quality as well. If I've got a lot of data drift, my data quality is going to be really challenged.”
Coupled with Snowflake, Fivetran’s automatic schema drift handling and mapping solves data drift concerns for large data engineering teams like Richie’s in a number of ways:
If the source adds a new column, Fivetran detects the change and adds the same column in Snowflake, backfilling the data if applicable.
If a column is removed, Fivetran won’t delete it outright from Snowflake but “soft-delete” it and mark future records NULL so that the old data remains as a historical point of reference.
If a column’s data type changes, Fivetran does its best to accept both the old and new data losslessly by retaining the old column and creating a new column with a data type that accommodates both. Users can then create views on that table.
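To make the three rules above concrete, here is a minimal sketch of how a sync tool could reconcile a destination table's schema against a changed source schema. It is not Fivetran's implementation: the `widen` type rules, the `DestinationTable` model, and the `name__oldtype` naming for retained columns are all hypothetical, and backfilling is left out.

```python
from dataclasses import dataclass, field

# Hypothetical widening rules: the narrowest type that can hold values of both
# the old and the new type. Anything not listed falls back to STRING.
# These rules are an illustrative assumption, not Fivetran's actual type mapping.
_WIDEN = {
    frozenset({"INTEGER", "FLOAT"}): "FLOAT",
    frozenset({"DATE", "TIMESTAMP"}): "TIMESTAMP",
}


def widen(old: str, new: str) -> str:
    """Return a type able to hold both old and new values losslessly."""
    if old == new:
        return old
    return _WIDEN.get(frozenset({old, new}), "STRING")


@dataclass
class DestinationTable:
    columns: dict[str, str]                                 # live columns: name -> type
    soft_deleted: set[str] = field(default_factory=set)     # kept for history; new rows get NULL
    retained: dict[str, str] = field(default_factory=dict)  # old copies kept after a type change


def reconcile(dest: DestinationTable, source: dict[str, str]) -> list[str]:
    """Apply the three drift rules and return the actions taken, in order."""
    actions = []
    for name, src_type in source.items():
        if name not in dest.columns:
            # Rule 1: a new source column is added to the destination.
            # (Backfilling historical rows is out of scope for this sketch.)
            dest.columns[name] = src_type
            actions.append(f"add {name} {src_type}")
            continue
        if name in dest.soft_deleted:
            # A previously removed column reappeared at the source.
            dest.soft_deleted.discard(name)
            actions.append(f"restore {name}")
        old_type = dest.columns[name]
        new_type = widen(old_type, src_type)
        if new_type != old_type:
            # Rule 3: keep the old column (under a hypothetical suffixed name)
            # and give the live column a type that fits old and new values.
            dest.retained[f"{name}__{old_type.lower()}"] = old_type
            dest.columns[name] = new_type
            actions.append(f"retype {name} {old_type} -> {new_type}")
    for name in dest.columns:
        if name not in source and name not in dest.soft_deleted:
            # Rule 2: a column removed at the source is soft-deleted, not dropped.
            dest.soft_deleted.add(name)
            actions.append(f"soft-delete {name}")
    return actions


table = DestinationTable(columns={"id": "INTEGER", "amount": "INTEGER", "region": "STRING"})
print(reconcile(table, {"id": "INTEGER", "amount": "FLOAT", "status": "STRING"}))
# ['retype amount INTEGER -> FLOAT', 'add status STRING', 'soft-delete region']
```

Running `reconcile` a second time with the same source schema returns an empty list, which is the property that keeps such a pipeline from needing manual fixes after every upstream release.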
“One of the key features that we love about Fivetran is the automated schema migration and drift handling, where there's not a lot that needs to be done,” says Kelly. “I love how simple Fivetran is. It’s straightforward, and a lot of fun to use.”
Richie’s team uses Fivetran to handle ingestion from common, high-volume SaaS sources, starting with Coupa and experimenting with a few others. This approach allows the team to focus their energy on business impact, building custom integrations that benefit the organization.
“We are using Fivetran for what its strengths are, so we are not spending working hours building each and every connector from scratch, and we don't have to be involved in managing them. Coupa, for example, is a SaaS application. We're now looking at similar SaaS applications that have API-based data integration capabilities that we can access as a service from Fivetran,” says Richie.
Fivetran’s SaaS model lets the team move away from a one-size-fits-all approach to a more agile and scalable model, shortening development cycles and allowing the data team to react quickly to business needs.
Snowflake also plays a critical role in the team’s ability to scale rapidly, as Richie documented in a 2019 Medium post evaluating cloud data warehouse providers.
“We evaluated Snowflake, Redshift and BigQuery. We went with what makes sense for our infrastructure and for our environment,” said Richie. “Elastic data warehouses are about scale. Although it is easy to focus on the volume of data, elastic data warehousing is primarily about adapting to any scale without added complexity or disruption. Snowflake brilliantly separates storage, compute and metadata management; trillions of rows can be sliced up with ease by concurrent users. Storage and compute can be scaled up and down independently and immediately, and the metadata service will automatically scale up and down as necessary.”
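Because compute is decoupled from storage, resizing a Snowflake virtual warehouse is a single DDL statement that leaves the stored data untouched. The sketch below builds that statement from a simple load signal. The warehouse name `analytics_wh`, the queued-query threshold, and the step-up/step-down policy are hypothetical and do not describe the Sherwin-Williams setup; actually executing the statement would require a Snowflake connection, which is not shown.

```python
import re

# Snowflake warehouse sizes in ascending order; each step roughly doubles
# compute (and credits per hour). Resizing does not touch stored data.
SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE", "XXXLARGE", "X4LARGE"]

# Conservative check for an unquoted Snowflake identifier.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def next_size(current: str, queued_queries: int, scale_up_at: int = 10) -> str:
    """Hypothetical policy: step up one size when queries queue, down one when none do."""
    i = SIZES.index(current)
    if queued_queries >= scale_up_at:
        i = min(i + 1, len(SIZES) - 1)
    elif queued_queries == 0:
        i = max(i - 1, 0)
    return SIZES[i]


def resize_statement(warehouse: str, size: str) -> str:
    """Build the DDL that changes only the warehouse's compute size."""
    if not _IDENT.match(warehouse):
        raise ValueError(f"unexpected warehouse identifier: {warehouse!r}")
    if size not in SIZES:
        raise ValueError(f"unknown warehouse size: {size}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'"


size = next_size("MEDIUM", queued_queries=14)
print(resize_statement("analytics_wh", size))
# ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE'
```

Stepping one size at a time keeps each change predictable in cost, since every step roughly doubles or halves credit consumption.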
Richie defines outcomes analytically, in clear percentages gained and time saved. Fivetran and Snowflake save the team months of development time.
“The time to market – or time to impact – gives you relevancy,” adds Kelly. “And if you can take things from three months down to just a few hours or a few days, that makes you so relevant as an organization. It makes the daily consumers of the data really, really happy.”
Equally, Fivetran allows Richie’s data engineers to focus on what they love doing most: generating value, or impact, for data users.
“Engineers love to see their products being used. When it happens, we feel valued. For any analytics project, about 85% of the work is on data engineering before the data even becomes consumable,” says Richie. “The rest — 10% to 15% — is where the users are actually getting in. That's where the real beauty of it is: the value of everything that they want. So we want to decrease this 85% to as little as we can.”
With Kimball and earlier methodologies, businesses built the whole stack up for every problem: finding the source system, building a database and a data warehouse, and then applying a semantic layer with a BI tool on top. “But today with the advent of cloud, we have products like Fivetran where we don't really have to build that entire stack — you just enable a SaaS-based integration point,” says Richie. “Time to market for building a data pipeline previously was X hours. Now it's X divided by 10.”
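Putting Richie’s two figures together shows why shrinking the engineering share matters so much. This is Amdahl's-law-style arithmetic, sketched under two assumptions: the 85% engineering share from his quote, and a tenfold speedup that applies only to that engineering portion while the analyst-facing 15% stays the same. The numbers are illustrative, not measured results.

```python
def overall_speedup(eng_share: float, eng_speedup: float) -> float:
    """Amdahl's law: speedup of the whole project when only engineering speeds up."""
    return 1.0 / ((1.0 - eng_share) + eng_share / eng_speedup)


def remaining_eng_share(eng_share: float, eng_speedup: float) -> float:
    """Fraction of the (now shorter) project still spent on data engineering."""
    eng = eng_share / eng_speedup
    return eng / (eng + (1.0 - eng_share))  # analyst-facing share assumed unchanged


print(f"{overall_speedup(0.85, 10):.2f}x")    # 4.26x
print(f"{remaining_eng_share(0.85, 10):.0%}")  # 36%
```

Under these assumptions, a 10x faster pipeline build makes the whole project about four times faster, and engineering drops from 85% to roughly 36% of the effort, leaving most of the remaining time for the analyst-facing work Richie describes as the real value.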
This speed-to-market is what Richie believes will provide a huge competitive edge for his team, and will ultimately deliver more value for data users within their enterprise. Creating a self-service data ecosystem is vital, Richie says, for long-term success.
“If every time they have to come to you with, ‘Hey, I'm trying to figure something out,’ it just becomes too much — you cannot serve such a large community when they have to come to you. We have to be ahead of that curve and understand what is that unmet, unarticulated need that the analyst community is looking for.”