Managed data integration is the best way for enterprises to get maximum value from their data — and their data engineering teams.
Business today is built on data. Enterprises use information from first- and third-party data sources in a variety of ways — to reach new audiences, improve the customer experience, inform real-time decision-making, create operational efficiencies, and take advantage of market opportunities.
The problem for large organizations is that data tends to be decentralized and hard to use. It is often siloed both internally and externally — scattered across third-party channels, databases, SaaS applications and other sources. Organizations need a way to centralize, compare and analyze siloed data in order to fully leverage its value.
The do-it-yourself (DIY) approach to integrating data is to let your data engineers build custom connections to your organization’s data sources. At first glance, this seems to offer the most flexibility and control. The thinking is that if you build it yourself, you’ll be able to customize to your needs, address problems as they arise, and keep your systems secure.
The DIY approach, however, will sap critical development resources that could be allocated to core business functions. It can take a data engineer weeks or even months to build a single connector. Multiply that by the number of connectors needed — often dozens or even hundreds within a single company, depending on the data environment — and you’re talking about a multi-year project. For a more comprehensive look at the time and costs involved, see our blog post on building data pipelines.
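To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The inputs are illustrative assumptions, not survey data: 50 connectors, build times of 2, 4 or 8 engineer-weeks per connector, two engineers, and 48 working weeks per year. It counts build time only and ignores ongoing maintenance.

```python
# Back-of-the-envelope estimate of DIY connector build effort.
# All inputs are illustrative assumptions, not measured figures.

WORK_WEEKS_PER_YEAR = 48  # assumed working weeks per engineer per year


def build_effort_years(connectors: int, weeks_per_connector: float, engineers: int = 1) -> float:
    """Return calendar years needed to build `connectors` connectors, assuming
    each one takes `weeks_per_connector` engineer-weeks and the work splits
    evenly across `engineers` with no coordination overhead."""
    if connectors < 0 or weeks_per_connector < 0 or engineers < 1:
        raise ValueError("connectors and weeks must be non-negative; engineers must be >= 1")
    total_engineer_weeks = connectors * weeks_per_connector
    return total_engineer_weeks / engineers / WORK_WEEKS_PER_YEAR


if __name__ == "__main__":
    for weeks in (2, 4, 8):  # hypothetical per-connector build times
        years = build_effort_years(connectors=50, weeks_per_connector=weeks, engineers=2)
        print(f"50 connectors at {weeks} weeks each, 2 engineers: {years:.1f} years")
```

Even under these optimistic assumptions (no rework, no maintenance, perfect parallelism), the build alone stretches from roughly one year to more than four.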
There’s also the issue of post-build maintenance: what happens after you’ve created a custom data pipeline. Your new connector can pull your data from a given source, but now you have to continually maintain the code and system infrastructure. Every time your source issues an update to its API or data structures, you’ll need to refresh the API calls and possibly update your connection’s code.
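As a minimal illustration of why this maintenance never ends, consider a hypothetical hand-written connector that hard-codes a mapping from a source API's JSON fields to warehouse columns. The field names and the `FIELD_MAP`, `detect_schema_drift` and `transform` helpers below are invented for this sketch. When the source renames a field, the hard-coded mapping breaks, so the DIY team ends up writing and maintaining checks like these for every source.

```python
from typing import Any

# Hypothetical mapping a hand-built connector might hard-code:
# source API field name -> warehouse column name.
FIELD_MAP = {"id": "account_id", "email": "email", "created": "created_at"}


def detect_schema_drift(record: dict[str, Any], field_map: dict[str, str]) -> dict[str, set[str]]:
    """Compare one source record against the fields the connector expects."""
    expected = set(field_map)
    actual = set(record)
    return {
        "missing": expected - actual,     # fields the code expects but the source no longer sends
        "unexpected": actual - expected,  # new or renamed fields the code does not handle
    }


def transform(record: dict[str, Any], field_map: dict[str, str]) -> dict[str, Any]:
    """Map a source record to warehouse columns, failing loudly on drift."""
    drift = detect_schema_drift(record, field_map)
    if drift["missing"]:
        raise KeyError(f"source schema changed; missing fields: {sorted(drift['missing'])}")
    return {column: record[field] for field, column in field_map.items()}


if __name__ == "__main__":
    # Simulate the source renaming "created" to "created_time" in an API update.
    changed = {"id": 1, "email": "a@example.com", "created_time": "2024-01-01"}
    print(detect_schema_drift(changed, FIELD_MAP))
```

Detecting the change is only the first step: someone still has to update `FIELD_MAP`, backfill affected rows and redeploy, and that work repeats for each source every time its API changes.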
A recent Fivetran survey found that 36% of companies have more than 20 pipelines, and almost a quarter have more than 50. For enterprises, the number can be in the hundreds. Maintaining these pipelines imposes a significant additional burden on data engineering teams. Companies that build their own pipelines risk ever-growing technical debt, data quality problems, and reduced user trust in available data.
DIY data integration tools make it easier to build customized data pipelines by providing the pipeline infrastructure. They do not, however, provide the connectors between your sources and your data warehouse.
Open-source DIY solutions, such as Kafka and Spark, may seem like a shortcut, but they require you to build an application layer around their software — a layer that you have to continually maintain and update.
These “all in one” solutions are builds masquerading as buys — the equivalent of a plumber coming to your house, dropping off 20 feet of pipe, and expecting you to solder it in place at the intake valve and under your sink. And then you would have to run around checking for leaks, patching holes, and making sure your basement didn’t flood every time the water company changed the water pressure or chlorination level.
The bottom line is that you shouldn’t have to divert valuable engineering resources to create a capability that you can easily outsource.
It’s much more efficient and timely to rely on managed data integration services. Full-service solutions are the simplest way to connect disparate data sources quickly and efficiently without diverting engineering resources, adding operational complexity to the IT stack, or introducing new data security and privacy risks.
With a fully managed service, stakeholders within the organization can request data not only from SaaS sources such as Google Analytics 360, Marketo and Salesforce but also from transactional databases such as MySQL, Postgres and Oracle. Managed services typically support hundreds of other data sources, which makes it easy to extend access to additional commonly used sources, including internal ones.
In most cases, data teams can make the connection within minutes and initiate the flow of data immediately. Within hours, stakeholders can start manipulating and analyzing data. And data engineers will be able to focus on high-value projects instead of maintaining data pipelines.
In addition, managed services need to protect customer data and comply with security standards such as SOC 2 Type 2 and ISO 27001, as well as regulations such as GDPR. Depending on the industry, they will likely also need to meet requirements such as HIPAA and PCI DSS. Compliance with these standards is an important criterion for enterprise security and risk management teams, which need assurance that best-practice security measures are in place. It also relieves internal data, IT and security teams of that burden.
There is one thing to be aware of with a managed solution: No matter how extensive its connector library, it may not include every data source your organization needs. In those cases, companies can build a custom connector — maintaining just one pipeline as opposed to hundreds.
In short, most enterprises can rely on a managed data integration service for 80 or 90 percent of their needs, and then go the DIY route when necessary. This hybrid approach will allow them to take advantage of the efficiency and speed of a managed solution by automating what they can, without limiting future data integration options in any way.
For a real-world example of a company choosing managed data integration to avoid the costs and complexities of DIY, look at New Relic, a provider of software development solutions with 2,000 employees and more than 17,000 global customers. New Relic needed to gain insight into its product-led growth initiatives, including revenue, conversion rates and migration from a legacy pricing model to a new model.
The company’s previous DIY attempt to manually pull data from its SaaS platform and from third-party cloud financial services fell flat. It required accountants, who were not trained in engineering or data science, to pull data into Excel spreadsheets. An internal audit found that writing and maintaining the connectors would take a team of three to five data engineers, at a cost of $500,000, whose sole purpose would be to write ETL/ELT code.
New Relic instead used Fivetran to automatically pull the data from its financial and product platforms into a central data warehouse, where it serves as a single source of truth for the company’s accountants. This allowed New Relic to run its first automated marketing campaign built around product-qualified leads (PQLs). Data from the product is now joined with billing information to identify customers likely to benefit from the new pricing model, which has delivered measurable business value.
Data integration is a business imperative in today’s data-centric world, but a DIY approach to pulling information from disparate sources into a central data warehouse may be cost-prohibitive. In limited cases, when there are unusual data sources or very particular requirements, a partial DIY approach is justified.
However, in order to take full advantage of the power of data, enterprises are far better off outsourcing their data integration needs to a managed solution. This will allow them to easily centralize existing data, rapidly add new data sources as necessary, and refocus engineering resources on producing the meaningful insights that growth depends on.