In this excerpt from The Essential Guide to Data Integration, we discuss the business considerations for choosing a data integration tool.
Note: We fully recognize that there are differences between ETL and ELT, and we believe ELT is a key component powering the modern data stack. That said, these types of tools are usually evaluated side by side, so the notes below apply to both ETL- and ELT-focused tools. If you’re looking to learn what ETL tools are made for, see our blog post for an overview!
Continued from Why You Shouldn't Build Your Own Data Pipeline.
Choosing the right data integration solution depends on your organization’s size and maturity, as well as the particular characteristics of the data pipeline provider.
At very small scales, your organization might not need a data pipeline at all, especially if you are an early-stage startup that only uses one or two data sources, or if you are only conducting qualitative research to find product-market fit. At the other extreme, your organization might have a niche use case with extremely stringent performance, security, or regulatory requirements.
Otherwise, your organization probably struggles with the high engineering costs of building and maintaining data connectors, or endures long report turnaround times from connector maintenance and manual reporting. If so, consider the following.
Familiarize yourself with the pricing structures of prospective tools. Here are some common pricing models:
A flat subscription fee, which might have higher fixed costs in exchange for cost predictability.
Pricing by volume of data, as counted in gigabytes or updates and inserts of rows. A volume-based pricing model can be highly advantageous if you currently handle a very modest scale of data but want to test out a new tool over an extended period, or if you plan to gradually move your workflow to the new system.
Per-seat pricing vs. a single fee for your organization. Per-seat pricing models will typically cost less at small headcounts, but are more of an administrative hassle. Single fees for an entire organization can be simpler and cost less at larger scales.
You might also encounter combinations of pricing models. A service might have a flat “platform” fee and then additional fees for each unique data connector. Volume-based rates might vary by connector. Providers might offer a freemium model up to a certain data volume, or with a restricted feature set.
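To make the trade-offs above concrete, here is a minimal sketch that compares the three common pricing models over a year. All rates and figures are hypothetical, chosen only to illustrate how the models diverge at different scales:

```python
# Illustrative comparison of common data integration pricing models.
# Every rate below is a made-up example, not a real vendor's price.

def flat_fee_cost(months, monthly_fee=2000):
    """Flat subscription: higher fixed cost, fully predictable."""
    return months * monthly_fee

def volume_cost(monthly_rows, months, rate_per_million=500):
    """Volume-based: scales with rows updated or inserted."""
    return months * (monthly_rows / 1_000_000) * rate_per_million

def per_seat_cost(seats, months, seat_fee=150):
    """Per-seat: cheap at small headcounts, grows with the team."""
    return months * seats * seat_fee

# A small team moving 2M rows/month with 5 seats over a year:
print(flat_fee_cost(12))           # 24000
print(volume_cost(2_000_000, 12))  # 12000.0
print(per_seat_cost(5, 12))        # 9000
```

At this hypothetical scale, volume-based and per-seat pricing undercut the flat fee, but the ranking flips as data volume or headcount grows, which is why it pays to model your own expected usage before committing.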
Most tools, Fivetran included, offer a free 14-day trial. We recommend running comprehensive evaluations across your shortlist of ETL tools. You can sign up for a free trial at fivetran.com/signup.
Other factors to consider include the trade-off between ease of use and configurability, and compatibility with your team’s existing skill set.
Non-technical users might or might not be familiar with SQL, but they can almost certainly navigate a BI tool. Analysts typically know SQL, statistics and perhaps a scripting language like Python. Data scientists might have a deeper technical skill set that includes more advanced statistical training and additional languages like Java, as well as “big data” technologies such as Hadoop or Spark. Engineers will likely be familiar with a range of high- and low-level computer languages, as well as an assortment of technology platforms.
Different data integration tools feature different levels of complexity and accessibility. Some rely heavily on custom scripting and offer only the basic scaffolding on which you build your own data pipeline. Others offer drag-and-drop GUIs that allow relatively non-technical users to orchestrate data replication and transformation, but these have two clear drawbacks: a steep, highly platform-specific learning curve and the automatic generation of spaghetti code. Still others combine completely automated data replication with version-controlled, SQL-based transformations.
The trade-off will boil down to accessibility versus configurability. If your goal is to promote data literacy across your organization, then you should find a tool with the lowest barrier to use and the broadest applicability to different use cases. For more specialized use cases, less accessible but more powerful tools optimized for specific niches might be appropriate.
Before you commit to a contract, consider whether the tool will serve your needs in the future. Ask the following questions:
Does the tool feature the connectors you currently use or anticipate using?
Can you easily enable additional features or add data connectors to your account as needed?
Are connectors consistently updated to keep pace with upstream API changes, and are these updates accompanied by change logs documenting what changed?
Are new connectors regularly added to the tool?
Is the support team responsive and capable of keeping up with product changes and your changing needs?
Can you export data models and transformations from one platform to another, or will you have to reverse engineer and rebuild them if you ever switch to a new tool?
Future-proofing is important because switching platforms can be very costly and disruptive.
Read about the technical considerations for choosing a data integration tool next.
The excerpt above is from The Essential Guide to Data Integration: How to Thrive in an Age of Infinite Data. The book covers topics such as how data integration fuels analytics, the evolution from ETL to ELT to automated data integration, the benefits of automated data integration, and tips on how to evaluate data integration providers. Get your free copy of the guide today.