08 May 2019 | Article

How to Engineer Industry-Leading Data Connectors

Ciara Rafferty
What it comes down to: how we pull the data from a source, how we prepare the data, and how we load it into your warehouse.

We build the best connectors out there. That's a big statement, we know. Our efficient, battle-tested API connectors, our data preparation, and our data loading techniques set us apart from other solutions and let your business improve reporting and analysis with centralized data.

Additionally, while most ETL tools are built for on-premises warehouses, Fivetran is built to connect to cloud data warehouses, and Fivetran itself is hosted in the cloud. Being cloud-hosted has real benefits: if an issue arises, we jump in and fix it immediately, without deploying a new version of Fivetran on your server. We also release updates multiple times a week and stay on top of pipeline issues.

How We Pull the Data From Applications

To get data from an application, you have to call its API. Without a nuanced strategy, you can easily make more API calls than you need, wasting time, bandwidth, and your allotment of API calls. Different API protocols, such as SOAP and REST, are better suited to pulling particular types of data, and each can fail when used improperly. Fivetran has an efficient strategy for pulling data from each API we support, so we collect all of your data reliably and quickly. And because we specialize in these connectors, we stay abreast of API changes: when a company changes its API, we respond faster than an in-house team juggling many unrelated tasks.
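One common way to avoid wasted API calls is an incremental sync: remember a cursor (here, the latest `updated_at` value seen) and, on each run, request only records changed since that cursor, following pagination until the last page. The sketch below is a generic illustration of that idea, not Fivetran's actual strategy. The endpoint `fake_list_records`, its parameters, and the integer timestamps are all hypothetical stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical in-memory stand-in for a paginated REST endpoint.
# Each record carries an integer "updated_at" timestamp for simplicity.
RECORDS = [{"id": i, "updated_at": 100 + i} for i in range(1, 11)]


@dataclass
class Page:
    items: List[dict]
    next_token: Optional[int]  # None means this was the last page


def fake_list_records(updated_since: int, page_token: int = 0, page_size: int = 3) -> Page:
    """Hypothetical API call: one page of records changed after `updated_since`."""
    matching = [r for r in RECORDS if r["updated_at"] > updated_since]
    chunk = matching[page_token:page_token + page_size]
    nxt = page_token + page_size
    return Page(chunk, nxt if nxt < len(matching) else None)


def incremental_sync(cursor: int) -> Tuple[List[dict], int, int]:
    """Pull only records changed since `cursor`, following pagination.

    Returns (records, new_cursor, api_calls_used).
    """
    records: List[dict] = []
    calls, token = 0, 0
    while True:
        page = fake_list_records(cursor, token)
        calls += 1
        records.extend(page.items)
        if page.next_token is None:
            break
        token = page.next_token
    # Advance the cursor to the newest change seen; keep it if nothing changed.
    new_cursor = max((r["updated_at"] for r in records), default=cursor)
    return records, new_cursor, calls
```

The first run from cursor `0` fetches all ten records in four calls; a second run from the returned cursor costs a single call and returns nothing until a record changes. A production connector would also need to handle rate limits, retries, and records that share the same timestamp, which this sketch ignores.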

Our Salesforce connector, for instance, has been battle-tested by hundreds of customers, and we have solved even the most obscure issues that have come up. That is how we build the best possible connector. Our support team has already dealt with the many problems people run into when building their own connectors, saving your engineers the time and effort of building brittle ones.

How We Pull the Data From Databases

Database architectures typically include a master database (the production database) and a replica database (a copy kept in sync with the production database). A common way to pull data from a database is simply to query the master database over and over again. However, those repeated queries place a heavy load on the master database and will likely slow down your entire system. If you're running a critical application off that database, this could hurt your entire business.

There is a much more efficient approach that puts minimal load on the database. Each database has a logging system that tells the replica what changes have been made in the master database. For the initial historical sync, Fivetran queries the database directly. After that, for any new or changed data, Fivetran uses change data capture (CDC), reading these logs to incrementally detect and replicate data changes. Reading the logs incrementally is technically demanding, because it requires decoding each database's often complicated logging system. The result is an accurate, lightning-fast connector that puts almost no load on your production system.
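The pattern described above (an initial snapshot, then replaying log entries past a saved position) can be sketched as follows. This is a simplified illustration, not Fivetran's implementation: the `LogEvent` format, the integer log position `lsn`, and the `Replica` class are hypothetical, and real database logs (with their transactions and binary encodings) are far more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogEvent:
    lsn: int                    # position in the (hypothetical) change log
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


@dataclass
class Replica:
    rows: Dict[int, dict] = field(default_factory=dict)
    last_lsn: int = 0           # checkpoint: last log position applied

    def apply(self, log: List[LogEvent]) -> int:
        """Apply only events past the checkpoint; return how many were applied."""
        applied = 0
        for ev in sorted(log, key=lambda e: e.lsn):
            if ev.lsn <= self.last_lsn:
                continue  # already covered by the snapshot or an earlier sync
            if ev.op in ("insert", "update"):
                self.rows[ev.key] = ev.row
            elif ev.op == "delete":
                self.rows.pop(ev.key, None)
            else:
                raise ValueError(f"unknown op {ev.op!r}")
            self.last_lsn = ev.lsn  # advance the checkpoint after each event
            applied += 1
        return applied
```

For example, a snapshot taken at log position 10 starts as `Replica(rows=..., last_lsn=10)`. Applying a log that contains positions 9 through 13 skips position 9 (already in the snapshot) and applies the rest. Re-applying the same log is a no-op, so the source only has to serve its log, never repeated full-table queries.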

How We Prepare the Data

When we bring the data into Fivetran, we transform it into a pre-built, standard schema. Our schemas are normalized, which limits redundancy, improves data integrity, and makes the data ready for analysts. If you were simply to replicate the data exactly as an API returns it, you would end up with a very messy schema.
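To make "normalized" concrete, here is a minimal sketch that splits a nested API response into three related tables, storing each customer once and linking line items to their invoice with a foreign key. The invoice payload shape and the table and column names are hypothetical examples chosen for illustration; they are not Fivetran's actual schemas.

```python
from typing import Dict, List, Tuple


def normalize_invoices(payload: List[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split nested invoice records into customer, invoice, and invoice_line tables."""
    customers: Dict[str, dict] = {}
    invoices: List[dict] = []
    lines: List[dict] = []
    for inv in payload:
        cust = inv["customer"]
        # One row per customer, instead of repeating customer details on every invoice.
        customers[cust["id"]] = {"id": cust["id"], "name": cust["name"]}
        invoices.append({
            "id": inv["id"],
            "customer_id": cust["id"],  # foreign key to the customer table
            "created": inv["created"],
        })
        for idx, item in enumerate(inv.get("lines", [])):
            lines.append({
                "invoice_id": inv["id"],  # foreign key to the invoice table
                "line_index": idx,
                "sku": item["sku"],
                "amount": item["amount"],
            })
    return list(customers.values()), invoices, lines
```

Two invoices for the same customer yield one customer row, two invoice rows, and one row per line item. Analysts can then join the tables on their keys instead of unpacking nested JSON.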

How We Load the Data

We take the data, convert it into CSV files on our servers, automatically create the destination tables, and load the data into your warehouse. Many other tools load raw data into staging tables in the warehouse, wasting the warehouse's compute power. Others force you to create the tables yourself, either by hand or via scripts written by engineers. If you have a database with thousands of tables and thousands of columns, creating these tables can take a very long time.
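Automatic table creation comes down to inferring a column list and types from the data, emitting a `CREATE TABLE` statement, and serializing the rows to CSV for a bulk load. The sketch below shows one simple way to do that with Python's standard `csv` module. The type mapping (`BIGINT`, `DOUBLE PRECISION`, `BOOLEAN`, `VARCHAR`) is an assumed, deliberately small example; real warehouses differ in type names, and production code must also quote and escape identifiers.

```python
import csv
import io
from typing import Dict, List, Tuple


def infer_type(values: List[object]) -> str:
    """Pick a warehouse column type from Python values (a deliberately small mapping)."""
    non_null = [v for v in values if v is not None]
    if non_null and all(isinstance(v, bool) for v in non_null):
        return "BOOLEAN"
    if non_null and all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "BIGINT"
    if non_null and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "DOUBLE PRECISION"
    return "VARCHAR"  # fallback for strings, mixed types, and all-null columns


def to_csv_and_ddl(table: str, rows: List[Dict[str, object]]) -> Tuple[str, str]:
    """Return a CREATE TABLE statement and the rows serialized as CSV."""
    columns = list(dict.fromkeys(k for r in rows for k in r))  # first-seen column order
    types = {c: infer_type([r.get(c) for r in rows]) for c in columns}
    ddl = "CREATE TABLE IF NOT EXISTS {} ({})".format(
        table, ", ".join(f"{c} {types[c]}" for c in columns))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # None values and missing keys become empty fields
    return ddl, buf.getvalue()
```

Because the statement is generated from the data itself, the same code handles one table or thousands, which is the manual work the paragraph above describes.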

We also load data from multiple sources into your warehouse at the same time. For warehouses with per-second pricing, such as Snowflake, loading sources in parallel reduces costs, because the warehouse isn't kept running while each source loads one after another.
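A back-of-the-envelope calculation shows why parallel loading matters under per-second pricing. The sketch assumes, optimistically, that concurrent loads overlap fully without slowing each other down. It also ignores any minimum-billing rules. The durations and the credits-per-hour rate are made-up numbers, not Snowflake's actual prices.

```python
from typing import List


def billed_seconds(load_durations: List[int], parallel: bool) -> int:
    """Seconds the warehouse must stay running to finish every load.

    Sequential: loads run back to back, so the warehouse runs for their sum.
    Parallel: assumes loads overlap fully without slowing each other down,
    so the warehouse runs only as long as the slowest load.
    """
    if not load_durations:
        return 0
    return max(load_durations) if parallel else sum(load_durations)


def cost(seconds: int, credits_per_hour: float) -> float:
    """Per-second billing: pay for exact running time at an hourly rate."""
    return seconds * credits_per_hour / 3600
```

With three hypothetical loads of 120, 300, and 180 seconds, running them one after another keeps the warehouse up for 600 seconds, while running them together takes 300. At an assumed 4 credits per hour, that is roughly 0.67 versus 0.33 credits. In practice, concurrent loads compete for warehouse resources, so the real savings will be smaller than this ideal.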

A Note on Security

We give you control over what you load into your warehouse. You can block or hash data by individual columns or tables. Data is encrypted in transit and at rest, and we delete it from our system as soon as it is loaded into your warehouse. Learn more about our commitment to security in our documentation.
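Blocking and hashing can be pictured as a per-row filter applied before data reaches the warehouse: blocked columns are dropped, and hashed columns are replaced by a digest. The article does not say which hashing scheme Fivetran uses, so the salted SHA-256 below is an assumption for illustration, and `apply_column_policy` is a hypothetical helper. Because equal inputs produce equal digests, hashed columns can still be used to join or count distinct values.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def apply_column_policy(rows: Iterable[Dict[str, object]], blocked: Set[str],
                        hashed: Set[str], salt: bytes) -> List[Dict[str, object]]:
    """Drop blocked columns and replace hashed columns with a salted SHA-256 digest."""
    out: List[Dict[str, object]] = []
    for row in rows:
        clean: Dict[str, object] = {}
        for col, val in row.items():
            if col in blocked:
                continue  # blocked columns are never written out
            if col in hashed and val is not None:
                # Same input + same salt -> same digest, so hashed values still join.
                val = hashlib.sha256(salt + str(val).encode("utf-8")).hexdigest()
            clean[col] = val
        out.append(clean)
    return out
```

For example, blocking `ssn` and hashing `email` removes the first column entirely and turns every occurrence of the same email address into the same 64-character hex string.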

To learn more about Fivetran connectors, sign up for a demo of our service. If you’re ready to see for yourself why we build the best connectors around, sign up for a free trial.

About Fivetran

Shaped by the real-world needs of data analysts, Fivetran technology is the smartest, fastest way to replicate your applications, databases, events and files into a high-performance cloud warehouse. Fivetran connectors deploy in minutes, require zero maintenance, and automatically adjust to source changes — so your data team can stop worrying about engineering and focus on driving insights.
