Here's how SaaS providers can build bulletproof data replication APIs.
Data from SaaS apps is indispensable to helping businesses create a comprehensive view of their operations. At Fivetran, we have extensive experience building data connectors to SaaS APIs to make data integration as seamless as possible.
Our new “Fivetran Protocol” eBook discusses what SaaS providers can do to make data more accessible for their users. The following is a sample of what the whitepaper covers in more detail.
Data can be synced in one of two ways – full syncs, in which the entire data set is copied from a source, and incremental updates, in which only changes (new records, updates, and deletions) are propagated to a destination.
Incremental updates are essential because analytics and data-driven production systems depend on fresh data. This requires short turnaround times between changes to records at the source and updates in a data repository, such as a data warehouse.
There are two main approaches to incremental updates that allow users to programmatically determine which records have changed since the last sync.
A change log records every change to every record in all collections, including inserts, updates, and deletions.
Every row in a change log must include a timestamp, the operation that was performed, a reference to the collection affected, and the changed values.
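As a minimal sketch of how a consumer might replay such a log, the Python below models one change log row with the four fields named above and applies a batch of rows to an in-memory replica. The names `ChangeLogEntry`, `apply_changes`, and the `record_id` field are hypothetical and chosen for illustration; a real API may carry the record key inside the changed values instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional

# Replica layout (an assumption): collection name -> record id -> attribute values.
Replica = Dict[str, Dict[str, Dict[str, Any]]]


@dataclass(frozen=True)
class ChangeLogEntry:
    timestamp: datetime                # when the change happened
    operation: str                     # "insert", "update" or "delete"
    collection: str                    # which collection was affected
    record_id: str                     # hypothetical key identifying the record
    values: Optional[Dict[str, Any]]   # changed values; None for a deletion


def apply_changes(replica: Replica, entries: Iterable[ChangeLogEntry]) -> Replica:
    """Replay change log entries in timestamp order against a replica."""
    for entry in sorted(entries, key=lambda e: e.timestamp):
        table = replica.setdefault(entry.collection, {})
        if entry.operation == "delete":
            # Deleting a record that is already absent is treated as a no-op.
            table.pop(entry.record_id, None)
        elif entry.operation in ("insert", "update"):
            # Merge only the changed values into the existing row.
            table.setdefault(entry.record_id, {}).update(entry.values or {})
        else:
            raise ValueError(f"unknown operation: {entry.operation!r}")
    return replica
```

Sorting by timestamp before replaying means the consumer does not depend on the order in which the API happened to return the rows.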
Another method is the modified timestamp. Rather than maintaining a separate log, records in the original tables have an attribute that indicates when they were last modified as well as a flag for deletion.
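The modified-timestamp approach can be sketched in a few lines of Python. The attribute names `updated_at` and `is_deleted`, and the functions `changed_since` and `apply_to_replica`, are assumptions made for this example rather than part of any particular API.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def changed_since(records: List[Record], cursor: datetime) -> Tuple[List[Record], datetime]:
    """Return records modified after `cursor`, oldest first, plus the next cursor."""
    changed = sorted(
        (r for r in records if r["updated_at"] > cursor),
        key=lambda r: r["updated_at"],
    )
    # Advance the cursor to the newest change seen; keep it unchanged if nothing moved.
    # A production consumer may prefer to re-read a small overlap window so that
    # records sharing the cursor's exact timestamp are not missed.
    next_cursor = changed[-1]["updated_at"] if changed else cursor
    return changed, next_cursor


def apply_to_replica(replica: Dict[str, Record], changed: List[Record]) -> None:
    """Upsert live records and drop those flagged as deleted."""
    for record in changed:
        if record["is_deleted"]:
            replica.pop(record["id"], None)
        else:
            replica[record["id"]] = {k: v for k, v in record.items() if k != "is_deleted"}
```

The deletion flag matters here: without it, a record removed at the source would simply stop appearing in responses, and the consumer would have no way to learn that it should be removed from the destination.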
The whitepaper provides a more detailed treatment of the tradeoffs associated with each approach.
Incremental updates are performed more regularly than full syncs. However, it is just as important to make full syncs as easy and fast as possible.
There is no way of fully predicting what your users need or how they will interpret their data. A first priority is to ensure they have access to all their data, even if it seems esoteric or rarely used. A complete copy of the data saves users confusion and guesswork.
The second consideration is that full syncs should take a reasonable amount of time. Even at billions of records, a full sync should finish within days, not weeks or months. To that end:
API endpoints should retrieve multiple records – Don’t build endpoints that can only retrieve one ID at a time.
Don’t throttle too much – Throttling should protect your system without making it too slow for users to practically iterate through records.
Paging – One of the most straightforward ways to improve the odds of a successful full sync is to paginate the results so that individual responses are kept to a reasonable size.
Don’t join or duplicate data – Trying to anticipate analytics use cases can tempt you to denormalize data and join related entities together. Nested or joined records, however, can introduce redundancy and confusion, as well as extend the amount of time required to perform a sync.
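To illustrate the first and third points, here is a hedged Python sketch of a full sync over a paginated endpoint that returns many records per request. It uses cursor (keyset) pagination on an `id` field, which is one common choice and not something the whitepaper prescribes. `make_fake_endpoint` is a stand-in for a real HTTP call, and all names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, Any]
Page = Tuple[List[Record], Optional[int]]
FetchPage = Callable[[Optional[int], int], Page]


def make_fake_endpoint(rows: List[Record]) -> FetchPage:
    """Simulate an endpoint that returns up to `limit` records after a cursor."""
    ordered = sorted(rows, key=lambda r: r["id"])

    def fetch_page(after_id: Optional[int], limit: int) -> Page:
        remaining = [r for r in ordered if after_id is None or r["id"] > after_id]
        batch = remaining[:limit]
        # A short page means the data is exhausted; signal that with a None cursor.
        next_cursor = batch[-1]["id"] if len(batch) == limit else None
        return batch, next_cursor

    return fetch_page


def full_sync(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Record]:
    """Iterate over every record by following cursors until none is returned."""
    cursor: Optional[int] = None
    while True:
        batch, cursor = fetch_page(cursor, page_size)
        yield from batch
        if cursor is None:
            break


# Usage: 250 records are retrieved in three requests of at most 100 each.
endpoint = make_fake_endpoint([{"id": i} for i in range(1, 251)])
all_records = list(full_sync(endpoint, page_size=100))
```

Because each request is bounded in size and resumes from a cursor, a failed request can be retried from the last cursor instead of restarting the whole sync.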
One of the most vexing challenges of working with an API endpoint is incomplete or inconsistent documentation. Gaps in documentation force users into a combination of painstaking reverse engineering and guesswork. The following practices help avoid these problems:
Consistent naming and rules – Consistency leaves fewer exceptions for users to handle, allowing them to build simpler, easier-to-understand implementations.
List all available query parameters – Listing all available query parameters (both mandatory and optional) allows your users to access and leverage all of the data their activities create.
Define all keys and attributes – Clearly defining what every key and attribute stands for and means, especially primary and foreign keys, takes guesswork out of understanding your data model.
The topics listed above do not cover everything you should consider when building an API endpoint, and there are many additional nuances. To learn more, read the whitepaper.
As a SaaS provider, you owe your users the opportunity to make the most of the data they generate in your product. A high-quality API gives your users the means to integrate their data with a rich, modern, cloud-based data ecosystem. This allows them to use your products more effectively, making their success your success.