For a demo of using a cloud function to sync Justin Bieber's Twitter feed into a data warehouse, check out this webinar.
At Fivetran, our core product is a set of automated connectors that sync all your company's data into your data warehouse with zero configuration and minimal setup. This strategy works well for standard data sources: everything from Salesforce to MySQL to S3 can be synced to your data warehouse in one click. We've spent years building great connectors that capture incremental changes accurately from each source and deliver great schemas to your data warehouse.
But there are some data sources where our strategy of building standardized connectors doesn't work:
Custom APIs
Obscure APIs that have only a few users
Data formats that are not self-describing, like Protobuf
How can Fivetran support custom data sources like these? Our answer is our cloud-functions connector. Here's how it works:
You write a tiny function that fetches data from the custom source
Fivetran calls your function every 15 minutes to fetch new data
Fivetran deduplicates this data and merges it into your warehouse
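The three steps above can be pictured with a small local simulation. This is only a sketch: `simulateSyncs` and `countingSource` are hypothetical names, the loop stands in for the 15-minute schedule, the `insert` layout (table name mapped to rows) is an assumption, and deduplication and merging are left out.

```javascript
// Stand-in for the scheduler: calls `fn` repeatedly and feeds the state from
// each response into the next request. Real syncs run on a timer; this loops.
function simulateSyncs(fn, secrets, times) {
  let state = {}; // the first call sees an empty state
  const warehouse = {}; // table name -> rows received so far
  for (let i = 0; i < times; i++) {
    const response = fn({ secrets, state });
    for (const [table, rows] of Object.entries(response.insert)) {
      warehouse[table] = (warehouse[table] || []).concat(rows);
    }
    state = response.state;
  }
  return { state, warehouse };
}

// Hypothetical custom function: emits one new row per call and remembers
// the last value it emitted in the state.
function countingSource(request) {
  const next = (request.state.last || 0) + 1;
  return { state: { last: next }, insert: { numbers: [{ n: next }] } };
}
```

Calling `simulateSyncs(countingSource, {}, 3)` runs three simulated syncs, each one picking up where the previous one stopped.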
To see how this works, let's walk through a simple example. We're going to pull data from Justin Bieber's Twitter feed; this example is a little silly, but it contains just enough complexity to make a good demo.
Request format
Every Fivetran request follows this format:
{ "secrets": {...}, "state": {...} }
The secrets object allows you to store secrets like database passwords and API keys. In this example, we're going to store our Twitter consumerKey and consumerSecret:
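A request carrying those two secrets might look like the sketch below. The values are placeholders, and `readSecrets` is a hypothetical helper that fails early when a key is missing rather than sending a malformed request to Twitter.

```javascript
// Sample request body in the format shown above; the values are placeholders.
const sampleBody = {
  secrets: {
    consumerKey: 'YOUR_CONSUMER_KEY',
    consumerSecret: 'YOUR_CONSUMER_SECRET',
  },
  state: {},
};

// Hypothetical helper: pulls the two Twitter credentials out of the request
// body and throws if either one is absent.
function readSecrets(body) {
  const { consumerKey, consumerSecret } = (body && body.secrets) || {};
  if (!consumerKey || !consumerSecret) {
    throw new Error('Missing consumerKey or consumerSecret in secrets');
  }
  return { consumerKey, consumerSecret };
}
```

Inside a cloud function, this would be called as `readSecrets(req.body)`.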
The state is how you implement incremental updates. The first time Fivetran calls your function, state will be {}. Your function will include the new state in its response alongside the new data. The next time we call your function, we'll pass back this new state. In this example, we'll use state to keep track of since_id, an integer representing the ID of the most recent tweet we've synced.
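Here is one way the since_id bookkeeping could be sketched. `nextState` is a hypothetical helper, and it makes two assumptions that go beyond the text: each tweet carries its ID as a string in `id_str`, and since_id is stored as a string. Tweet IDs can exceed the range that JavaScript numbers represent exactly, so the comparison uses BigInt.

```javascript
// Hypothetical helper: given the state from the request and the tweets just
// fetched, return the state to send back in the response.
function nextState(state, tweets) {
  // An empty state (first sync) means we have not seen any tweet yet.
  let sinceId = state.since_id ? BigInt(state.since_id) : 0n;
  for (const tweet of tweets) {
    const id = BigInt(tweet.id_str); // assumes IDs arrive as decimal strings
    if (id > sinceId) sinceId = id;
  }
  // Keep the ID as a string so it survives JSON without losing precision.
  return sinceId === 0n ? {} : { since_id: sinceId.toString() };
}
```

If a sync finds no new tweets, the function returns the same since_id it was given, so the next call resumes from the same point.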
The code for this example is pretty simple; there's a standard wrapper defined by Google Cloud Functions:
/**
 * @param {!Object} req Cloud Function request context.
 * @param {!Object} res Cloud Function response context.
 */
exports.handler = (req, res) => { ... }
AWS Lambda and Azure Functions are similar. Our first step is to get a token from api.twitter.com/oauth2/token using the secrets from req.body.secrets:
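The token request itself could be assembled roughly as follows. This is a sketch, not the code from the example repository: `buildTokenRequest` is a hypothetical helper, and it assumes the token endpoint accepts a client-credentials request authenticated with HTTP Basic auth built from the key and secret. It only builds the request options, so nothing is sent over the network.

```javascript
// Hypothetical helper: turns the secrets object into the method, headers and
// body of a client-credentials token request.
function buildTokenRequest(secrets) {
  // URL-encode each part, join with a colon, then base64-encode the pair.
  const key = encodeURIComponent(secrets.consumerKey);
  const secret = encodeURIComponent(secrets.consumerSecret);
  const credentials = Buffer.from(`${key}:${secret}`).toString('base64');
  return {
    method: 'POST',
    headers: {
      Authorization: `Basic ${credentials}`,
      'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
    },
    body: 'grant_type=client_credentials',
  };
}
```

On a runtime with a global `fetch` (Node 18 or later), the result can be passed as the options argument, together with the token URL named above.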
insert contains the rows you want to add to your data warehouse. Fivetran infers column names and types from the data you send us, and if you specify a primary key in schema, we'll use it to deduplicate the data. Here's how we construct the response for Justin's tweets:
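A response along these lines could be built as sketched below. Several details are assumptions rather than something the text confirms: the table names `tweets` and `users`, the columns chosen, the tweet fields (`id_str`, `text`, `user.id_str`, `user.screen_name`), and the exact nesting of `schema`. Check the layout against Fivetran's documentation before relying on it.

```javascript
// Hypothetical helper: turns fetched tweets plus the new state into a response
// with one row per tweet and one row per tweet author.
function buildResponse(tweets, state) {
  return {
    state, // passed back to us on the next call
    insert: {
      tweets: tweets.map((t) => ({
        id: t.id_str,
        user_id: t.user.id_str,
        text: t.text,
      })),
      // The same user may appear more than once; the primary key declared
      // below is what allows those duplicates to be collapsed.
      users: tweets.map((t) => ({
        id: t.user.id_str,
        screen_name: t.user.screen_name,
      })),
    },
    schema: {
      tweets: { primary_key: ['id'] },
      users: { primary_key: ['id'] },
    },
  };
}
```

In the cloud function, the result would be sent back with something like `res.send(buildResponse(tweets, newState))`, where `tweets` and `newState` are hypothetical variables holding the fetched tweets and the updated state.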
Fivetran will deduplicate this response and ingest it into your warehouse using a merge operation on the primary key, and you'll end up with two tables in your warehouse.
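To make the merge on the primary key concrete, here is an in-memory illustration of the idea. It is not how Fivetran or any warehouse implements a merge; `mergeByPrimaryKey` is a hypothetical function that treats a table as an array of row objects.

```javascript
// Hypothetical illustration: rows whose primary key already exists are
// updated in place, and rows with a new key are appended.
function mergeByPrimaryKey(existingRows, incomingRows, primaryKey) {
  // Build a lookup key from the primary-key columns of a row.
  const keyOf = (row) => JSON.stringify(primaryKey.map((col) => row[col]));
  const byKey = new Map();
  for (const row of existingRows) {
    byKey.set(keyOf(row), row);
  }
  for (const row of incomingRows) {
    // Duplicates within the incoming batch collapse here as well.
    byKey.set(keyOf(row), { ...byKey.get(keyOf(row)), ...row });
  }
  return [...byKey.values()];
}
```

Because rows are matched on the key, sending the same tweet or user twice leaves a single row in the table.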
For the complete code of this example, check out our GitHub repository. And if you're looking to bring together your company's data into a data warehouse with minimum fuss, check out Fivetran.