Fivetran is the industry leader in fully managed data integration from diverse sources to your data warehouse. We have over 150 prebuilt connectors that can deliver data from popular sources like Zendesk, NetSuite and Salesforce to your warehouse within minutes.
However, many organizations struggle to get data from obscure sources. In fact, nearly every company that takes its data seriously faces this dilemma: no matter which integration vendor you use, there often won't be a prebuilt connector for every esoteric source you have.
This also happens to us, the internal data analytics team at Fivetran! We get to use all of the shiny connectors Fivetran builds, but even then, we sometimes need to pull data from an unsupported source. So what do we do?
The answer is simple. We use the Fivetran Google Cloud Functions connector to pull data from any source (yes, any, even one for which Fivetran has no prebuilt connector). How do we do it? Often with fewer than 100 lines of Python code.
Here is how to think about data integration tools. They have three fundamental pieces: a data reader, a core processor and a data writer.
- Data reader. This is the piece that talks to the source API you are trying to get the data from.
- Core processor. This is the piece that takes the output of the data reader and makes sense of it. It figures out what is an update versus what is an insert, it recognizes what is a new column or a new table, etc.
- Data writer. This last piece takes the output from the core processor and loads it to the final destination – your data warehouse.
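To make the three-way split concrete, here is a toy, in-memory sketch in Python. Everything in it is hypothetical and illustrative: the names `data_reader`, `core_processor` and `data_writer`, the sample records, the `id` primary key and the dict standing in for a warehouse are assumptions for the example. It is not how Fivetran implements these pieces internally.

```python
from typing import Dict, Iterable, Iterator, List, Set


def data_reader() -> Iterator[dict]:
    """Hypothetical reader: a real one would page through a source API."""
    yield {"id": 1, "name": "Ada"}
    yield {"id": 2, "name": "Grace", "team": "Data"}
    yield {"id": 1, "name": "Ada Lovelace"}  # a later change to record 1


def core_processor(
    records: Iterable[dict], known_keys: Set[int], known_columns: Set[str]
) -> Dict[str, list]:
    """Split records into inserts and updates (by primary key "id") and spot new columns."""
    seen = set(known_keys)
    columns = set(known_columns)
    inserts: List[dict] = []
    updates: List[dict] = []
    new_columns: List[str] = []
    for record in records:
        # Any column we have not seen before is a schema change.
        for column in record:
            if column not in columns:
                columns.add(column)
                new_columns.append(column)
        # A key we have already seen means an update; otherwise it is an insert.
        (updates if record["id"] in seen else inserts).append(record)
        seen.add(record["id"])
    return {"inserts": inserts, "updates": updates, "new_columns": new_columns}


def data_writer(table: Dict[int, dict], changes: Dict[str, list]) -> None:
    """Upsert into an in-memory "warehouse" table keyed by primary key."""
    # Inserts are applied before updates, so a key that is inserted and then
    # changed in the same batch ends up with its latest values.
    for record in changes["inserts"] + changes["updates"]:
        table.setdefault(record["id"], {}).update(record)
```

A run looks like `changes = core_processor(data_reader(), known_keys=set(), known_columns=set())` followed by `data_writer(warehouse, changes)`. Note that `core_processor` and `data_writer` never look at where the records came from, so replacing `data_reader` with a reader for a different source leaves them untouched.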
In a well-architected integration tool, the core processor and the data writer are agnostic to the data reader: they don’t care where the data comes from, as long as it arrives in a specific format. This is exactly where the Fivetran function connector comes in. You write a short piece of code for the data reader and connect it to the same core processor and data writer that Fivetran uses behind the scenes.
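As a rough illustration of that "specific format", here is a minimal Python sketch of a data reader that returns one page of results per call. It assumes the response shape we understand the function connector to use (top-level `state`, `insert`, `schema` and `hasMore` keys); check the functions section of the Fivetran docs for the authoritative contract. The `employees` table, the `employees_cursor` state key, `FAKE_SOURCE`, `fetch_page` and `read_page` are all hypothetical names made up for this example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical source data standing in for a real API. Timestamps are ISO 8601
# strings in a single format, so string comparison matches chronological order.
FAKE_SOURCE: List[Dict[str, Any]] = [
    {"id": 1, "name": "Ada", "updated_at": "2021-01-01T00:00:00Z"},
    {"id": 2, "name": "Grace", "updated_at": "2021-01-02T00:00:00Z"},
    {"id": 3, "name": "Edsger", "updated_at": "2021-01-03T00:00:00Z"},
]


def fetch_page(cursor: Optional[str], page_size: int) -> List[Dict[str, Any]]:
    """Hypothetical source call: records updated after `cursor`, oldest first."""
    newer = [r for r in FAKE_SOURCE if cursor is None or r["updated_at"] > cursor]
    return sorted(newer, key=lambda r: r["updated_at"])[:page_size]


def read_page(state: Dict[str, Any], page_size: int = 2) -> Dict[str, Any]:
    """Data reader: return one page of rows in the assumed response shape."""
    cursor = state.get("employees_cursor")
    rows = fetch_page(cursor, page_size)
    return {
        # Handed back to us on the next call so the sync resumes where it stopped.
        "state": {"employees_cursor": rows[-1]["updated_at"] if rows else cursor},
        # Rows to load, grouped by destination table name.
        "insert": {"employees": rows},
        # A primary key lets the core processor tell updates from inserts.
        "schema": {"employees": {"primary_key": ["id"]}},
        # A full page suggests more data may remain, so ask to be called again.
        "hasMore": len(rows) == page_size,
    }
```

The cursor compares timestamps strictly, which assumes no two records share an `updated_at`; a real source with ties would need a tiebreaker such as the primary key.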
Here is a real-life example. We needed to pull data from the HR tool Namely, for which Fivetran has no native connector, and we wrote the data reader in fewer than 100 lines of code. Below you can find that code, along with an example of the output format it produces, which is what the core processor needs in order to “take it from there.”
Code: https://gist.github.com/gareginordyan/5240efcfacb175bb47192d109ef542e7
Output snippet:
That’s all there is to it! Host the Python function you wrote in your Google Cloud environment (or in AWS Lambda or Azure Functions), then point the matching Fivetran function connector at it. By now your first cup of coffee is probably empty. Go get another one, and your data will be waiting in your warehouse before you finish it.
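For the hosting step, a minimal Google Cloud Functions entry point might look like the sketch below. It assumes the connector POSTs a JSON body containing `state` and `secrets` objects and expects a JSON response with `state`, `insert`, `schema` and `hasMore`; please confirm the exact contract in the Fivetran docs. `read_source`, the `api_token` secret, the `records` table and the `cursor` state key are hypothetical placeholders. In the Python runtime, `request` is a `flask.Request`, and this sketch relies only on its `get_json` method.

```python
from typing import Any, Dict, List, Optional


def read_source(api_token: Optional[str], cursor: Optional[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the real API call made with `api_token`.

    It returns one fixed record so the sketch runs offline; replace it with
    a real request to your source.
    """
    rows = [{"id": 42, "name": "Example", "updated_at": "2021-03-01T00:00:00Z"}]
    return [r for r in rows if cursor is None or r["updated_at"] > cursor]


def handler(request) -> Dict[str, Any]:
    """HTTP entry point; returning a dict lets Flask send it as JSON."""
    # silent=True returns None instead of raising on a missing or invalid body.
    body = request.get_json(silent=True) or {}
    state = body.get("state") or {}
    secrets = body.get("secrets") or {}
    cursor = state.get("cursor")
    rows = read_source(secrets.get("api_token"), cursor)
    return {
        "state": {"cursor": rows[-1]["updated_at"] if rows else cursor},
        "insert": {"records": rows},
        "schema": {"records": {"primary_key": ["id"]}},
        "hasMore": False,  # this toy source always fits in a single response
    }
```

Deploying it could look something like `gcloud functions deploy my-connector --runtime=python311 --trigger-http --entry-point=handler`, where `my-connector` is a placeholder name; the function URL is what you then give the Fivetran connector.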
For more detailed instructions about setting up and configuring cloud functions, check out our previous posts on the subject, or look at the functions section in our docs.