AppFlow vs. other enterprise cloud data movement tools

When are the native tools offered by major cloud providers appropriate for your organization?
May 3, 2020

Amazon Web Services, Microsoft Azure and Google Cloud Platform are practically synonymous with the phrase “enterprise cloud.” So, when businesses beginning a cloud migration or expansion are looking for ways to bring their data with them, it’s no surprise that the topic of ETL inevitably comes up.

Each cloud platform has its own tools and services that fulfill specific use cases in the data replication space. Given the number of tools available, however, it can be a challenge to quickly identify the purpose each tool serves, even within a single provider’s own cloud services.

For simplicity, let’s evaluate each of these tools within the buckets of:

  • Code-based ETL: This will likely be the most familiar process to traditional ETL developers, but with additional tools that abstract some areas of the process. Having said that, in-depth knowledge of the data source’s export methods and scripting knowledge are required to modify the code to your specific needs.
  • GUI-based ETL: Having a GUI to orchestrate the steps of ETL makes some of the ingestion process easier by providing a code-free interface. But the high amount of requisite configuration, including schema specification, typically means that onboarding takes some time.
  • Workflow-based ETL: This type of ETL is generally perceived as easier than GUI-based ETL because there are (generally) fewer options, but knowledge of the source’s framework and of source changes is still usually required to fill in the configuration fields in a form.
  • Automated ELT: This approach shifts the transformation step: portions of the transformation, such as JSON normalization and schema determination, are handled automatically, while data manipulation for reporting is handled post-load. These tools are typically code-free and self-maintaining, so they handle source changes such as schema changes or API changes.
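
To make "JSON normalization and schema determination" concrete, here is a minimal Python sketch of the kind of work an automated ELT tool might do before loading. The function names (flatten, infer_schema), the underscore naming rule and the sample records are all hypothetical illustrations, not any vendor's actual implementation.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested objects into one level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Map each column to the Python type name observed for it.

    Nulls are skipped, and a column seen with conflicting types is widened to "str".
    """
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue  # a null carries no type information
            observed = type(value).__name__
            if schema.setdefault(column, observed) != observed:
                schema[column] = "str"
    return schema


# Hypothetical API payloads; the second record adds a field the first lacks.
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Lin", "address": {"city": None}}, "plan": "pro"},
]
rows = [flatten(record) for record in records]
schema = infer_schema(rows)
```

In this sketch the new "plan" field simply shows up as an extra column in the inferred schema, which is the behavior the "self-maintaining" description points at.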

Below, we’ll outline each of the cloud platforms and which use cases their native tools are best suited for.

Amazon Web Services (AWS)

AppFlow

Bucket

Workflow-based ETL

What is it?

AppFlow is the newest kid on the block. Its standout feature is the ability to do a bi-directional sync between the source and data destination, with RDS and S3 among the AWS-hosted targets. Its initial launch includes 14 natively supported integrations, which AWS claims do not require code. A quick look at the product overview video shows that AppFlow stays true to its word and does not require coding. The pipelines do, however, have to be set up per table per data source, and initial configuration requires an understanding of how you want your data mapped to the target.

Limitations

AppFlow isn’t built for ongoing data replication, as there’s no native mechanism or specified configuration field for you to read only new data (a new row, new value or object added). This will require manual management of data pipelines intended for ongoing analytics, as schema management with AWS AppFlow is a necessity.
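
For readers unfamiliar with what "read only new data" involves, the following is a minimal Python sketch of a cursor-based incremental read. It assumes each source row carries an updated_at timestamp; the function name read_new_rows and the sample rows are hypothetical and do not reflect any AppFlow API.

```python
from datetime import datetime


def read_new_rows(rows, cursor):
    """Return the rows modified after `cursor`, plus the advanced cursor."""
    fresh = [row for row in rows if row["updated_at"] > cursor]
    # If nothing is new, keep the old cursor so the next sync starts from the same point.
    new_cursor = max((row["updated_at"] for row in fresh), default=cursor)
    return fresh, new_cursor


# Hypothetical source table with a last-modified timestamp on every row.
source = [
    {"id": 1, "updated_at": datetime(2020, 5, 1)},
    {"id": 2, "updated_at": datetime(2020, 5, 2)},
]

# The first sync reads everything; later syncs read only rows past the cursor.
first_batch, cursor = read_new_rows(source, datetime.min)
source.append({"id": 3, "updated_at": datetime(2020, 5, 3)})
second_batch, cursor = read_new_rows(source, cursor)
```

Without a stored cursor of this kind, every run has to re-read the full table or rely on someone tracking the boundary by hand.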

Ideal use case

This tool’s value lies in its ability to perform a bi-directional sync. If your organization wants to use your data target to analyze, modify and push data back into one of the supported sources, and you are comfortable doing so with a small number of tables, AppFlow could help automate some of the hassle.

Glue

Bucket

Code-based ETL

What is it?

AWS Glue is one of a few tools in the AWS Service toolkit touted as an ETL tool. Glue allows you to scan your AWS-hosted data repository to quickly create a metadata catalogue, generate customizable ETL code, and schedule the process.

Limitations

In addition to having to manually script a method for external applications and databases to replicate data into AWS, this tool’s approach will require ongoing maintenance to automate data replication, and it will require additional work every time a data source changes.

Ideal use case

Glue is fast finding a home among many modern data organizations solely for its ability to create data catalogues. It may also help your team if your goal is to move data solely within the AWS ecosystem, and you have technical resources available to modify the extract and transform pieces of the code as necessary.

Database Migration Service

Bucket

Workflow-based ETL

What is it?

AWS Database Migration Service, or “DMS,” as it’s known colloquially, states its purpose in its name. DMS was originally created as a one-time database migration service from external databases into the AWS environment. DMS has a text-based interface that automatically infers what the schema should be from the source, and then requires user intervention for final schema determination and ongoing replication.

Limitations

DMS doesn’t require any coding, but it does require a high degree of initial configuration. Additionally, AWS DMS reportedly does not handle ongoing replication very smoothly, which can mean going back into the tool to run a full database replication to reset to a working state.
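
The difference between ongoing replication and a full "reset" can be illustrated with a small Python sketch. This is a generic illustration under assumed names (apply_changes, full_reload, and the shape of the change events are all hypothetical); it is not how DMS is implemented or configured.

```python
def apply_changes(target, changes):
    """Apply ordered insert/update/delete events to a target keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


def full_reload(source_rows):
    """Rebuild the target from scratch, as a reset when incremental replication drifts."""
    return {row["id"]: row for row in source_rows}


# Hypothetical source table and a stream of changes captured after the initial load.
source_rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
target = full_reload(source_rows)

changes = [
    {"op": "update", "key": 1, "row": {"id": 1, "name": "Ada L."}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"id": 3, "name": "Grace"}},
]
apply_changes(target, changes)
```

Incremental application is cheap but depends on every change arriving in order; a full reload is expensive but always returns the target to a known state, which is why it serves as the fallback.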

Ideal use case

Due to the ongoing maintenance needed for consistent replication, this tool is recommended if your organization is looking for a one-time migration of your legacy database and you have limited technical resources available.

Google Cloud Platform

Data Fusion

Bucket

GUI-based ETL

What is it?

Google Data Fusion is a GUI-based application that allows users to pull data primarily from Google data sources and place it into Google targets. Its wizard-based approach means that this is also a code-free experience, but one with higher initial configuration for schema determination and ongoing maintenance to accommodate data source changes. A compute instance, which you’ll need to size to meet your workload, is also required to run Data Fusion.

Limitations

At this time, Google has a limited number of supported sources, which do not cover the majority of web applications. The sizing exercise for hosting a compute instance is also challenging for users who are either looking for the most cost-efficient way to run ETL or are new to Google Cloud Platform. You’ll also need to manually reconfigure your pipeline to adjust for any schema changes in the source.

Ideal use case

If you’re already in the Google ecosystem, and looking to automate a data pipeline from a few sources to your Google data store without coding, this is a good tool to consider.

Dataflow

Bucket

Code-based ETL

What is it?

Google Dataflow helps you manage the infrastructure your data pipelines run on, meaning it automatically scales your compute infrastructure to meet the processing needs of your data pipeline, while allowing you to use Apache Beam for data processing.

Limitations

This tool’s approach, by nature of being tied to code, offers a high degree of customization but requires a technical resource to set up the initial scripts and be available for ongoing maintenance when data sources are inevitably added or changed.

Ideal use case

If your organization has complex edge cases that require a manually coded data pipeline, Dataflow could make it easier to determine where and how to host infrastructure.

Microsoft Azure

Data Factory

Bucket

GUI-based ETL

What is it?

Azure Data Factory is another GUI-based tool, similar to Google’s Data Fusion, and it has similar limitations. It uses a drag-and-drop interface to create data pipelines, as opposed to coding, but requires knowledge of the data source and an idea of how to map it to the data destination.

Limitations

Data Factory does not offer automatic schema migration, meaning that manual intervention will be needed for new objects in each source, as well as any structural changes that happen on the source side.
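
As a rough illustration of the manual check this implies, the Python sketch below compares a source schema with a destination schema and reports what has drifted. The function name diff_schema, the type labels and the sample schemas are hypothetical; this is not a Data Factory feature or API.

```python
def diff_schema(source, destination):
    """Compare two column -> type mappings and report what changed at the source."""
    added = {col: typ for col, typ in source.items() if col not in destination}
    removed = sorted(col for col in destination if col not in source)
    retyped = {
        col: (destination[col], source[col])  # (old type, new type)
        for col in source
        if col in destination and source[col] != destination[col]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical schemas: the source gained "plan", dropped "legacy_flag",
# and changed "signup_date" from a string to a date.
source_schema = {"id": "int", "email": "str", "signup_date": "date", "plan": "str"}
destination_schema = {"id": "int", "email": "str", "signup_date": "str", "legacy_flag": "bool"}

drift = diff_schema(source_schema, destination_schema)
```

Each entry in the result corresponds to a pipeline change someone would have to make by hand when the tool does not migrate schemas automatically.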

Ideal use case

Data Factory might work for your team if your organization is already in the Azure ecosystem, doesn’t mind managing pipelines to accommodate source changes, and has a small set of data sources that appear on its list of supported connectors.

What is Fivetran?

Fivetran excels at ongoing, maintenance-free data replication. Our approach automatically handles both schema determination during initial connector setup and schema maintenance for ongoing source changes. It works extremely well with post-load transformations by enabling you to load all of your available data from each data source on a configurable schedule.

Bucket

Automated ELT

Limitations

If you’re looking for a one-time migration of data, you won’t be able to fully utilize Fivetran.

Ideal use case

  • Your organization wants to shorten time to value for analysts looking to generate actionable insights by using a code-free, maintenance-free ELT tool.
  • Your organization does not want its data replication tied to any one cloud platform or, by extension, data destination.
  • Your organization’s size warrants a managed, easy-to-use solution backed by 24/7 support.
  • Your organization requires support for one or several of an ever-growing list of 150+ native connectors.

Summary chart

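Tool               | Code-based ETL | GUI-based ETL | Workflow-based ETL | Automated ELT
-------------------|----------------|---------------|--------------------|--------------
AWS AppFlow        |                |               | X                  |
AWS Glue           | X              |               |                    |
AWS DMS            |                |               | X                  |
Google Data Fusion |                | X             |                    |
Google Dataflow    | X              |               |                    |
Azure Data Factory |                | X             |                    |
Fivetran           |                |               |                    | X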
