Architecture
Fivetran offers a range of deployment solutions to facilitate efficient data integration for various business environments. Each solution serves different operational and security needs, allowing organizations to select the most suitable option based on their specific requirements. There are three main deployment models:
SaaS Deployment: In this model, data processing is handled entirely within Fivetran's cloud, which can be selected based on the customer's preferred cloud service provider and pricing plan. The SaaS Deployment model is ideal for businesses looking for a hands-off approach to data processing, allowing them to leverage Fivetran's cloud and expertise with minimal internal resource investment. The model supports a variety of connector types, such as applications, databases (including High-Volume Agent (HVA) connectors), files, events, functions, and logs, as well as many destinations.
Hybrid Deployment: The Hybrid Deployment model provides a balance for organizations that need to keep their data local for security or compliance reasons but still want to benefit from Fivetran's managed service. The model allows organizations to process data locally within their own network while leveraging Fivetran's cloud for orchestrating and configuring all the data movement. The Hybrid Deployment model is supported by multiple Fivetran database connectors and destinations.
Self-hosted Deployment: The Self-hosted model uses the Fivetran HVR solution. For organizations that prefer or require full control over their data integration tools and infrastructure, the Self-hosted model offers the ability to run Fivetran technology on their own servers. The Self-hosted model gives organizations complete control over orchestration, configuration, credential management, and code deployment.
SaaS Deployment
Fivetran's SaaS Deployment model offers a fully managed, cloud-based data integration solution. It is designed to connect to various data sources and seamlessly load data into designated destinations, ensuring efficient data management and analytics.
Key features
Multi-source connectivity: Fivetran connects to all of your supported data sources and loads the data from them into your destination. Each data source has one or more connectors; each connector runs as an independent process that persists for the duration of a single update. A single Fivetran account, made up of multiple connectors, loads data from multiple data sources into one or more destinations.
Types of connectors: Fivetran connects to your data sources using our connectors. Fundamentally, there are two different types of connectors: push and pull.
Pull connectors: Fivetran's pull connectors actively retrieve, or pull, data from a source. Fivetran connects to and downloads data from a source system at a fixed frequency. We use an SSL-encrypted connection to the source system to retrieve data using a variety of methods: database connections via ODBC/JDBC, or web service APIs via REST and SOAP. In practice, the method or combination of methods is different for every source system.
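The pull cycle described above can be sketched in a few lines. This is an illustrative stand-in, not Fivetran's implementation: the in-memory `SOURCE_ROWS` list and the `updated_at` cursor column take the place of a real source system reached over ODBC/JDBC or a REST/SOAP API.

```python
# Illustrative stand-in for a source system; a real pull connector would
# query it over ODBC/JDBC or a REST/SOAP API instead.
SOURCE_ROWS = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 200},
    {"id": 3, "updated_at": 300},
]

def pull_incremental(cursor: int) -> tuple[list[dict], int]:
    """Pull only rows changed since the last sync, then advance the cursor."""
    batch = [r for r in SOURCE_ROWS if r["updated_at"] > cursor]
    new_cursor = max((r["updated_at"] for r in batch), default=cursor)
    return batch, new_cursor

# First sync pulls everything; the next sync at the fixed frequency
# pulls nothing because no rows changed in between.
batch, cursor = pull_incremental(0)
print(len(batch), cursor)   # 3 300
batch, cursor = pull_incremental(cursor)
print(len(batch), cursor)   # 0 300
```

Keeping a cursor between runs is what makes a fixed-frequency pull incremental rather than a full re-download each time.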
Push connectors: In push connectors, such as Webhooks or Snowplow, source systems send data to Fivetran as events. Once we receive the events in our collection service, they are initially buffered in a queue. We then store the event data as JSON in our cloud storage buckets. During the sync, we push the data to your destination. For more information on how sync works in our push connectors, see our Events documentation.
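The receive-buffer-store flow for push connectors can be sketched as follows. The in-process `deque` stands in for the collection service's queue, and the returned list stands in for the cloud storage bucket; both are assumptions for illustration only.

```python
import json
from collections import deque

event_queue: deque[bytes] = deque()   # stand-in for the collection service's buffer

def receive_event(event: dict) -> None:
    """Collection endpoint: buffer each incoming event, serialized as JSON."""
    event_queue.append(json.dumps(event).encode("utf-8"))

def flush_to_storage() -> list[bytes]:
    """Drain the buffer into (simulated) cloud storage for the next sync."""
    stored = list(event_queue)
    event_queue.clear()
    return stored

receive_event({"type": "page_view", "user": "a1"})
receive_event({"type": "click", "user": "a1"})
stored = flush_to_storage()
print(len(stored), len(event_queue))   # 2 0
```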
Data ingestion and preparation: Once the connector process ingests the data query results, Fivetran normalizes, cleans, sorts, and de-duplicates the data. The aim of this process is to optimally format the data for the destination. Fivetran uses a queue to buffer incoming source data, ensuring that in cases of destination load failures due to transient errors or destination unavailability, the data retrieval from the source is not duplicated. This limits the impact of destination outages and improves Fivetran reliability; unprocessed data found in the storage queue is prioritized, and all buffered data is securely encrypted and retained until it is successfully loaded into the destination.
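A minimal sketch of the preparation step, under the assumption of a batch keyed on `id` with a `_fetched` ordering field (both names are illustrative): normalize each row, keep only the newest copy of each primary key, and sort the result.

```python
raw_batch = [
    {"id": 2, "name": " Beth ", "_fetched": 1},
    {"id": 1, "name": "Ana",    "_fetched": 1},
    {"id": 2, "name": "Beth",   "_fetched": 2},   # newer duplicate of id=2
]

def prepare(batch: list[dict]) -> list[dict]:
    """Normalize values, de-duplicate by primary key (keep newest), sort."""
    latest: dict[int, dict] = {}
    for row in batch:
        row = {**row, "name": row["name"].strip()}         # normalize/clean
        prev = latest.get(row["id"])
        if prev is None or row["_fetched"] > prev["_fetched"]:
            latest[row["id"]] = row                        # de-duplicate
    return sorted(latest.values(), key=lambda r: r["id"])  # sort

prepared = prepare(raw_batch)
print([r["id"] for r in prepared])   # [1, 2]
```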
Parallel processing: The ingestion processes run in parallel with the preparation and load processes. This strategy ensures that the destination data load process doesn't block the source data ingestion process.
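The ingest-while-loading pattern can be sketched with two threads sharing a queue: the ingest thread keeps producing batches regardless of how fast the load thread drains them. This is a generic producer/consumer illustration, not Fivetran's actual process model.

```python
import queue
import threading

buffer: "queue.Queue[list[int] | None]" = queue.Queue()
loaded: list[int] = []

def ingest() -> None:
    """Source-side process: keeps pulling batches without waiting on the load."""
    for batch in ([1, 2], [3, 4], [5]):
        buffer.put(batch)
    buffer.put(None)   # sentinel: ingestion finished

def load() -> None:
    """Destination-side process: drains the buffer as batches arrive."""
    while (batch := buffer.get()) is not None:
        loaded.extend(batch)

t_ingest = threading.Thread(target=ingest)
t_load = threading.Thread(target=load)
t_ingest.start(); t_load.start()
t_ingest.join(); t_load.join()
print(loaded)   # [1, 2, 3, 4, 5]
```

Because the queue decouples the two sides, a slow or temporarily failing load never blocks ingestion; the batches simply accumulate in the buffer.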
Temporary data storage: Fivetran outputs the finalized records to a file in a file storage bucket. We encrypt this file with a separate ephemeral key that is known only to the data-writing process. We automatically delete this temporary file after 7 days using an expiration policy on the bucket. We automatically choose the bucket based on the following factors:
For most destinations:
- your Fivetran plan
- selected data processing region
- selected cloud service provider (CSP)
- selected CSP region
For Redshift: your selected cluster region.
See our Data Residency documentation for details.
Data load into destination: From the temporary data storage, Fivetran copies the file into staging tables in the destination. In the process, we transmit the ephemeral encryption key for the file to the destination so it can decrypt the data as it arrives. Before we write the data into the destination, we update the schema of existing tables to accommodate the newer incoming batch of data. We then merge the data from the staging tables with the existing data present in the destination. Finally, we apply the deletes (if any) on the existing tables. Once we complete the write process, the connector process terminates itself. A system scheduler will later restart the process for the next update.
Hybrid Deployment (Private Preview)
The Hybrid Deployment model enables organizations to sync data sources using Fivetran while ensuring the data never leaves the secure perimeter of the cloud or on-premises network. This architecture grants you complete control over your data's flow, allowing you to meet specific business needs concerning data security.
With the Hybrid model, you decide where to host the data pipelines: in our secure cloud or within your local environment. Either way, you still enjoy the advantages of an automated SaaS model. Your data remains within your private network, with Fivetran serving as a unified control plane for all your data movements. This setup not only supports hybrid and multi-cloud deployments but also offers an extensible solution complete with APIs, metadata sharing, and more. Additionally, it simplifies troubleshooting, provides straightforward setup, and is easy to configure and support.
When setting up a new data pipeline, you have the option to run it locally. By installing a local processing agent within your environment that communicates outbound with Fivetran, you maintain full control over your data. This agent manages the data pipeline processing in your network, with configuration and monitoring still performed through the Fivetran dashboard or API. Only metadata (including MAR information) and logs are sent to Fivetran, which allows Fivetran to understand how the pipeline is running and to display the details in the dashboard.
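The separation described above, where rows are processed locally and only metadata leaves the network, can be sketched as follows. The `control_plane_log` list and the metadata fields are assumptions standing in for Fivetran's cloud and its reporting format.

```python
# Simulated local agent: data rows stay local; only metadata leaves the network.
control_plane_log: list[dict] = []   # stand-in for Fivetran's cloud

def run_pipeline_locally(rows: list[dict]) -> None:
    processed = len(rows)            # actual data is processed in-network
    metadata = {                     # only counts/status are reported outbound
        "status": "succeeded",
        "rows_synced": processed,    # feeds MAR accounting
    }
    control_plane_log.append(metadata)

run_pipeline_locally([{"id": 1}, {"id": 2}])
print(control_plane_log)
# [{'status': 'succeeded', 'rows_synced': 2}]
```

Note that the dashboard can still show sync status and volume from the metadata alone, even though no row-level data was transmitted.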
Key features
The Hybrid Deployment model shares these key features with the SaaS Deployment model:
- Data ingestion and preparation
- Parallel processing
- Temporary data storage: you own and manage the temporary data storage (buckets)
- Data load into a destination
Data privacy and security: Our Hybrid model processes data within your infrastructure, keeping your actual data within the secure boundaries of your network.
Local processing agent: You host this agent on your infrastructure. It connects your local environment to Fivetran's Managed SaaS, maintaining constant communication with Fivetran to determine when the data pipeline needs to run. The local agent picks up those details from Fivetran's orchestration layer to perform the sync.
Deployment and operation: To use the Hybrid model, download the agent configuration files to a Linux machine equipped with Docker. After setting up the local processing agent on your network, the agent manages data pipeline processing within that network. Configuration and monitoring of the data pipeline still occur within the Fivetran environment (via the Fivetran dashboard or API). The agent only sends metadata, including sync metrics, MAR information, and logs, to Fivetran's cloud for tracking and monitoring purposes, accessible via the Fivetran dashboard.
Network security: The local processing agent creates a secure outbound connection to Fivetran, encrypted and mutually authenticated with mutual TLS (mTLS). You have the option to limit outbound traffic to the Fivetran endpoint.
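The two controls above, an mTLS client context and an outbound allow-list, can be sketched with the standard library. The hostname and certificate paths are hypothetical; the actual endpoint and credentials come from the agent configuration.

```python
import ssl

FIVETRAN_ENDPOINT = "agent.example-fivetran.com"   # hypothetical hostname

def build_mtls_context(client_cert: str, client_key: str) -> ssl.SSLContext:
    """Outbound TLS context that also presents a client certificate (mTLS)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx

def egress_allowed(host: str) -> bool:
    """Optional allow-list: permit outbound traffic to the Fivetran endpoint only."""
    return host == FIVETRAN_ENDPOINT

print(egress_allowed("agent.example-fivetran.com"))   # True
print(egress_allowed("evil.example.com"))             # False
```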
Capacity and limitations: Each local processing agent can support one destination and up to 10 connectors. It's crucial to plan your deployment strategy with these limitations in mind.
For more information, see our Hybrid Deployment documentation.
Self-hosted Deployment
Fivetran's Self-hosted Deployment model, HVR, supports database and file management system replication in enterprise environments. Hosted locally, it allows organizations to process data within their own infrastructure. This gives organizations complete control over orchestration, configuration, credential management, and code deployment.
HVR is suitable for enterprises that require effective, secure, and adaptable data replication solutions. The HVR solution works with a variety of operating systems and supports a distributed architecture, ensuring minimal impact on systems while providing low latency and handling high data volumes efficiently.
Key features
Distributed architecture: HVR supports a distributed architecture, ideal for complex, large-scale data environments. It facilitates database and file replication across various Database Management Systems. The HVR Agent, acting as a child process of the hub system, enables secure connections to remote locations for capturing or integrating changes. HVR also offers an agent-less architecture for direct DBMS protocol connections.
Topologies: HVR supports multiple replication topologies, including unidirectional, bidirectional, multidirectional, one-to-many, many-to-one, and cascading. This versatility provides flexibility and scalability in your data replication strategies, allowing you to select a topology that meets your specific data distribution, synchronization, and consolidation needs.
HVR Hub System: This is the central component of HVR, orchestrating replication through logical entities called channels. Key elements of the HVR Hub System include:
- HVR Hub Server: Manages the scheduler and serves as the access point for remote connections. It can serve multiple logical hubs.
- Repository database: Stores metadata definitions for replication processes.
- Hub: A logical entity within the HVR Hub System that manages specific replication tasks.
- Scheduler: Handles replication jobs like Capture, Integrate, Refresh, and Compare.
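To make the relationships between these entities concrete, here is a hypothetical in-code model of a channel with one capture and one integrate location, plus a check that a channel is complete. The dictionary layout is an illustration only; HVR's actual repository schema and naming differ.

```python
# Hypothetical model of a hub's logical entities (not HVR's real schema).
channel = {
    "name": "orders_repl",
    "locations": {
        "src": {"class": "oracle",    "role": "capture"},
        "tgt": {"class": "snowflake", "role": "integrate"},
    },
    "jobs": ["Capture", "Integrate", "Refresh", "Compare"],
}

def validate_channel(channel: dict) -> bool:
    """A usable channel needs at least one capture and one integrate location."""
    roles = {loc["role"] for loc in channel["locations"].values()}
    return {"capture", "integrate"} <= roles

print(validate_channel(channel))   # True
```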
Interfaces: HVR offers multiple interfaces, including a Web User Interface, Command Line Interface, and REST API, providing flexibility in how you can interact with the system.
Compliance and data sovereignty: With its on-premises deployment, HVR meets the needs of organizations with strict compliance and data sovereignty requirements. Its self-hosted nature ensures your data remains under your control, adhering to specific regulatory and security standards.
For more information, see our HVR Architecture documentation.