High-Volume Agent Connectors
High-Volume Agent (HVA) connectors, available within the SaaS and Hybrid deployment models, provide efficient, scalable, and secure data replication for databases with substantial data volumes. This document offers a detailed overview of Fivetran HVA connectors, highlighting their architecture, core capabilities, and how they differ from our other database (non-HVA) connectors and HVR.
Supported databases and services
We have HVA connectors for the following databases and services:
Architecture and operations
HVA connectors provide an effective and scalable solution for real-time data replication, particularly suited for handling larger volumes of data. Using the agent-based approach instead of remote capture, the HVA connectors ensure high-performance data replication that is secure, reliable, and less resource-intensive on the source database systems.
The HVA architecture diagram presents a streamlined process where data flows from the customer's network through the HVA connector to the Fivetran network and ultimately to the destination network.
At the core of the architecture is an on-premises agent that typically resides on the same machine as the source database. This setup enables the agent to directly access and identify change data, minimizing latency and maximizing the efficiency of encrypted and compressed data transfer from a source system to your destination.
Agent installation
The HVA is installable software that is typically configured directly on the machine where a source database resides. The agent is responsible for initiating and managing change data capture from the source database. You can download the HVA installation file from the Downloads page of your Fivetran dashboard. Once installed, the HVA runs as a system process (daemon) on Unix and Linux and as a service on Windows.
Connection
The agent establishes a secure connection to the Fivetran network using a specified TCP/IP port number (default port is 4343). The encrypted connection ensures the safe transmission of data. Depending on the deployment model and security requirements, we support various methods for secure connections, including:
- Direct connection
- SSH tunnel
- Virtual Private Network
- Private networking, such as AWS PrivateLink, Azure Private Link, or Google Cloud Private Service Connect
For more information, see the HVA connection options section.
Incremental updates and compression
The HVA uses log-based Change Data Capture (CDC) to capture data changes asynchronously from the database transaction logs. Before transmitting the identified changes, the agent compresses the data, significantly reducing the size and therefore decreasing network bandwidth usage.
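The effect of compressing a batch of changes before sending it can be illustrated with a minimal sketch. This is not the HVA's actual implementation: the compression algorithm the agent uses is not stated here, so `zlib` stands in for it, and the record layout and the `compress_changes` helper are hypothetical.

```python
import json
import zlib


def compress_changes(changes):
    """Serialize a batch of change records and compress it for transmission."""
    raw = json.dumps(changes).encode("utf-8")
    return raw, zlib.compress(raw, 6)


# Hypothetical change records, shaped as a log reader might emit them.
changes = [
    {"op": "update", "table": "orders", "key": i, "status": "shipped"}
    for i in range(1000)
]

raw, packed = compress_changes(changes)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")

# The receiving side restores the original batch before processing it.
restored = json.loads(zlib.decompress(packed).decode("utf-8"))
```

Change records for the same table tend to repeat column names and values, which is why batches like this usually compress well.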
Data transmission
The agent sends the compressed data encrypted over the secure connection to the Fivetran cloud.
Processing and loading
We process the incoming data streams to match the destination schema and then load the data into your destination, making it available for analytics and business intelligence tools.
Orchestration and configuration
We orchestrate the data movement, ensuring efficient transfer, and manage the configuration of the replication process through the Fivetran dashboard.
Key differences between Fivetran database replication solutions
This section features a comparison table detailing the key differences between HVA connectors, database (non-HVA) connectors, and HVR in terms of functionality, architecture, and intended use cases.
Aspect | HVA Connectors | Database (non-HVA) Connectors | HVR |
---|---|---|---|
Architecture | On-premises agent-based architecture that combines high-volume replication capabilities with ease of use, offering a managed service through the Fivetran dashboard. | Database (non-HVA) connectors use remote change data capture methods. | A standalone high-volume real-time data replication platform offering flexibility for data integration scenarios. |
Performance | Designed for high-volume use cases, utilizing log-based change data capture directly from source system logs for low-latency replication. | Suitable for low to medium data volumes; may introduce more latency than HVA and HVR. | Intended for high-volume, near real-time data replication scenarios; provides additional features such as compare and more granular control over replication processes. |
Setup and management | Requires an agent installed on the database server but simplifies the replication process with a user-friendly interface (Fivetran dashboard) and managed updates. | Fully managed service: the entire lifecycle, from configuration to daily operation, is managed through the Fivetran dashboard without any on-premises installation. | The setup process is more complex and customizable than HVA and database (non-HVA) connectors, catering to high-demand environments. Its extensive configurability supports complex replication scenarios but comes with higher operational overhead. |
Ideal use case | Enterprises that want the ease of our managed service while meeting the demands of high-volume data replication. | Best for organizations seeking simplicity in setup and management, where direct, high-performance replication is less critical. | Aimed at complex, high-demand environments that require near real-time data replication across heterogeneous systems. |
Data flow | By default, data flows through the Fivetran cloud. Hybrid deployment enables database data to flow directly from source to destination. | By default, data flows through the Fivetran cloud. Hybrid deployment enables database data to flow directly from source to destination. | Data flows directly from source to destination. No database data passes through the Fivetran cloud. |
The choice between HVA connectors, other database (non-HVA) connectors, and HVR depends on your specific data replication needs, data volume, and the level of management you require. HVA connectors offer a balance between high-performance replication and ease of use, making them suitable for enterprises that require efficient data replication without the complexity of self-hosting the entire replication software.
Key capabilities
HVA connectors are most beneficial for databases that generate large amounts of data or have an extensive historical data repository. Examples include software product databases that track every event or databases used as ERP backends, where there may be thousands of data units to track and monitor.
The distributed architecture with agents provides performance and scalability advantages.
- Efficient data syncing: HVA connectors use log-based Change Data Capture (CDC) technology to efficiently replicate data asynchronously without impacting the performance of source systems. By reading directly from the source system’s transaction logs, HVA connectors support large volumes of data, ensuring data is consistently up-to-date without the need for extensive querying that could slow down the database.
- Scalable architecture: HVA connectors are capable of handling significant data volumes across various database platforms. This flexibility is crucial for businesses that experience data growth, ensuring that their data replication processes can scale without compromising performance.
- Secure data transmission: All data transmitted across networks is encrypted, safeguarding sensitive information during transit. The use of SSL certificates further tightens authentication, ensuring that data transfers are not only secure but also verified.
- Bandwidth efficiency: HVA connectors filter and compress data before transmission, consuming less bandwidth and requiring fewer data packets. This approach reduces bandwidth requirements and associated costs.
- Versatile replication options: HVA connectors support all our supported destinations, including major cloud data warehouses and data lakes.
- High performance: The agent-based approach of HVA connectors replicates data at high speed, making it ideal for large data volumes. By reducing network congestion and maximizing data throughput, this strategy ensures that data replication is both fast and efficient.
- Ease of use: Our SaaS deployment model allows organizations to benefit from high-volume data replication capabilities without the complexity that often accompanies such processes. The user-friendly interface minimizes the need for extensive technical knowledge, allowing businesses to focus on insights and actions derived from their data rather than the underlying replication mechanics.
Resource consumption
CPU: HVA uses up to one CPU core for every incremental sync. During the initial sync, up to four tables load at a time, each running its own process. Log parsing is generally the most resource-intensive operation and can use up to 100% of a single CPU core until CDC reaches the current point in the log.
Memory: Memory consumption is up to 64 MB per open transaction at any point in the transaction log stream. On a typical online transaction processing workload with very many small concurrent transactions, memory usage does not reach 64 MB per transaction. However, memory usage depends on the size of transactions and what portion of them are against tables that are part of the configuration. Transactions that would need more than 64 MB of memory automatically spill to disk.
Storage space: An HVA installation is about 135 MB in size. While running, it does not use any additional disk space until it exceeds the 64 MB memory threshold and starts accumulating transaction fragments on disk. These files are compressed. If a database runs large batch jobs modifying multiple tables and only committing at the end, HVA may write a lot of data to disk. We recommend starting with at least 5 GB for the configuration folder (HVR_CONFIG).
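The memory and disk figures above can be turned into a rough sizing estimate. The sketch below is illustrative only: `estimate_footprint` is a hypothetical helper, and it assumes that only the portion of a transaction beyond 64 MB spills to disk, which is not spelled out above. Because spill files are compressed, the disk figure is an upper bound.

```python
MEMORY_CAP_MB = 64  # per open transaction, from the figures above


def estimate_footprint(open_txn_sizes_mb):
    """Return (memory_mb, spilled_mb) for a set of open transactions.

    Each transaction keeps at most MEMORY_CAP_MB in memory; the
    remainder is assumed to spill to disk (size before compression).
    """
    memory = sum(min(size, MEMORY_CAP_MB) for size in open_txn_sizes_mb)
    spilled = sum(max(size - MEMORY_CAP_MB, 0) for size in open_txn_sizes_mb)
    return memory, spilled


# 200 small OLTP transactions of 0.5 MB each plus one 3,000 MB batch job.
memory_mb, spilled_mb = estimate_footprint([0.5] * 200 + [3000])
print(f"memory: {memory_mb} MB, spilled (uncompressed): {spilled_mb} MB")
```

In this example the single batch job accounts for nearly all of the disk usage, which matches the guidance that long-running batch transactions are what drive the size of the configuration folder.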
HVA connection options
We provide multiple network configuration options to connect your databases and services using HVA connectors. Depending on your networking needs and preferences, consider these options and choose the suitable configuration for your environment.
- Direct connection
- Private networking
- SSH tunnel connection
- Reverse SSH tunnel connection
- VPN tunnel connection
- Proxy Agent connection
Direct connection
Setup form fields | API configuration parameters | Description |
---|---|---|
Host | db_host | Database host or IP address. Use localhost if the HVA is installed on the same host as the database. |
Port | db_port | Database port. The default port is 1521 for Oracle, 1433 for SQL Server, and 8471 for Db2 for i. |
Agent Host | agent_host | Public IP address or DNS name. |
Agent Port | agent_port | The default port is 4343. |
Accessibility requirements
- The HVA must have access to the database host and port.
- Fivetran must have access to the HVA host and port.
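Accessibility requirements such as these can be checked before you configure the connector. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, and the host and port values are placeholders. A successful TCP connection only shows that the port is open from the machine running the check, not that the expected service is listening on it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Run on the HVA machine: can the agent reach the database?
# "localhost" and 1521 (the Oracle default) are placeholder values.
print("database reachable:", can_connect("localhost", 1521))
```

To test the second requirement, run the same check against the HVA host and port from a machine outside your network, since only Fivetran's side of that path matters.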
Private networking
We support the following providers for HVA connectors:
- AWS PrivateLink - See our AWS PrivateLink setup guide for details.
- Azure Private Link - See our Azure Private Link setup guide for details.
- Google Cloud Private Service Connect - See our Google Cloud Private Service Connect setup guide for details.
Setup form fields | API configuration parameters | Description |
---|---|---|
Host | db_host | Database host or IP address. Use localhost if the HVA is installed on the same host as the database. |
Port | db_port | Database port. The default port is 1521 for Oracle and 1433 for SQL Server. |
Agent Host | agent_host | Private IP address or DNS name within your VPC, accessible by Fivetran through Private Link. |
Agent Port | agent_port | The default port is 4343. |
Accessibility requirements
- The HVA must have access to the database host and port.
- Fivetran must have access to the HVA host and port through the PrivateLink.
SSH tunnel connection
Setup form fields | API configuration parameters | Description |
---|---|---|
Host | db_host | Database host or IP address. Use localhost if the HVA is installed on the same host as the database. |
Port | db_port | Database port. The default port is 1521 for Oracle, 1433 for SQL Server, and 8471 for Db2 for i. |
Agent Host | agent_host | The private IP address of your internal network. |
Agent Port | agent_port | The default port is 4343. |
SSH Host | ssh_host | Public IP address or DNS name. |
SSH Port | ssh_port | The default port is 22. |
Accessibility requirements
- The HVA must have access to the database host and port.
- The SSH server must have access to the HVA host and port.
- Fivetran must have access to the SSH host and port.
Reverse SSH tunnel connection
Setup form fields | API configuration parameters | Description |
---|---|---|
Host | db_host | Database host or IP address. Use localhost if the HVA is installed on the same host as the database. |
Port | db_port | Database port. The default port is 1521 for Oracle, 1433 for SQL Server, and 8471 for Db2 for i. |
Agent Host | agent_host | Use localhost or 127.0.0.1. |
Agent Port | agent_port | Reverse SSH forwarding port mapped to the remote port 4343. |
SSH Host | ssh_host | Private IP address or DNS name within Fivetran VPC. |
SSH Port | ssh_port | The default port is 22. |
Accessibility requirements
- The HVA must have access to the database host and port.
- The SSH server must have access to the HVA host and port.
- Fivetran must have access to the reverse SSH port mapped to the remote HVA port.
VPN tunnel connection
Setup form fields | API configuration parameters | Description |
---|---|---|
Host | db_host | Database host or IP address. Use localhost if the HVA is installed on the same host as the database. |
Port | db_port | Database port. The default port is 1521 for Oracle and 1433 for SQL Server. |
Agent Host | agent_host | Private IP in your internal network accessible through VPN. |
Agent Port | agent_port | The default port is 4343. |
SSH Host | ssh_host | Private IP address or DNS name within Fivetran VPC. |
SSH Port | ssh_port | The default port is 22. |
Accessibility requirements
- The HVA must have access to the database host and port.
- The Fivetran SSH server must have access to the HVA host and port through VPN.
- Fivetran must have access to the Fivetran SSH host and port.
Proxy Agent connection
Setup form fields | API configuration parameters | Description |
---|---|---|
Host | db_host | Database host or IP address. Use localhost if the HVA is installed on the same host as the database. |
Port | db_port | Database port. The default port is 1521 for Oracle, 1433 for SQL Server, and 8471 for Db2 for i. |
Agent Host | agent_host | Private IP address of your internal network, accessible through the Proxy Agent. Use localhost if the Proxy Agent and HVA are installed on the same host. |
Agent Port | agent_port | The default port is 4343. |
Accessibility requirements
- The HVA must have access to the database host and port.
- The Proxy Agent must have access to the HVA host and port.
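The parameter tables above can be summarized as a small validation sketch. The parameter names (`db_host`, `db_port`, `agent_host`, `agent_port`, `ssh_host`, `ssh_port`) and default ports come from the tables; everything else is an assumption made for illustration, including the option labels, the `build_config` helper, and the example host names. How the resulting values are submitted to Fivetran is out of scope here.

```python
# Parameter names come from the tables above; the option labels are
# hypothetical keys chosen for this sketch.
BASE_FIELDS = {"db_host", "db_port", "agent_host", "agent_port"}
SSH_FIELDS = {"ssh_host", "ssh_port"}

REQUIRED_FIELDS = {
    "direct": BASE_FIELDS,
    "private_networking": BASE_FIELDS,
    "ssh_tunnel": BASE_FIELDS | SSH_FIELDS,
    "reverse_ssh_tunnel": BASE_FIELDS | SSH_FIELDS,
    "vpn_tunnel": BASE_FIELDS | SSH_FIELDS,
    "proxy_agent": BASE_FIELDS,
}

DEFAULTS = {"agent_port": 4343, "ssh_port": 22}


def build_config(option, **params):
    """Apply documented defaults and check that all fields are present."""
    required = REQUIRED_FIELDS[option]
    config = {k: v for k, v in DEFAULTS.items() if k in required}
    if option == "reverse_ssh_tunnel":
        # The agent port is the reverse forwarding port, so it has no default.
        config.pop("agent_port", None)
    config.update(params)
    missing = sorted(required - set(config))
    if missing:
        raise ValueError(f"{option}: missing {', '.join(missing)}")
    return config


# Placeholder values for an Oracle source behind an SSH server.
config = build_config(
    "ssh_tunnel",
    db_host="localhost",
    db_port=1521,
    agent_host="10.0.0.5",
    ssh_host="bastion.example.com",
)
print(config)
```

Keeping the required fields per option in one table makes it easy to catch a missing SSH host or an unset reverse forwarding port before the setup form or API call rejects it.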