Agent Disk Requirements
This section describes the disk requirements for the machines running HVR Agent on the source and target locations.
Source Location Machine
The HVR Agent machine on the capture location needs resources to perform the following functions:
- For one-time data loads (refresh) and row-wise compare, the HVR Agent retrieves data from the source database, compresses it, optionally encrypts it, and sends it to the HVR Hub. For optimum efficiency, data is not written to disk during these operations. The matching source database session(s) may use a fair amount of database (and system) resources, but resource consumption for Refresh and Compare is only intermittent.
- For bulk compare jobs, the HVR Agent computes a checksum for all the data.
- To set up CDC during hvractivate, HVR Agent retrieves metadata from the database and adds table-level supplemental logging as needed.
- During CDC, resources are needed to read the logs, parse them, and store information about in-flight transactions in memory (until a threshold is reached and additional change data is written to disk). The amount of resources required for this task varies from one system to another, depending on numerous factors, including:
- the log read method (direct or through an SQL interface),
- data storage for the logs (on disk or in, for example, Oracle Automatic Storage Manager),
- whether the system is clustered or not,
- the number of tables in the replication and data types for columns in these tables, and
- the transaction mix (ratio of insert versus updates versus deletes, and whether there are many small, short-running transactions versus larger, longer-running transactions).
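The checksum-based bulk compare mentioned in the list above can be illustrated with a minimal sketch. This is a hypothetical order-independent hash for teaching purposes, not HVR's actual (internal) checksum algorithm:

```python
import hashlib

def table_checksum(rows):
    """Compute an order-independent checksum over all rows of a table.

    Each row is hashed individually and the digests are XOR-combined,
    so the result does not depend on the order in which rows are read.
    Illustrative stand-in only; HVR's real algorithm is not public.
    """
    combined = bytearray(32)  # 256-bit accumulator
    for row in rows:
        digest = hashlib.sha256(repr(row).encode()).digest()
        combined = bytearray(a ^ b for a, b in zip(combined, digest))
    return combined.hex()

# Bulk compare: equal checksums mean the tables almost certainly match.
source_rows = [(1, "alice"), (2, "bob")]
target_rows = [(2, "bob"), (1, "alice")]  # same data, different order
print(table_checksum(source_rows) == table_checksum(target_rows))  # True
```

Because only the two checksums travel between locations, bulk compare moves far less data than a row-wise compare, at the cost of reading and hashing every row on each side.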
Log parsing is generally the most CPU-intensive operation and can use up to 100% of a single CPU core when capture is running behind. HVR Agent uses one log parser per database thread, and every database node in an Oracle cluster constitutes one thread.
For a real-world workload with the HVR Agent running on the source database server, it is extremely rare to see more than 10% of total system resources going to the HVR Agent during CDC, with typical consumption well below 5%.
For an Oracle source database, HVR Agent will periodically write the memory state to disk to limit the need to re-read archived log files to capture long-running transactions. Consider storage utilization for this if the system often processes large, long-running transactions.
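The periodic write of memory state described above behaves conceptually like the checkpointing sketch below. The record format, checkpoint interval, and file layout are illustrative assumptions, not HVR internals:

```python
import json
import os
import tempfile

CHECKPOINT_INTERVAL = 100  # records between checkpoints (illustrative)

def capture_with_checkpoints(log_records, checkpoint_path):
    """Replay log records, periodically persisting in-memory capture state.

    After a restart, capture can resume from the last checkpoint instead
    of re-reading archived logs from the start of every open transaction.
    """
    state = {"position": 0, "open_transactions": {}}
    for i, (txid, op) in enumerate(log_records, start=1):
        state["open_transactions"].setdefault(txid, []).append(op)
        state["position"] = i
        if i % CHECKPOINT_INTERVAL == 0:
            # Write atomically: temporary file, then rename into place.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(checkpoint_path) or ".")
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
            os.replace(tmp, checkpoint_path)
    return state
```

The trade-off is exactly the one noted above: each checkpoint consumes storage, but a system that often processes large, long-running transactions avoids re-reading old archived logs after a restart.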
Resource Consumption
CPU
Every channel uses up to one CPU core. If HVR Agent runs behind and there is no bottleneck accessing the transaction logs or memory, it can use up to a full CPU core per channel. In a steady-state system with HVR Agent reading the tail end of the log, CPU consumption per channel is typically far below 100% of a core. Most of the CPU time is spent compressing transaction files. Compression can be disabled to lower CPU utilization, but this increases network utilization (between the source HVR Agent machine and the HVR Hub, and between the HVR Hub and any target HVR Agent machine). Refresh and Compare operations, which are not run on an ongoing basis, add as many processes as the number of tables refreshed/compared in parallel. In general, the HVR Agent process itself uses relatively few resources, but the associated database job uses many resources to retrieve data; if parallel select operations run against the database, Refresh or Compare can use up to 100% of the CPU on the source database.
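The CPU-versus-network trade-off of compression can be demonstrated with a short sketch. The payload is synthetic transaction data and the compression levels are zlib levels, not HVR settings:

```python
import time
import zlib

# Synthetic "transaction file": repetitive row data compresses well.
payload = b"INSERT INTO orders VALUES (42, 'widget', 9.99);\n" * 20000

# Level 0 stores the data uncompressed (less CPU, more bytes on the wire);
# higher levels spend more CPU to shrink what must cross the network.
for level in (0, 1, 6):
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level={level} bytes={len(compressed)} cpu_seconds={elapsed:.4f}")
```

Running this shows the uncompressed variant finishing fastest while producing by far the largest output, which is the same trade-off described above for disabling compression in HVR.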
Memory
Memory consumption is up to 64 MB per transaction per channel. Generally, the 64 MB limit is not reached and much less memory is used, but this depends on the size of the transactions and what portion of them touches tables that are part of a channel. Note that the 64 MB threshold can be adjusted (upwards or downwards).
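The per-transaction threshold behaves conceptually like the buffer below, which keeps change records in memory until the limit is exceeded and then spills to disk. The class and file handling are illustrative assumptions; HVR's internal buffering is not public:

```python
import tempfile

THRESHOLD = 64 * 1024 * 1024  # 64 MB per in-flight transaction (adjustable)

class TransactionBuffer:
    """Buffer change records in memory; spill to disk past the threshold."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.records = []
        self.in_memory_bytes = 0
        self.spill_file = None

    def add(self, record: bytes):
        if (self.spill_file is None
                and self.in_memory_bytes + len(record) > self.threshold):
            # Threshold exceeded: move everything buffered so far to disk.
            self.spill_file = tempfile.NamedTemporaryFile(delete=False)
            for r in self.records:
                self.spill_file.write(r)
            self.records.clear()
            self.in_memory_bytes = 0
        if self.spill_file is not None:
            self.spill_file.write(record)
        else:
            self.records.append(record)
            self.in_memory_bytes += len(record)

buf = TransactionBuffer(threshold=1024)  # tiny threshold for demonstration
for _ in range(100):
    buf.add(b"x" * 64)  # 6400 bytes total exceeds 1024, so data spills
print(buf.spill_file is not None)  # True
```

A small transaction never touches disk at all; only a transaction whose buffered changes exceed the threshold pays the I/O cost, which matches the storage-space behavior described in the next subsection.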
Storage Space
The HVR installation is about 100 MB in size, and while running Capture, it uses no additional disk space until the 64 MB memory threshold is exceeded and HVR starts spilling transactions to disk. HVR writes compressed files, but in rare cases, such as large batch jobs that modify tables in the channel and only commit at the end, HVR may write a fair amount of data to disk. Start with at least 5 GB for $HVR_CONFIG. Note that HVR Compare may also spill to disk, and this data goes into the same area.
I/O
Every channel performs frequent I/O operations against the transaction logs. If HVR Agent is current, each of these I/O operations is on the tail end of the log, which could be a source of contention on older systems (especially with many channels). Modern systems have a file system or storage cache, so these frequent I/O operations should barely be noticeable.
Target Location Machine
The HVR Agent machine on the integrate location needs resources to perform the following functions:
- Apply data to the target system, both during a one-time load (refresh) and during continuous integration. The resource utilization for this task varies a lot from one system to another, mainly depending on whether changes are applied in so-called burst mode or using continuous integration. Burst mode requires HVR Agent to compute a single net change per row per cycle, so that a single batch insert, update, or delete results in the correct end state for the row. For example, if within a single cycle a row is first inserted and then updated twice, the net operation is an insert with the two updates merged into the initial insert data. This so-called coalesce process is CPU intensive and, even more so, memory intensive, with HVR Agent spilling data to disk if memory thresholds are exceeded.
- Some MPP databases like Teradata and Greenplum use a resource-intensive client utility (TPT and gpfdist respectively) to distribute the data directly to the nodes for maximum load performance. Though resource consumption for these utilities is not directly attributed to the HVR Agent machine, you must consider the extra load when sizing the configuration.
- For data compare, the Integrate HVR Agent machine retrieves the data from the target system to either compute a checksum (bulk compare) or perform a row-wise comparison. Depending on the technologies involved, HVR Agent may need to sort the data in order to perform the row-wise comparison, which is memory intensive and will likely spill significant amounts of data to disk (up to the total size of the data set).
- Depending on the replication setup, the Integrate HVR Agent machine may perform extra tasks like decoding SAP cluster and pool tables using the SAP Transform or encrypting data using client-side AWS KMS encryption.
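The coalesce step described in the burst-mode bullet above can be sketched as follows. The change-record shape (operation, primary key, column values) and the dictionary-based merge are illustrative assumptions, not HVR's implementation:

```python
def coalesce(changes):
    """Reduce a cycle's changes to one net operation per row key.

    insert + update(s) -> insert with the updates merged in
    insert + delete    -> no operation at all
    update + update    -> single update with merged values
    update + delete    -> delete
    """
    net = {}  # key -> (op, values)
    for op, key, values in changes:
        if key not in net:
            net[key] = (op, dict(values))
            continue
        prev_op, prev_values = net[key]
        if op == "update":
            prev_values.update(values)
            net[key] = (prev_op if prev_op == "insert" else "update", prev_values)
        elif op == "delete":
            if prev_op == "insert":
                del net[key]  # row never existed before this cycle
            else:
                net[key] = ("delete", {})
    return net

cycle = [
    ("insert", 1, {"id": 1, "qty": 5}),
    ("update", 1, {"qty": 7}),
    ("update", 1, {"status": "shipped"}),
]
print(coalesce(cycle))
# {1: ('insert', {'id': 1, 'qty': 7, 'status': 'shipped'})}
```

Because the `net` map must hold one entry per distinct row touched in the cycle, memory grows with the number of modified rows, which is why the coalesce process spills to disk when memory thresholds are exceeded.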
With multiple sources sending data to a target, a lot of data has to be delivered by a single Integrate HVR Agent machine. Load balancers (both physical and software-based like AWS’s Elastic Load Balancer (ELB)) can be used to manage integration performance from many sources into a single target by scaling out the Integrate HVR Agent machines.
Resource Consumption
- CPU: on the target, HVR Agent itself typically does not use much CPU, but the database session it initiates does (this also depends on whether any transformations run as part of the channel definition). A single Integrate process has a single database process that can easily use a full CPU core. Multiple channels into the same target each add one process (unless specifically configured to split into more than one Integrate process). Compare/Refresh can use more cores depending on the parallelism configured in HVR, and associated database processes may each use more than one core depending on parallelism settings at the database level.
- Memory: the memory consumption for HVR Agent on the target is very modest unless large transactions have to be processed. Typically, less than 1 GB per Integrate is used. Row-by-row refresh and compare can use gigabytes of memory but are not run on an ongoing basis.
- Storage space: $HVR_CONFIG on the target may be storing temporary files for row-by-row compare or refresh, and if tables are large, a significant amount of space may be required. Start with 5 GB.
- I/O: the I/O performance for HVR Agent on the target is generally not critical.
Monitoring Integrate Agent Machine Resources
Replication between heterogeneous sources and targets depends heavily on the computing capacity available on the Integrate agent machine: data type conversions during Refresh and CDC, coalescing operations in a burst cycle, checksum computation for Compare, SAP Xform declustering and depooling of tables, decryption of data received from the HVR Hub, and the like. It is therefore imperative to determine scale-out strategies for the Integrate agent machines before deployment, and to monitor their resource consumption once in production.