Scaling and Monitoring HVR Hub and Agent Resources on AWS
This section describes how to scale the resources available to the HVR hub and/or integrate agent on AWS, and how to monitor HVR hub disk space and integrate agent resource utilization on AWS.
Scaling Resources Available to HVR Hub and/or Integrate Agent on AWS
The HVR hub requires storage for $HVR_CONFIG, a repository database and an HVR installation. If the repository database is properly maintained (the hvr_stats tables purged frequently and the hvr_users tables kept in order), the storage needs of the HVR hub vary little. On an EC2 instance running an HVR agent (integrate), storage utilization depends on the size of the burst tables and the number of operations per burst cycle.
As with most stateless services that run on Amazon EC2 instances, the HVR agent can be scaled using Amazon Elastic Load Balancer. This allows the HVR hub to automatically connect to a different stateless agent should the agent, or the server on which it runs, become unavailable. If a third-party monitoring solution or Amazon CloudWatch (see below) shows high disk usage on the Amazon EC2 instance hosting the HVR hub or HVR agent, replication can continue uninterrupted by elastically scaling the EBS volumes, as described in the AWS documentation. After increasing the size of a volume, you need to extend the volume's file system to make use of the new storage capacity. For more information, see Extending a Linux File System After Resizing a Volume or Extending a Windows File System After Resizing a Volume.
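The resize-then-extend sequence can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not HVR tooling: the 25% growth factor, the device names, and the helper names are hypothetical. `modify_volume` is the boto3 EC2 client call for changing a volume's size; `growpart`, `resize2fs` and `xfs_growfs` are the standard Linux utilities for growing a partition and its file system.

```python
import math


def next_volume_size_gib(current_gib: int, growth: float = 0.25) -> int:
    """Grow by the given fraction (hypothetical default), rounded up, by at least 1 GiB."""
    return max(current_gib + 1, math.ceil(current_gib * (1 + growth)))


def modify_volume_request(volume_id: str, current_gib: int) -> dict:
    """Build the keyword arguments for the boto3 EC2 client's modify_volume call."""
    return {"VolumeId": volume_id, "Size": next_volume_size_gib(current_gib)}


def partition_device(disk: str, partition: int) -> str:
    """Linux naming: /dev/xvda -> /dev/xvda1, but /dev/nvme0n1 -> /dev/nvme0n1p1."""
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{partition}"


def extend_fs_commands(disk: str, partition: int, mount_point: str, fs_type: str) -> list:
    """List the commands to run on the instance once the volume has been enlarged."""
    commands = [["growpart", disk, str(partition)]]  # grow the partition first
    if fs_type == "xfs":
        commands.append(["xfs_growfs", "-d", mount_point])
    elif fs_type in ("ext2", "ext3", "ext4"):
        commands.append(["resize2fs", partition_device(disk, partition)])
    else:
        raise ValueError(f"unsupported file system type: {fs_type}")
    return commands


def resize_volume(volume_id: str, current_gib: int, region: str) -> dict:
    """Send the resize request; needs boto3 and AWS credentials (assumed available)."""
    import boto3  # third-party, imported lazily so the helpers above run without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.modify_volume(**modify_volume_request(volume_id, current_gib))
```

For example, `extend_fs_commands("/dev/xvda", 1, "/", "xfs")` returns the `growpart` and `xfs_growfs` invocations for a root volume on XFS; the commands still have to be run with root privileges on the instance itself.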
If your EBS volumes were attached to the EC2 instance on which the HVR hub is installed before November 3, 2016, 23:40 UTC, note that uninterrupted on-demand scaling is not possible for those volumes. See the AWS documentation for this scenario.
In HVR, a data replication location is identified by a DNS entry or IP address. When this location points to an ELB, multiple EC2 edge nodes, or edge nodes of different sizes, can be allocated without any change to the definitions in HVR. This makes it possible to dynamically adjust the resources available on the edge nodes.
Note that scaling is not currently automated out of the box.
Because HVR integrate agents are stateless and one agent can handle multiple connections to one or more target locations, load balancers can be used to scale parallel processing for bulk loads and continuous data streaming. For example, if you are planning to onboard new source systems feeding a data lake in AWS, you can register new target instances with your Amazon Elastic Load Balancer. In addition, Amazon Auto Scaling Groups can be used to launch new EC2 instances running an HVR agent when CloudWatch Agent alarms detect CPU or memory at 90% capacity.
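An alarm of the kind described above could be defined roughly as follows. This is a sketch, not a configuration shipped with HVR: the alarm name, the five-minute period, and the two evaluation periods are hypothetical choices. The keys match the parameters of the boto3 CloudWatch client's `put_metric_alarm` call. `CPUUtilization` is a standard `AWS/EC2` metric, while `mem_used_percent` in the `CWAgent` namespace is only available if the CloudWatch agent is installed and configured to report it aggregated by Auto Scaling group, which is assumed here.

```python
# Where each metric lives: (CloudWatch namespace, metric name).
METRIC_SOURCES = {
    "cpu": ("AWS/EC2", "CPUUtilization"),
    "memory": ("CWAgent", "mem_used_percent"),  # assumes the CloudWatch agent reports it
}


def scale_out_alarm(group_name: str, policy_arn: str,
                    metric: str = "cpu", threshold: float = 90.0) -> dict:
    """Build put_metric_alarm arguments that fire a scaling policy at the threshold."""
    namespace, metric_name = METRIC_SOURCES[metric]
    return {
        "AlarmName": f"hvr-agent-{metric}-high",  # hypothetical naming scheme
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint (hypothetical)
        "EvaluationPeriods": 2,   # consecutive breaching datapoints (hypothetical)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [policy_arn],  # ARN of the Auto Scaling policy to trigger
    }


def create_alarm(alarm: dict, region: str) -> None:
    """Register the alarm; needs boto3 and AWS credentials (assumed available)."""
    import boto3  # third-party, imported lazily so the builder above runs without it

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**alarm)
```

Keeping the alarm definition as plain data makes it easy to review the threshold and evaluation window before anything is created in the account.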
Monitoring HVR Hub Disk Space Utilization on AWS
When the HVR hub is deployed in AWS on an EC2 instance, Amazon Elastic Block Store (Amazon EBS) sends data points to CloudWatch for several metrics. Amazon EBS General Purpose SSD (gp2), Throughput Optimized HDD (st1), Cold HDD (sc1), and Magnetic (standard) volumes automatically send five-minute metrics to CloudWatch. Provisioned IOPS SSD (io1) volumes automatically send one-minute metrics to CloudWatch. Using a monitoring service such as CloudWatch (see Viewing Information about an Amazon EBS Volume) and integrating it with Amazon SNS notifications therefore helps users keep the hub operating under optimal storage conditions.
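As a complement to the EBS volume metrics, which describe I/O activity, file-system fullness can be measured on the instance itself and reported through Amazon SNS. The following is a minimal sketch: the 80% threshold, the subject line, and the function names are hypothetical. `shutil.disk_usage` is from the Python standard library, and `publish` is the boto3 SNS client call.

```python
import shutil
from typing import Optional


def disk_used_percent(path: str) -> float:
    """Percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_alert(topic_arn: str, path: str, threshold_percent: float = 80.0,
               used_percent: Optional[float] = None) -> Optional[dict]:
    """Return SNS publish arguments if usage is at or above the threshold, else None."""
    if used_percent is None:
        used_percent = disk_used_percent(path)  # measure the live file system
    if used_percent < threshold_percent:
        return None
    return {
        "TopicArn": topic_arn,
        "Subject": "HVR hub disk usage warning",  # hypothetical subject line
        "Message": f"{path} is {used_percent:.1f}% full "
                   f"(threshold {threshold_percent:.0f}%).",
    }


def send_alert(alert: dict, region: str) -> None:
    """Publish the alert; needs boto3 and AWS credentials (assumed available)."""
    import boto3  # third-party, imported lazily so the checks above run without it

    boto3.client("sns", region_name=region).publish(**alert)
```

A scheduled job on the hub could call `disk_alert` with the directory that holds $HVR_CONFIG and pass any non-empty result to `send_alert`.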
Monitoring Integrate Agent Resources on AWS
Integrating the EC2 instance on which the HVR agent is deployed with a service such as Amazon CloudWatch can provide guidance to DBAs. Amazon CloudWatch and similar services can monitor both on-premises and cloud-based servers in integrated monitoring modules. Refer to the list of available metrics to monitor on servers.
Because resource usage is highly variable for the integrate agent, monitoring and analyzing patterns in CPU and memory utilization should help determine whether a larger EC2 instance is required or whether adding agents behind an ELB is the better choice to keep up with the real-time replication needs of AWS customers.
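One simple way to look for such patterns is to summarize a series of utilization samples, for instance datapoints exported from CloudWatch. The sketch below only computes descriptive figures; the 90% threshold and the field names are hypothetical, and how to interpret the result (sustained load versus short bursts) is left to the operator.

```python
from statistics import mean


def summarize_utilization(samples: list, threshold: float = 90.0) -> dict:
    """Summarize utilization percentages: mean, peak, and time spent at the threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    longest_run = current_run = 0
    for value in samples:
        # Track the longest streak of consecutive samples at or above the threshold.
        current_run = current_run + 1 if value >= threshold else 0
        longest_run = max(longest_run, current_run)
    above = sum(1 for value in samples if value >= threshold)
    return {
        "mean": mean(samples),
        "peak": max(samples),
        "share_above": above / len(samples),  # fraction of samples at or above threshold
        "longest_run": longest_run,           # in samples, not in minutes
    }
```

With five-minute datapoints, a `longest_run` of 12 would correspond to roughly one hour of continuous load at or above the threshold.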