Quick Start for HVR - Redshift
This quick start guide helps you to get started with HVR for replicating data into Amazon Redshift database.
To proceed with this replication you must have basic understanding about HVR's architecture and terminologies like Hub, Location, Channel, Location Groups, Actions etc.
The example here demonstrates how to replicate tables from one Oracle schema (source location) to a Redshift MPP database (target location) residing in the Amazon cloud (AWS). To connect efficiently to AWS, an HVR agent installation on an EC2 Linux VM is used. To use HVR Bulk Refresh and Integrate with Burst, an additional AWS S3 location is created as well, which functions as an intermediate step to load data using Redshifts ultrafast "Copy from" function. For simplicity, we will assume the source database resides on the hub as well and Oracle has already been installed there.
Before proceeding with this example ensure that the requirements for using HVR with Oracle and Redshift are met.
For information about access privileges and advanced configuration changes required for performing replication using Oracle and Redshift, see:
Create Test Databases and Tables
Create Source Schema
create user sourcedb identified by hvr default tablespace users temporary tablespace temp quota unlimited on users;
Create Tables in Source Schema
create table sourcedb.dm51_product ( prod_id number(10) not null, prod_price number(10,2) not null, prod_descrip varchar2(100) not null, primary key (prod_id) );
create table sourcedb.dm51_order ( prod_id number(10) not null, ord_id number(10) not null, cust_name varchar2(100) not null, cust_addr varchar2(100), primary key (prod_id, ord_id) );
Insert Values in Source Tables
insert into sourcedb.dm51_product values (100, 90, 'Book');
insert into sourcedb.dm51_order values (100, 123, 'Customer1', 'P.O. Box 122, Anytown, Anycountry');
In the AWS portal, create a Redshift database using AWS -> Redshift Cluster -> Launch with cluster identifier hvrcluster and database nametestdb2.
If you don't have a VPC and security group yet, you will create them during step Additional Configuration.
. You can use the defaults to create a minimal cluster or choose a multi node cluster. After creation, you can either create the tables in the cluster using the above .cre and .mod scripts through the SQL Workbench console or let HVR create them during initial loading (HVR Refresh with Create Absent Tables).
In order to use Bulk Refresh and Burst Integrate, HVR requires an additional S3 location to load data from into Redshift using the "Copy from" feature.
In the AWS portal, create an S3 location using AWS -> S3 -> Create Bucket with identifier Copylocation_testdb2. Create it in the same region as your Redshift database.
Install HVR on-premises
An HVR distribution is available for download at the Fivetran.com website. For more information, see Downloading HVR.
Install HVR on a hub machine. For details on installing HVR, see the respective operating system sections:
The HVR distribution requires a license key in order for the software to operate. Please see the HVR licensing page for more details on how to install the HVR license.
After the installation, you can control HVR using the HVR graphical user interface (HVR GUI).
- If the hub machine is Windows, then HVR GUI can be executed directly on the hub machine.
- To control HVR remotely from your PC, connect to the hub machine using Windows Remote Desktop Connection and launch HVR GUI on the hub machine.
- If the hub machine is Linux, then HVR GUI can be executed directly on the hub machine. However, an application like X Server or VNC viewer must be installed to run HVR GUI directly on Linux.
- To control HVR remotely from your PC, install HVR on the PC (with Windows or macOS) and configure the HVR Remote Listener on the hub machine.
- If the hub machine is Unix, then HVR GUI should typically be run remotely from a PC to control HVR installed on the hub machine. To do this, install HVR on the PC (with Windows or macOS) and configure the HVR Remote Listener on the hub machine.
The HVR Remote Listener allows you to connect HVR GUI available on your PC to the remote HVR hub machine. For more information about connecting to remote HVR installation, see Configuring Remote Installation of HVR on Unix or Linux and Configuring Remote Installation of HVR on Windows.
Install HVR remote listener agent using HVR Image for AWS
If you don't have the HVR Image for AWS available, your can obtain it by following the steps in Installing HVR on AWS using HVR Image or do the install manually Installing HVR on AWS Manually.
HVR Image for AWS is a pre-configured Linux-based AMI (Amazon Machine Image) to run HVR's remote listener (as a daemon) including Redshift drivers. HVR Image for AWS supports the use as both a remote HVR agent (for capture and/or integration) as well use as a hub machine using either a PostgreSQL database on the machine, an RDS database, or a database running on a different machine.
HVR Image for AWS is available from the AWS Marketplace and includes all necessary components to connect to Amazon Redshift, Oracle and PostgreSQL on RDS, S3 or any HVR supported target database on EC2, enabling replication from/into all supported on-premises platforms.
HVR Image for AWS is currently available only on Linux. Connectivity to SQL Server requires an HVR installation on Windows (optionally running as an agent).
Using HVR Image for AWS to create a remote listener agent
To use the HVR Image for AWS:
Sign in to the AWS portal and select EC2 under the Services menu.
Click the Launch Instance button.
On the right side menu bar, click AWS Marketplace and type 'HVR' in the search field.
Select the HVR for AWS offering suitable for you.
For optimum efficiency make sure HVR in AMI runs in the same region as the database or service taking part in real-time data movement.
HVR will show you the product details, click Continue.
Under the Choose an Instance Type tab, select a required one and click Next: Configure Instance Details.
If appropriate, under the Configure Instance tab, select a preferred network (VPC) subnet or use the default one.
Under the Add Storage and Add Tags tabs, proceed with default settings.
Under the Configure Security Group tab, set up traffic rules for your instance. HVR uses TPC/IP connectivity on port 4343. Configure an incoming connection for the HVR listener daemon on this port and limit the IP range as narrow as possible. Click Review and Launch.
Under the Review tab, review your instance launch details and click Launch to assign a key pair to your instance and complete the launch process. An EC2 key pair will be used to securely access the AMI. If you have an existing EC2 key pair defined, you can use that. If you don't have a key pair defined, you can create one. Once created, the associated private key file will be downloaded through your web browser to your local computer.
You will now see the AMI being deployed in the Marketplace. This may take a few minutes. The details of the created AMI can be accessed in the AWS EC2 console. Note down the Public DNS name or IP address of the AMI as this will be required when connecting to the AMI.
Once the AMI has been deployed and is running you can start an ssh session to the AMI instance to obtain information on how to proceed. To connect, use the pem key that was used when the image was created. Connect as a user named ec2-user:
ssh -i <key-pair-name>.pem ec2-user@<Public-DNS-name>
The Message of the Day that you see when connecting to the EC2 instance contains important information for using this image. In particular, it provides you with the file location of the SSL public certificate you need for an on-premises HVR installation to connect to the HVR AMI and the location on the AMI from which you can download the HVR software to install on-premises.
Сopy the public certificate .pub_cert file to your on-premises hub machine and download the Linux and Windows HVR packages. To do this, use your preferred file transfer tool, such as sftp, scp or a GUI-equivalent application like WinSCP.
Create the Hub Database
This section describes how to create a hub database (schema). The hub database is a repository database that HVR uses to control its replication activities. It contains HVR catalog tables that hold all specifications of replication such as the names of the replicated databases, the replication direction and the list of tables to be replicated. For more information about HVR hub server and database, see section Hub Server in System Requirements.
HVR supports the creation of a hub database on certain databases (location classes) only. For the list of supported location classes, see section Hub Database in Capabilities.
For this demonstration, the hub database (e.g. hvrhub) is created in Oracle.
Create the hub database (hvrhub) with password (hvr).
create user hvrhub identified by hvr default tablespace users temporary tablespace temp quota unlimited on users;
Connect To Hub Database
This section describes how to connect HVR GUI to the hub database.
When you launch HVR GUI for the first time, the Register Hub dialog is displayed automatically. The Register Hub dialog can also be accessed from menu File by selecting Register Hub. Skip steps 1 to 4 if you want to run HVR GUI directly on the hub machine.
Click Connect to HVR on remote machine.
To connect HVR GUI on a PC to a remote HVR hub machine, the HVR Remote Listener must be configured and running on the HVR hub machine.
Enter the name or IP address of the hub machine in the Node field (e.g. myserver).
Enter the port number (defined in the HVR Remote Listener of the hub machine) in the Port field (e.g. 4343).
Enter the Login (e.g. myserveradmin) and Password for the hub machine. By default, this is the operating system login credentials of the hub machine.
Select Oracle in the Class pane.
Specify Database Connection details.
- Enter the directory path in ORACLE_HOME. You can also click the browse button to select the directory path.
- Enter the Oracle System ID in ORACLE_SID or TNS credentials.
- Enter the user name of the hub database in User (e.g. hvrhub).
- Enter the password for the hub database in Password (e.g. hvr).
Click Connect.
Click Yes in the prompt dialog asking to create catalog tables in the hub database.
HVR displays this prompt when connecting to a hub database for the first time.
On connecting successfully to the hub database, the navigation tree pane displays the hub machine and the hub database. Location Configuration, Channel Definitions, and Scheduler are displayed under the hub database.
Create Oracle and Redshift Locations
Next create the source location using right-click on Location Configuration ▶ New Location.
For the source, enter the credentials of the Oracle database testdb1 created earlier. The corresponding HVR location will be called db1:
In this example there is no need to check Connect to HVR on remote machine because testdb1 is on the same machine as the hub.
Ignore the Group Membership tab for now.
Create another location for the Redshift database testdb2 too and call it db2. Now do tick Connect to HVR on remote machine, because HVR needs to connect to the HVR remote listener agent installed before. Fill in the details of the VM running the remote listener there. Use operating system user hvr with any password. Choose Redshift as database and fill in the Redshift Connection info.
Create Location Groups
The channel needs two location groups. Under the new channel, right-click Location Groups ▶ New Group. Enter a group name (for instance CENTRAL).
Add location db1 as a member of this group by checking the box for db1.
Then create a second location group, called DECENTRAL that has members db2 and db3.
The new channel also needs a list of tables to replicate. This can be done as follows: right-click Tables ▶ Table Explore.
- Choose the first of the three locations ▶ Connect.
- In the Table Explore window, click on both tables and click Add.
- In new dialog HVR Table Name click OK.
- Close the Table Explore window.
- Perform table select again on one of the other locations and confirm that all tables to be replicated have value Same in column Match.
Define Actions
The new channel needs two actions to indicate the direction of replication and a third action for the S3 copy location needed for the "Copy from" feature:
- Right-click group CENTRAL ▶ New Action ▶ Capture.
- Right-click Group DECENTRAL ▶ New Action ▶ Integrate. Check /Burst, this enables HVR to use the "Copy from" feature of Redshift which will increase significantly the integrate speed.
- Right-click Group DECENTRAL ▶ New Action ▶ LocationProperties. Check both /StagingDirectoryHvr and /StagingDirectoryDb with argument s3://Copylocation_testdb2/. Check /StagingDirectoryCredentials with argument 'aws_access_key_id=<_key>;aws_secret_access_key=<secret_key>'_.
These keys are created when your AWS account was created. You can manage these keys for specific users at AWS -> Identity & Access Management -> Users -> Security Credentials The keys should belong to a user having full S3 access - e.g. the user that created the bucket in the first place.
Note that the Actions pane only displays actions related to the objects selected in the left-hand pane. So click on channel hvr_demo01 to see all three actions.
Perform Initial Loading and Table Creation
HVR Refresh copies the data from one location to another location and optionally creates missing or mismatched tables and keys. In the navigation tree pane, right-click the channel and select HVR Refresh.
For the source select location db1 and for target select location check db2. Check the options Create Absent Tables, With Key, Recreate Mismatched Tables and click Refresh.
Enable Replication with HVR Initialize
Now that the channel definition is complete, create the runtime replication system.
- Right-click channel hvr_demo01 ▶ HVR Initialize.
- Choose Create or Replace Objects and click HVR Initialize.
From the moment that HVR Initialize is done, all changes to database sourcedb will be captured by HVR when its capture job looks inside the logging.
HVR initialize also creates three replication jobs, which can be seen under the Scheduler node in the GUI.
Start Scheduling of Replication Jobs
Start the Scheduler on the hub machine by clicking in the HVR GUI on the Scheduler node of the hub database.
Next, instruct the HVR Scheduler to trigger the replication jobs.
The replication jobs inside the Scheduler each execute a script under $HVR_CONFIG/job/hvrhub/hvr_demo01 that has the same name as the job. So job hvr_demo01–cap–db1 detects changes on database testdb1 and stores these as transactions files on the hub machine.
The other job (hvr_demo01–integ–db2) picka up these transaction files and perform inserts, updates and deletes on the target database.
Test Replication
To test replication, make a change in testdb1:
testdb1/hvr SQL> insert into dm01_product values (1, 19.99, 'DVD'); SQL> commit;
In the HVR log file you can see the output of the jobs by clicking on View Log. This log file can be found in $HVR_CONFIG/log/hubdb/hvr_demo01–cap–db1.
The job output looks like this:
hvr_demo01-cap-db1: Scanned 1 transaction containing 1 row (1 ins) for 1 table. hvr_demo01-cap-db1: Routed 215 bytes (compression=40.6%) from 'db1' into 1 location. hvr_demo01-cap-db1: Capture cycle 1. hvr_demo01-integ-db2: Integrate cycle 2 for 1 transaction file (215 bytes). hvr_demo01-integ-db2: Integrated 1 change from 'dm01_product' (1 ins). hvr_demo01-integ-db2: Integrate used 1 transaction and took 0.004 seconds. hvr_demo01-integ-db2: Waiting...
This indicates that the jobs replicated the original change to testdb2. A query on testdb2 confirms this:
SQL> select * from dm01_product;
prod_id | prod_price | prod_descrip |
1 | 19.99 | DVD |
HVR Compare and Refresh
HVR Compare checks whether two locations have identical rows, and HVR Refresh copies the content of one location to the second location. In the HVR GUI, right–click on a channel ▶ HVR Compare (or HVR Refresh). Choose two locations by clicking on the Select buttons.
The outcome of the comparison is displayed below;