This documentation is for an older version of HVR.

How to Use Online Refresh to Correct Data Mismatches

Question

How can I use Online Refresh to correct data mismatches?

Environment

HVR 5

Answer

This refresh procedure is used for a specific table or tables that have data consistency problems.

Full Load for Small or Medium-Sized Tables

Open the Refresh dialog.
Select the source and target location(s).
Select the tables that need to be refreshed.
Select the level of parallelism for the tables.
Select the Online Refresh option and set it to Skip Previous Integrate. This is because we assume two integrate locations.
Select the Schedule box and choose a sensible name for the refresh task.
Click Schedule.
Suspend the integrate jobs. You can leave the capture jobs running.
(Optionally) Go to the Scheduler, select the refresh job that has just been created, and choose New Attribute.
Create a retry_max attribute and set this to the required value. 0 means no retry.
Right-click the refresh job and click Start to run the job.

Selective Refresh for Large Tables

Open the Refresh dialog.
Select the source and target location(s).
Select the tables that need to be refreshed.
Select the applicable context that activates the Restrict /RefreshCondition.
Select the level of parallelism for the tables (only applicable if multiple tables are refreshed).
Select the Online Refresh option and set it to Only Resilience. This is because only a slice of the tables is being refreshed.
Select the Schedule box and choose a sensible name for the refresh task.
Click Schedule.
Suspend the integrate jobs. You can leave the capture jobs running.
(Optionally) Go to the Scheduler, select the refresh job that has just been created, and choose New Attribute. Create a retry_max attribute and set this to the desired value. 0 means no retry.
Right-click the refresh job and click Start to run the job.

Refreshes that use Restrict /RefreshCondition almost always use Only Resilience. The only exception is when the /RefreshCondition will match /CaptureCondition or /Integrate Condition that prevents some datasets from being replicated. Selective refreshes to correct for missed data should always be done with Only Resilience.

For selective refreshes, to correct for missed updates, with conditions on, for example, last_update date; duplicates can occur after the refresh because the old row will still be in there. These duplicates need to be (hard) deleted manually.

Example of Restrict /RefreshCondition

The following is an example of Restrict /RefreshCondition based upon the creation_date and last_update_date columns, where creation_date stands for the date when the row was inserted and last_update_date stands for the last date when the row was updated.

Using creation_date is applicable when inserts are missed; using last_update_date is applicable when updates are missed. A context parameter is added /Context=<ctx_name>. This allows the creation of specific refresh tasks with different /RefreshCondition applied.

Example based on creation_date

Source table:

Target table:

The row where CREATION_DATE = 2019-01-02 is missing from the target; therefore, we have to give this date as the condition of the refresh. The syntax of the condition depends on the source and target database. In this case, both the source and target databases are Oracle:
The condition has to be applied to both location groups (source and target); this way, the row(s) where CREATION_DATE = 2019-01-02 will be selected from the source table, and the rows where CREATION_DATE <> 2019-01-02 won’t be deleted from the target table.
Schedule and start the refresh:
a. Open the HVR Refresh dialog and select Bulk Granularity and Only Resilience.
b. Click the Scheduling tab and select the Schedule Classic Job option. Specify the required Taskname.
c. Click the Contexts tab and select the missing_inserts context and click the Schedule button.
After scheduling the job, start it and wait until it is finished:
The target table after the refresh:
Source table:
Target table:

Example based on last_update_date

Source table:

Target table:

For some reason, an update was not replicated to the target table, which is visible from the difference in the last_update column of the row where C1 = 2.

To correct this, we have to do a refresh, but first, the differing row has to be deleted from the target. If we do not delete it and we create a condition based on the source table’s last_update_date column’s value, then the refresh job will not be able to locate the row in the target (since the target last_update_date value differs) and the target row where C1 = 2 will not be deleted before the row with the correct values is inserted. This is important especially if there are key/unique columns in the target, because this scenario will lead to a violation of the key/unique constraint.

Delete the different row from the target:
Source table:
Target table:
Create the refresh condition:
Schedule and start the refresh (the steps are detailed in the previous example’s second step).
After the refresh is finished, the previously deleted row is inserted with the correct last_update_date column value:
Source table:
Target table:

On the source Restrict /RefreshCondition=”creation_date>to_date('15-MAR-18','dd-mon-yy') and creation_date<to_date('20-MAR-18','dd-mon-yy')” /Context=create_date

On the target Restrict /RefreshCondition=”creation_date > '2018-03-15' and creation_date < '2018-03-20' and hvr_is_deleted=0” /Context=create_date