How to Identify Data Loss and Inconsistencies
Question
How can I identify data loss and inconsistencies?
Environment
HVR 5
Answer
For HVR 5.6.0, we recommend that you use Online Compare to identify data loss and inconsistencies.
For the older versions of HVR, HVR Compare compares the data in the tables on source and target locations. It uses the channel definition as the basis for comparison, including any transformations that may be defined in the channel. It also takes the data type mappings between source and destination into consideration.
HVR Compare provides two modes:
Bulk: The bulk mode computes the checksum of all data in a table. If the checksum is different, HVR reports a difference between the source and destination tables, irrespective of whether the also-reported row count is identical. The bulk mode runs on the database server and only passes the checksum to the hub.
Row-by-row: In the row-by-row mode, all data is sorted and passed to the hub for detailed comparison. Row-by-row comparison in verbose mode provides a detailed report of all differences, including a set of SQL statements to run on the destination of the compare operation. This brings its system back in sync with the source system.
You can run the compare jobs interactively or schedule them in the HVR job scheduler. When you schedule a compare job, separate log files are created for the job and all the output is sent to the hvr.out
log file. The compare job does not take in-flight transactions into consideration. So, you can expect some differences on a live system. Therefore, we recommend that you run the compare jobs during down-time or when the system is relatively idle.
Instructions
To identify tables that have data consistency issues, do the following:
Run HVR Compare in bulk mode for all tables.
For the tables with differences, do the following:
i. Run HVR Compare in row-by-row mode and make a note of the number of reported differences in a spreadsheet.
ii. Repeat Step i. multiple times for the tables that have differences. If the row-count of the differences varies with a low minimum (for example, 100, 10, 0, 1, or 0 on subsequent runs), it indicates that the differences are due to in-flight transactions. If the number varies with high minimum (for example, 10100, 10010, or 10000 on subsequent runs), it indicates that the table has missed data.
For the tables with data loss, do one of the following:
Run HVR Compare in row-by-row verbose mode to get the SQL statements that you can use to correct the data differences on the target.
Run HVR Refresh in online mode for these tables.