Analyzing Diff File
A diff file is a file that stores the differences between source and target locations detected during HVR compare. The file is created for each table in which differences are found.
The diff file is only generated during an event driven row-by-row HVR Compare with the verbose option when the differences are detected.
To generate a diff file in the GUI, the following options must be selected:
- Row by Row Granularity and Verbose under the Options tab in the HVR Compare dialog.
- Generate Compare Event under the Scheduling tab in the HVR Compare dialog.
Viewing Diff File
The generated diff file is stored in the $HVR_CONFIG/diff/hub/channel/ directory. The name of the diff file includes the channel name, source location name, target location name, the timestamp of when it was created, and the table name in which differences were found. For example, the diff file name can be mychannel-cmp-icsrc-ictgt_2022-04-13_07-31-02_702Z_dm51_product_zRsfn.diff.
When performing the event driven HVR Compare in the GUI, the name of the generated diff file is displayed in the DIFF FILE column on the Results pane of the Insight Events web page, which opens automatically after the compare event is completed.
The diff file can be viewed using command Hvrrouterview.
For example, run the following command to open diff file mychannel-cmp-icsrc-ictgt_2022-04-13_07-31-02_702Z_dm51_product_zRsfn.diff stored in the /home/hvr/hvr_config/myhub/mychannel/ directory.
hvrrouterview myhub mychannel /home/hvr/hvr_config/myhub/mychannel/mychannel-cmp-icsrc-ictgt_2022-04-13_07-31-02_702Z_dm51_product_zRsfn.diff
Sample output of the above command:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE hvr_private SYSTEM "lib/hvr_private.dtd"> <hvr_private version="1.0"> <table name="dm51_product"> <row> <column name="hvr_op">4</column> <column name="hvr_diff_mask">----=!=</column> <column name="hvr_diff_row_num_left">66</column> <column name="hvr_diff_row_num_right">66</column> <column name="prod_id">66</column> <column name="prod_price">5</column> <column name="prod_descrip">Description for product</column> </row> <row> <column name="hvr_op">2</column> <column name="hvr_diff_mask">----=!=</column> <column name="hvr_diff_row_num_left">66</column> <column name="hvr_diff_row_num_right">66</column> <column name="prod_id">66</column> <column name="prod_price">6</column> <column name="prod_descrip">Description for product</column> </row> </table> </hvr_private>
Interpreting Diff File
The diff file contains rows that specify which operations (update, insert, or delete) need to be applied on a target location to bring it in sync with a source location.
Each row in a diff file contains:
- system columns hvr_op, hvr_diff_mask, hvr_diff_row_num_left and hvr_diff_row_num_right. These columns will help you read and interpret the diff file contents.
- columns from the actual tables on the source and target locations, in which the differences are detected, e.g. prod_id, prod_price, prod_descrip.
Column hvr_op
Column hvr_op defines the HVR operation to apply against a target table. Column hvr_op can contain the following values.
hvr_op | Description |
---|---|
0 | Delete on target (existing row on target) |
1 | Insert on target (existing row on source) |
2 | Update on target (existing row on source, new value for target) |
4 | Appears before the row with hvr_op=2. Defines the existing value on target, which is out of sync. |
6 | Appears before the row with hvr_op=0 or hvr_op=1. Defines a previous value on the location (source or target) opposite to that shown in the row with hvr_op=0 or hvr_op=1. Explains why insert or delete is required. |
Column hvr_diff_mask
Column hvr_diff_mask shows which columns are identical and which columns are different. It contains a set of characters (shown in the table below), one character for each column. Column hvr_diff_mask should be considered in combination with column hvr_op.
Column hvr_diff_mask can contain the following characters.
hvr_diff_mask | Description |
---|---|
- | Column is not relevant. |
= | Column is identical between source and target. |
! | Column is different between source and target. |
> < | Appears on all columns if one side (source or target) finishes early. All remaining rows will be inserts or deletes. Similar to hvr_op=6: explains why an insert/delete is required. |
& { } | Inconclusive difference, only during the diff_diff online compare. Similar to '!' , but there is no indication if the rows are identical or different.
|
The table below demonstrates different values that can appear in the hvr_diff_mask column.
For example, value "----==!" indicates the following:
- "----" defines the first four system columns (hvr_op, hvr_diff_mask, hvr_diff_row_num_left, hvr_diff_row_num_right)
- "==" defines the table columns which are equal between source and target
- "!" defines the seventh column in this example that differs in source and target
The first four characters are normally “-” because they are the "hvr_" columns.
For updates, key columns will always be identical: “=”. For inserts/deletes, the key columns will be different ("!") (see hvr_op=6 followed by hvr_op=0 in the above image) and non-key columns will be irrelevant ("-") because they cannot be compared if they belong to different keys.
Non-key columns can be identical “=” or different “!".
Columns hvr_diff_row_num_left and hvr_diff_row_num_right
During a compare event, HVR runs the query that selects all records in a table and orders them by all key columns (SELECT...ORDER BY
), on both source and target locations. Columns hvr_diff_row_num_left and hvr_diff_row_num_right contain a row number of the query results in which the difference is detected.
Column hvr_diff_row_num_left defines the row number on the source
Column hvr_diff_row_num_right defines the row number on the target
hvr_op=0 | If hvr_op=0 (delete), column hvr_diff_row_num_right will be > 0; this means that the row data is on target. |
hvr_op=1 | If hvr_op=1 (insert), column hvr_diff_row_num_left will be > 0 ; this means that the row data is on source. |
hvr_op=6 | If hvr_op=6, either of hvr_diff_row_num_left or hvr_diff_row_num_right will be > 0 It will be the opposite of its pair hvr_op=0/1 row |
hvr_op=4 hvr_op=2 | If hvr_op=2 (update), both hvr_diff_row_num_left or hvr_diff_row_num_right will be > 0. In hvr_op=4 and hvr_op=2 pair, both hvr_diff_row_num_left or hvr_diff_row_num_right will have the same values. |
For example, the following piece of diff file indicates that row 1 on target ( hvr_diff_row_num_right=1) containing prod_id=1, prod_price=80 and prod_descrip=Book needs to be deleted (hvr_op=0).
<row> <column name="hvr_op">0</column> <column name="hvr_diff_mask">----!--</column> <column name="hvr_diff_row_num_left">0</column> <column name="hvr_diff_row_num_right">1</column> <column name="prod_id">1</column> <column name="prod_price">80</column> <column name="prod_descrip">Book</column> </row>
The following piece of diff file shows that there is a difference between source and target in row 10. The existing row 10 on target (hvr_op=4) with values prod_id=8, prod_price=81, and prod_descrip=Pencil needs to be updated (hvr_op=2) with values prod_id=8, prod_price=80, and prod_descrip=Book present in row 10 on source (hvr_diff_row_num_left=10).
<table name="dm51_product"> <row> <column name="hvr_op">4</column> <column name="hvr_diff_mask">----=!=</column> <column name="hvr_diff_row_num_left">10</column> <column name="hvr_diff_row_num_right">10</column> <column name="prod_id">8</column> <column name="prod_price">81</column> <column name="prod_descrip">Pencil</column> </row> <row> <column name="hvr_op">2</column> <column name="hvr_diff_mask">----=!=</column> <column name="hvr_diff_row_num_left">10</column> <column name="hvr_diff_row_num_right">10</column> <column name="prod_id">8</column> <column name="prod_price">80</column> <column name="prod_descrip">Book</column> </row> </table></div>
Rows Which Differ
The difference that requires updating the target row(s) appears only for non-key columns. HVR detects a row with the same key columns in source and target and one or more non-key columns having different values.
The difference is always indicated by a pair: hvr_op=4 followed by hvr_op=2.
- hvr_op=4: Existing row value on target
- hvr_op=2: New row value for target. This is what HVR detects on the source.
Column hvr_diff_mask shows which columns are identical and which columns are different.
Key columns are always identical: “=”
Non-key columns can be identical “=” or different “!”
The table on the right shows three differences that require updates:
- Blue: Row with prod_id=3100 that has prod_descrip=x3100 on target must be updated as 3100 to be in sync with the source. Column prod_price is identical on source and target.
- Similar updates must be performed on the green and red rows.
Rows Only on Target
The difference that requires deleting the target row appears when HVR detects a row on the target but does not find it on the source.
This difference may appear in a group: hvr_op=6, hvr_op=0, hvr_op=0, …
- hvr_op=6 is informational and indicates why the row(s) need to be deleted.
- hvr_op=0 is the row to be deleted on target.
Column hvr_diff_mask shows which key columns are different.
The differences can be viewed as hvr_op=6 hvr_op=0, hvr_op=6 hvr_op=0 pairs (similar to hvr_op=4 hvr_op=2); except that all hvr_op=6 are combined to a single row.
The table on the right shows differences that require deleting rows on target.
HVR detected row prod_id to jump from 460 to 1000 on the source, but on the target, it detected rows with 461, 462, and 463.
These rows are extra on target and must be deleted (hvr_op=0). The reason is that the row with hvr_op=6 detected on source (hvr_diff_row_num_left=462) with a difference (hvr_diff_mask=!) in the prod_id key column.
Rows Only on Source
These are the opposite of 'Rows Only on Target'.
HVR finds a row on the source but does not see it on target.
This difference always appears in a group: hvr_op=6, hvr_op=1, hvr_op=1, …
- hvr_op=6 is informational and indicates why the row(s) needs to be inserted. However, in this case, it shows the row on target (hvr_diff_row_num_right > 0)
- hvr_op=1 are the rows to be inserted on target. These rows are on the source but not yet on target.
Deletes or Inserts at the End of Diff File
This difference will appear only at end of the diff file indicating the rows to be inserted or deleted. There is no hvr_op=6. Instead, column hvr_diff_mask contains all “<<<<<<<” or all “>>>>>>>”.
It means that during the SELECT...ORDER BY
query, one side (source or target) ran out of data before the other side. Therefore, all remaining rows are to be deleted or inserted depending on which side ran out of data.
The diagram on the right demonstrates this.
- hvr_op=1 indicates rows to be inserted on the target. They exist on the source.
hvr_diff_mask will be ">>>>>>>". - hvr_op=0 indicates rows to be deleted from the target. They do not exist on the source.
hvr_diff_mask will be "<<<<<<<".