How Can Row-Wise Compare Say a Table Is Identical but Bulk Checksum Compare Say It Is Different?
Issue
Row-by-Row Compare says a table is identical, but Bulk Compare using checksums says it is different.
Environment
HVR 5
Resolution
There are various reasons why bulk compare can report a table as different while row-by-row compare reports it as identical:
- 'Noise'. Unused bytes in the HVR transport format can produce a different checksum, even though the values are identical.
- Float data types are sometimes 'lossy'. Is 0 the same as 1E-200? Row-by-row compare uses a matching algorithm with a tolerance for float rounding inaccuracies.
- Other coercion errors.
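The first two causes can be illustrated with a small sketch. It assumes a made-up fixed-width row layout (an 8-byte float followed by 4 unused padding bytes); this is not HVR's actual transport format, and the tolerance values are arbitrary. A checksum over the raw bytes reports a difference in both cases, while decoding the value and comparing it with a float tolerance reports a match.

```python
import hashlib
import math
import struct

# Hypothetical row layout (NOT HVR's real transport format):
# an 8-byte little-endian double followed by 4 unused padding bytes.
ROW_FORMAT = "<d4x"


def encode_row(value: float, padding: bytes = b"\x00" * 4) -> bytes:
    """Pack a value, then overwrite the 4 unused bytes with `padding` ('noise')."""
    packed = struct.pack(ROW_FORMAT, value)
    return packed[:8] + padding


def bulk_equal(a: bytes, b: bytes) -> bool:
    """Bulk-style check: compare checksums of the raw bytes only."""
    return hashlib.md5(a).digest() == hashlib.md5(b).digest()


def row_equal(a: bytes, b: bytes, abs_tol: float = 1e-100) -> bool:
    """Row-style check: decode the value, then compare with a float tolerance."""
    (va,) = struct.unpack(ROW_FORMAT, a)
    (vb,) = struct.unpack(ROW_FORMAT, b)
    return math.isclose(va, vb, rel_tol=1e-12, abs_tol=abs_tol)


# Noise: same value, different garbage in the unused bytes.
clean = encode_row(42.0)
noisy = encode_row(42.0, padding=b"\xde\xad\xbe\xef")
print(bulk_equal(clean, noisy), row_equal(clean, noisy))  # False True

# Lossy floats: 0 versus 1E-200.
zero = encode_row(0.0)
tiny = encode_row(1e-200)
print(bulk_equal(zero, tiny), row_equal(zero, tiny))  # False True
```

In both cases the byte-level check is stricter than the value-level check, which is exactly the "bulk says different, row-by-row says identical" pattern.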
When such a problem is detected, it is important to get a specific test case to HVR Technical Support.
To troubleshoot the problem and build such a test case, do the following:
- First, create a channel containing only the affected table.
- 'Chop' it down to the key columns and the column that shows the false difference. Columns can be chopped either by removing them in the GUI or by adding ColumnProperties/IgnoreDuringCompare.
- Repeat until the number of columns has been minimized.
- Finally, chop down the number of rows. This can be done by defining action {Group=* Table=tab1 Restrict /RefreshCondition="{k} >= {hvr_var_k_min} and {k} <= {hvr_var_k_max}" /Context=chop}, where k is the key column.
- Run HVR Compare with the 'chop' context enabled (in the Contexts tab) and experiment with values for k_min and k_max until the difference is localized to a few rows.
- Send a dump of the bad rows and the channel definition to HVR Technical Support.
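Experimenting with k_min and k_max is essentially a bisection over the key range. The sketch below assumes an integer key column and a hypothetical callback `range_differs(k_min, k_max)` that you would supply yourself, for example by running a compare with the 'chop' context and those two variables set, then reporting whether a difference was found. Nothing here calls HVR; the callback is a stand-in.

```python
from typing import Callable


def localize_difference(
    k_min: int, k_max: int, range_differs: Callable[[int, int], bool]
) -> int:
    """Return the smallest key in [k_min, k_max] whose row shows a difference.

    `range_differs` is a hypothetical callback: True if a compare restricted
    to keys k_min..k_max (inclusive) reports a difference.
    """
    if not range_differs(k_min, k_max):
        raise ValueError("no difference found in the starting range")
    while k_min < k_max:
        mid = (k_min + k_max) // 2
        if range_differs(k_min, mid):
            k_max = mid  # a difference exists in the lower half
        else:
            k_min = mid + 1  # so it must be in the upper half
    return k_min


# Usage with a fake callback standing in for a real compare run.
bad_keys = {1234, 5678}
fake = lambda lo, hi: any(lo <= k <= hi for k in bad_keys)
print(localize_difference(0, 10_000, fake))  # 1234
```

Each step halves the range, so even a table with millions of keys needs only a few dozen compare runs to reach a single bad row.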
Cause
Conceptually, bulk compare computes checksums over the raw bytes as HVR transports them over the network; it only checks whether they are the same. Row-wise compare unpacks these bytes into the correct datatypes and then compares the values; it checks which value is bigger, not just whether they are the same.
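One common way to use "which value is bigger" in a row-wise compare is a sorted merge: both sides are read in key order, and comparing keys with < and > tells whether a row is missing on one side. The sketch below is a generic sorted-merge diff over (key, value) tuples, offered as an illustration only; it is not HVR's implementation, and the "insert"/"delete"/"update" labels are made up.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]


def merge_compare(source: Iterable[Row], target: Iterable[Row]) -> List[Tuple[str, int]]:
    """Diff two row sequences of (key, value), both already sorted by key."""
    src, tgt = iter(source), iter(target)
    s, t = next(src, None), next(tgt, None)
    diffs: List[Tuple[str, int]] = []
    while s is not None or t is not None:
        if t is None or (s is not None and s[0] < t[0]):
            diffs.append(("insert", s[0]))  # key only in source
            s = next(src, None)
        elif s is None or t[0] < s[0]:
            diffs.append(("delete", t[0]))  # key only in target
            t = next(tgt, None)
        else:
            # Same key on both sides: this is where a datatype-aware
            # (e.g. float-tolerant) value comparison would go.
            if s[1] != t[1]:
                diffs.append(("update", s[0]))
            s, t = next(src, None), next(tgt, None)
    return diffs


print(merge_compare([(1, "a"), (2, "b"), (4, "d")], [(2, "b"), (3, "c"), (4, "x")]))
# [('insert', 1), ('delete', 3), ('update', 4)]
```

Because this approach decodes and orders values, it needs the correct datatype for every column, whereas a checksum comparison only needs the bytes.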
A discrepancy between row-wise and bulk compare would be an HVR bug. The usual symptom is that bulk compare says a table is different, whereas row-by-row compare says it is the same. The opposite problem (bulk compare says the same, whereas row-by-row compare says different) has not been seen before and would be unexpected.