EBCDIC Encoded Column in Db2 for i Is Not Replicated Correctly
Issue
EBCDIC/IBM037-encoded columns, such as CHAR or BINARY columns, in Db2 for i don't match the values in the destination even after the columns are configured as BINARY in the target.
Environment
- HVR 5 and HVR 6
- Source: IBM Db2 for i
- Target: BigQuery
Resolution
To resolve this issue, configure HVR to treat the EBCDIC-encoded columns as BINARY data throughout the pipeline:
- On location
*, add a ColumnProperties action for each affected column, such aspcn1andamt1.- Set DataType=Binary
- Set Length= to an appropriate length, for example, 30.
- This ensures the column is created as BYTES in BigQuery.
- For HVR refresh jobs, add a CaptureExpression action on the source for each affected column. Use a dedicated refresh context, such as
Context=refr.- Use the SQL expression:
CAST(<column> AS BINARY(<length>)) - When running the refresh job, select the custom context to prevent incorrect encoding conversions.
- Use the SQL expression:
- Validate the result by comparing HEX values only after confirming that both the source and target treat the data as BINARY. Do not compare HEX values directly between a Db2 EBCDIC source and a UTF-8 or Unicode destination.
If values still differ, verify that each HVR action is configured at the correct scope, such as source, target, or location *.
For more information about character encodings supported in HVR, see Supported Character Encodings.
Cause
This issue occurs when HVR applies character set conversion to EBCDIC-encoded column data from Db2 for i. HVR interprets the data as character data instead of raw binary data. Without the correct ColumnProperties and CaptureExpression configuration, this conversion can produce incorrect values in the destination.