Error: DELTA_NOT_NULL_CONSTRAINT_VIOLATED NOT NULL Constraint Violated for Column
Issue
A Databricks Refresh operation in HVR fails with the following error:
[DELTA_NOT_NULL_CONSTRAINT_VIOLATED] NOT NULL constraint violated for column: [<column_name>].
Environment
- HVR 6
- Target: Databricks (Delta Lake)
- Databricks staging: Data is staged as CSV files in a cloud storage location, such as S3 or Azure Data Lake Store Gen2.
- Condition: A target column is defined as
NOT NULLbut receives anullvalue from the CSV file.
Resolution
To troubleshoot this issue, follow these two phases:
Phase 1: Diagnostic and isolation steps
Add the HVR action
HVR_STAGING_KEEP=1to the TARGET location group environment.Rerun the Refresh operation for the affected table. It will fail, but the CSV staging files will remain in the staging directory.
(Optional but recommended) Prepare a test channel:
i. Export and import the channel to a test environment.
ii. Replace customer-specific locations and schemas with your own test locations and schemas.
iii. Enable SQL tracing on the target location:
HVR_SQL_TRACE=1.iv. Run a test refresh with the Create Target Tables (always recreate) option selected to generate the necessary setup and logs.
Locate and copy the complete
COPY INTOstatement from the HVR trace or log files. This statement shows the column mappings. For example,_cXtocolumn_name.Identify the failing file:
i. Access the cloud storage location specified in the
COPY INTOstatement, such as the Azure portal or the S3 console.ii. Upload the refresh CSV files one by one to the target directory.
iii. After each upload, run the captured
COPY INTOstatement in the Databricks SQL Editor, removing any trailing backslashes (\).iv. The file that generates the
[DELTA_NOT_NULL_CONSTRAINT_VIOLATED]error is the problematic file.(Optional) Identify the failing rows:
i. Modify the
COPY INTOstatement for the failing column to replacenullvalues with a placeholder value, such as'ZZ'. For example, for the columnfield04mapped to source column_c101:- Original:
_c101::string field04 - Modified:
coalesce(_c101,'ZZ') as field04
ii. Rerun the modified
COPY INTOstatement.iii. Query the Databricks table to find the rows that caused the constraint violation:
SELECT * FROM `your_schema`.`item` WHERE field04 = 'ZZ';- Original:
Phase 2: Permanent workaround in HVR
To prevent this issue, force HVR to treat the null or missing value in the source data as an empty string ('') before it is loaded into the Databricks staging CSV file, satisfying the NOT NULL constraint.
- In the HVR channel, add a ColumnProperties action on the source location.
- Set the
IntegrateExpressionparameter to use theCOALESCEfunction for the specific column. For example, for the columnfield04:IntegrateExpression:COALESCE({field04},'')
- Run a new Refresh operation or perform a normal Capture or Integrate cycle. The issue will no longer occur for this column.
Cause
This issue occurs when the HVR default CSV loading process for Refresh operations interprets missing or explicitly defined NULL values in the staging CSV files, often represented as \N, as actual null. Because the corresponding target Delta Lake table column has a NOT NULL constraint, the insertion fails.