Fivetran HVR supports integrating changes into an HDFS location. This section describes the configuration requirements for integrating changes into an HDFS location using Integrate and Refresh.
Customize Integrate
Defining action Integrate is sufficient for integrating changes into an HDFS location. However, the default file format written into a target file location is HVR's own XML format and the changes captured from multiple tables are integrated as files into one directory. The integrated files are named using the integrate timestamp.
You may define other actions to customize the default integration behavior described above. The following are a few examples of customizing integration into the HDFS location:
To segregate and name the files integrated into the target location, define parameter RenameExpression.
For example, if RenameExpression={hvr_tbl_name}/{hvr_integ_tstamp}.csv is defined, a separate folder (named after the table) is created in the target location for each source table, and the files replicated for each table are saved into the corresponding folder. This also enforces unique file names, because each file is named with the timestamp of the moment it was integrated into the target location.
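The effect of the RenameExpression example above can be illustrated with a small Python sketch. This is not HVR code: the function `expand_rename_expression`, the table names, and the timestamp strings are hypothetical, and the timestamp format actually written by HVR may differ. The sketch only shows how substituting the two placeholders yields one folder per table and a timestamped file name.

```python
import posixpath

# Pattern taken from the RenameExpression example in the text.
RENAME_EXPRESSION = "{hvr_tbl_name}/{hvr_integ_tstamp}.csv"


def expand_rename_expression(pattern: str, table: str, integ_tstamp: str) -> str:
    """Substitute the table name and integrate timestamp into the pattern.

    Illustration only: HVR performs this substitution internally.
    """
    return pattern.format(hvr_tbl_name=table, hvr_integ_tstamp=integ_tstamp)


# Hypothetical tables and integrate timestamps (format assumed for this sketch).
integrated = [
    ("orders", "20240101120000000"),
    ("orders", "20240101120500000"),
    ("customers", "20240101120000000"),
]

paths = [expand_rename_expression(RENAME_EXPRESSION, t, ts) for t, ts in integrated]

# Each table ends up in its own folder under the target location.
folders = sorted({posixpath.dirname(p) for p in paths})
```

Here `paths` holds one relative file path per integrated file, and `folders` holds the distinct per-table folders those files fall into.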
Action ColumnProperties defines properties for a column being replicated. This action may be defined to:
integrate the delete operation. By default, for file-based target locations, HVR does not replicate the delete operation performed at the source location. So to integrate the delete operation, an extra column for timekey needs to be added in the target location. For this, action ColumnProperties may be defined with the following parameters:
Name: This parameter defines the name for the extra column in the target location.
Extra: This parameter defines that this is an extra column in the target location (a column which is not present in the source location).
IntegrateExpression: This parameter defines the expression used to generate the timekey value. For example, {hvr_integ_seq} can be used here. This is a 36-byte string value (hex characters) that is unique and continuously increasing for a specific source location.
TimeKey: This parameter defines that this is a timekey column.
Datatype=varchar: This parameter defines the data type for the extra column.
Length=36: This parameter defines the data type length for the extra column.
add the source operation type information (using hvr_op) to the target location. This action definition is required for performing Compare if action ColumnProperties with parameter TimeKey is defined on a target file location. For this, action ColumnProperties may be defined with the following parameters:
Name: This parameter defines the name for the extra column in the target location.
Extra: This parameter defines that this is an extra column in the target location (a column which is not present in the source location).
IntegrateExpression={hvr_op}: This parameter defines the expression to be used for generating the information about source operation type.
Datatype=integer: This parameter defines the data type for this extra column.
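The combined effect of the two ColumnProperties definitions above can be sketched in Python. This is a conceptual model, not HVR's implementation: the column names `integ_key` and `op_type`, the helper functions, and the integer operation codes are assumptions made for this illustration (check the HVR action reference for the actual hvr_op values). The sketch shows that, with a timekey column, every source change, including a delete, is appended as a new row carrying an increasing 36-character hex key and an operation type.

```python
from itertools import count

# Operation codes assumed for this sketch only; not taken from the source text.
OP_DELETE, OP_INSERT, OP_UPDATE = 0, 1, 2

_sequence = count(1)


def next_integ_seq() -> str:
    """Return a 36-character, zero-padded hex string that increases on each call.

    Stands in for the value produced by {hvr_integ_seq}.
    """
    return format(next(_sequence), "036x")


def to_timekey_row(source_row: dict, op: int) -> dict:
    """Build an append-only target row: source columns plus the two extra columns."""
    return {**source_row, "integ_key": next_integ_seq(), "op_type": op}


# Hypothetical change stream for one source row: insert, update, then delete.
changes = [
    ({"id": 7, "name": "ann"}, OP_INSERT),
    ({"id": 7, "name": "anne"}, OP_UPDATE),
    ({"id": 7, "name": "anne"}, OP_DELETE),
]

# Nothing is removed from the target; the delete is recorded as one more row.
target_rows = [to_timekey_row(row, op) for row, op in changes]
```

Because the key is zero-padded to a fixed width, sorting the rows by `integ_key` as a string reproduces the order in which the changes were integrated, and the final row's `op_type` records the delete.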
Integrate Limitations
By default, for file-based target locations, HVR does not replicate the delete operation performed at the source location.