Apache HDFS as Target
Fivetran HVR supports integrating changes into an HDFS location. This section describes the configuration requirements for integrating changes into an HDFS location using Integrate and Refresh.
Customize Integrate
Defining action Integrate is sufficient for integrating changes into an HDFS location. However, by default the files written into a target file location use HVR's own XML format, and the changes captured from multiple tables are integrated as files into a single directory. The integrated files are named using the integrate timestamp.
You may define other actions to customize the default integration behavior described above. The following are a few examples of actions that can be used to customize integration into the HDFS location:
Group | Table | Action | Annotation |
---|---|---|---|
HDFS | * | FileFormat | This action may be defined to: |
HDFS | * | Integrate | To segregate and name the files integrated into the target location, define parameter RenameExpression. For example, if RenameExpression={hvr_tbl_name}/{hvr_integ_tstamp}.csv is defined, a separate folder (named after the table) is created in the target location for each source table, and the files replicated for each table are saved into that table's folder. This also enforces a unique name for each file by naming it with the timestamp of the moment it was integrated into the target location. |
HDFS | * | ColumnProperties | This action defines properties for a column being replicated. This action may be defined to: |
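To make the effect of the RenameExpression example above concrete, the following Python sketch expands the pattern {hvr_tbl_name}/{hvr_integ_tstamp}.csv for two tables. This is an illustration only, not HVR's implementation: the function name expand_rename_expression, the table names, and the timestamp layout are assumptions made for this sketch, and the actual timestamp format produced by HVR may differ.

```python
from datetime import datetime, timezone


def expand_rename_expression(expression: str, table_name: str, integrated_at: datetime) -> str:
    """Substitute the two placeholders used in the example above (hypothetical helper)."""
    # Assumption: a compact YYYYMMDDhhmmss layout; HVR's real layout may differ.
    tstamp = integrated_at.strftime("%Y%m%d%H%M%S")
    return (
        expression
        .replace("{hvr_tbl_name}", table_name)
        .replace("{hvr_integ_tstamp}", tstamp)
    )


expression = "{hvr_tbl_name}/{hvr_integ_tstamp}.csv"
integrated_at = datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc)

# Each table gets its own folder; the timestamp keeps file names unique per integrate cycle.
for table in ("orders", "customers"):  # hypothetical table names
    print(expand_rename_expression(expression, table, integrated_at))
```

With these assumed inputs, the files for each table land in a folder named after the table, which is the per-table segregation described in the Integrate row.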
State Directory
By default, HVR creates its internal state files in a sub-directory named _hvr_state within the location’s top directory.
This option in the HVR UI allows you to specify a custom path for HVR's internal state files, which are used during file replication. The state directory can be configured as a path within the location's top directory or placed outside of it. If a relative path (e.g., ../work) is specified, it is interpreted relative to the location's top directory.
If the state directory is on the same file system as the location’s top directory, HVR ensures that file move operations are ‘atomic,’ so users only see fully written files and never partial versions.
This option is equivalent to the location property File_State_Directory.
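The path rules described in this section can be sketched in Python as follows. This is a minimal illustration of the documented behavior (default sub-directory _hvr_state, relative paths resolved against the top directory), not HVR code; the function name resolve_state_directory and the example paths are assumptions.

```python
import posixpath

DEFAULT_STATE_SUBDIR = "_hvr_state"


def resolve_state_directory(top_directory, state_directory=None):
    """Return the effective state directory for a file location (hypothetical helper)."""
    if state_directory is None:
        # Default: a sub-directory of the location's top directory.
        return posixpath.join(top_directory, DEFAULT_STATE_SUBDIR)
    # posixpath.join keeps an absolute state_directory unchanged and
    # resolves a relative one against the top directory.
    return posixpath.normpath(posixpath.join(top_directory, state_directory))


top = "/data/hvr/target"  # hypothetical top directory
print(resolve_state_directory(top))                    # default location
print(resolve_state_directory(top, "../work"))         # relative path, outside the top directory
print(resolve_state_directory(top, "/var/hvr/state"))  # absolute path, used as given
```

Note that the resolution here is purely textual (no symbolic links are followed), which is enough to show how a value such as ../work ends up beside the top directory rather than inside it.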
Intermediate Directory
This option in the HVR UI allows you to specify a directory path for storing intermediate (temporary) files generated during Compare. These files are created during both "direct file compare" and "online compare" operations.
Using an intermediate directory can enhance performance by ensuring that temporary files are stored in a location optimized for the system's data processing needs.
This setting is particularly relevant for target file locations, as it determines where the intermediate files are placed during the Compare operation. If this option is not enabled, the intermediate files are stored by default in the integratedir/_hvr_intermediate directory, where integratedir is the replication DIRECTORY (File_Path) defined for the target file location.
This option is equivalent to the location property Intermediate_Directory.
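As a small illustration of the default described above, the following Python sketch returns the directory in which intermediate files would be placed. It is a sketch under stated assumptions rather than HVR's implementation: the function name intermediate_directory and the example paths are hypothetical, and a configured value is returned unchanged because this section does not describe how relative values are resolved.

```python
import posixpath

DEFAULT_INTERMEDIATE_SUBDIR = "_hvr_intermediate"


def intermediate_directory(integrate_dir, configured=None):
    """Pick the directory for intermediate Compare files (hypothetical helper)."""
    if configured:
        # Option enabled: use the configured path as given (assumption for this sketch).
        return configured
    # Option not enabled: fall back to integratedir/_hvr_intermediate.
    return posixpath.join(integrate_dir, DEFAULT_INTERMEDIATE_SUBDIR)


integrate_dir = "/user/hvr/target"  # hypothetical File_Path of the target location
print(intermediate_directory(integrate_dir))
print(intermediate_directory(integrate_dir, "/tmp/hvr_compare"))
```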
Integrate Limitations
By default, for file-based target locations, HVR does not replicate the delete operation performed at the source location.