FileFormat
Action FileFormat can be used on file locations (including HDFS and S3) and on Kafka locations.
For file locations, it controls how Fivetran HVR reads and writes files. The default format for file locations is our own XML format.
For Kafka, it controls the format of each message. By default, the Kafka location sends messages to the Kafka broker in JSON format, unless the location property Kafka_Schema_Registry is defined, in which case each message uses Kafka Connect's compact Avro-based format. Note that this is not true Avro, because each message on its own is not a valid Avro file (e.g., it has no file header). Rather, each message is a 'micro Avro', containing fragments of data encoded using Avro's data type serialization format. Both JSON (using mode SCHEMA_PAYLOAD, see parameter JsonMode below) and the 'micro Avro' format conform to Confluent's 'Kafka Connect' message format standard. The default Kafka message format can be overridden by a parameter such as Xml, Csv, Avro, Json, or Parquet.
A custom format can be defined by using parameters CaptureConverter or IntegrateConverter. Many parameters only take effect if the channel contains table information; for a 'blob file channel', the jobs do not need to understand the file format.
If this action is defined on a specific table, then it affects all tables in the same location.
Defining more than one file format (Xml, Csv, Avro, Json, or Parquet) for the same file location using this action is not supported; that is, defining different file formats for each table in the same location is not possible. For example, if one table has the file format defined as Xml, then another table in the same location cannot have the Csv file format defined.
Parameters
This section describes the parameters available for action FileFormat.
The following two tabs/ways can be used to define action parameters in this dialog:
- Regular: Allows you to define the required parameters by using UI elements such as checkboxes and text fields.
- Text: Allows you to define the required parameters by specifying them in the text field. You can also copy-paste the action definitions from Fivetran HVR documentation, emails, or demo notes.
Xml
Description: Read and write files as HVR's XML format. This parameter is applicable only to channels with table information, not to a 'blob file' channel.
Csv
Description: Read and write files as Comma-separated values (CSV) format. This parameter is applicable only to channels with table information, not to a 'blob file' channel.
Avro
Description: Transforms the captured rows into Avro format during Integrate.
An Avro file contains the schema defining data types in JSON and a compact binary representation of the data. See Apache Avro documentation for the detailed description of schema definition and data representation.
Avro supports both primitive and logical data types. The normal way to represent an Avro file in a human-readable format is to convert it to JSON using the Apache Avro tools. However, there is a limitation in representing decimal values using the standard Avro tools. The decimal type in Avro is supported as a logical type and is defined in the Avro schema file as follows:
{ "type": "bytes", "logicalType": "decimal", "precision": precision, "scale": scale }
Here, precision is the total number of digits in the number and scale is the number of digits after the decimal point.
The decimal logical type represents an arbitrary-precision signed decimal number of the form unscaled × 10^-scale. For example, the value 1.01 with a precision of 3 and a scale of 2 is represented as 101.
The decimal values are encoded as a sequence of bytes in Avro. In their JSON representation, decimal values are displayed as an unreadable string instead of human-readable values.
When using Hive (Hive external table) to read Avro files, the decimal data type is displayed properly.
For example:
A source table is defined as follows:
CREATE TABLE dec (c1 NUMBER(10,2), c2 NUMBER(10,4));
where the column c1 stores a decimal value with precision 10 and scale 2, and the column c2 stores a decimal value with precision 10 and scale 4.
If we insert values (1, 1) into the dec table and select them from the table, we expect to see (1, 1) as the output. But the Avro format uses the specified scales and represents the values in binary format as 100 (1.00) in column c1 and 10000 (1.0000) in column c2. According to the JSON specification, a binary array is encoded as a string. JSON will display these values as "d" (wherein "d" is 100 according to ASCII) and "'\x10" (wherein 10000 is 0x2710, and 0x27 is ' according to the ASCII encoding).
Formats like Parquet with ParquetVersion=v2 or v3 and Json with JsonMode=SCHEMA_PAYLOAD use the same rules to encode decimal data types.
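For illustration only, the following Python sketch mirrors the encoding convention described above (the unscaled integer written as a big-endian, two's-complement byte array); it is not HVR code, and the helper name is hypothetical:

from decimal import Decimal

def avro_decimal_bytes(value: Decimal, scale: int) -> bytes:
    # Avro 'decimal' logical type: the unscaled integer (value * 10**scale)
    # encoded as a big-endian, two's-complement byte array.
    unscaled = int(value.scaleb(scale))                   # e.g. 1.01 with scale 2 -> 101
    length = max(1, (unscaled.bit_length() + 8) // 8)     # leave room for the sign bit
    return unscaled.to_bytes(length, byteorder="big", signed=True)

print(avro_decimal_bytes(Decimal("1.00"), 2))     # b'd'      (100 is ASCII 'd')
print(avro_decimal_bytes(Decimal("1.0000"), 4))   # b"'\x10"  (10000 is 0x2710)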
Json
Description: Transforms the captured rows into JSON format during Integrate. The content of the file depends on the value for parameter JsonMode.
Parquet
Description: Transforms the captured rows into Parquet format during Integrate.
Compact
Description: Write compact XML tags like <r> & <c> instead of <row> and <column>. This parameter can be used only if Xml is selected.
Compress
Argument: algorithm
Description: HVR will compress files while writing them, and uncompress them while reading.
Available options for algorithm are:
- GZIP
- LZ4
The file suffix is ignored; during integration, a suffix can be added to the files by defining action Integrate with parameter RenameExpression="{hvr_cap_filename}.gz".
This parameter is not supported for Avro and Parquet.
Encoding
Argument: encoding
Description: Encoding for reading or writing files.
Available options for encoding are:
- US-ASCII
- ISO-8859-1
- ISO-8859-9
- WINDOWS-1251
- WINDOWS-1252
- UTF-8
- UTF-16LE
- UTF-16BE
HeaderLine
Description: First line of CSV file contains column names.
FieldSeparator
Argument: str_esc
Description: Field separator for CSV files.
The default value for this parameter is comma (,).
Note that only a single Unicode glyph is supported as a separator for this parameter.
Examples: , \x1f or \t.
This parameter can be used only if the parameter Csv is selected.
LineSeparator
Argument: str_esc
Description: Line separator for CSV files.
The default value for this parameter is newline (\n).
Examples: ;\n or \r\n
This parameter can be used only if the parameter Csv is selected.
HVR only supports single-byte line separators for CSV sources.
QuoteCharacter
Argument: str_esc
Description: Character to quote a field with, if the field contains separators.
The default value for this parameter is double quotes (").
This parameter can be used only if the parameter Csv is selected.
EscapeCharacter
Argument: str_esc
Description: Character to escape the quote character with.
The default value for this parameter is double quotes (").
This parameter can be used only if the parameter Csv is selected.
FileTerminator
Argument: str_esc
Description: File termination at end-of-file.
Example: EOF or \xff.
This parameter can be used only if the parameter Csv is selected.
NullRepresentation
Argument: str_esc
Description: String representation for column with NULL value.
Note that Hive 'deserializers' recognize \N as NULL when reading a CSV file back as an SQL row; this can be configured by setting this parameter to \\N.
Example: \\N
This parameter can be used only if the parameter Csv is selected.
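For illustration, assuming the default FieldSeparator (,) and QuoteCharacter ("), and NullRepresentation set to \\N, a row (1, 'hello, world', NULL) from a three-column table might be written as the following CSV line (a hypothetical example, not actual HVR output):

1,"hello, world",\N

If HeaderLine is also defined, the first line of the file would contain the column names.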
JsonMode
Argument: mode
Description: Style used to write rows into JSON format.
Available options for mode are:
ROW_FRAGMENTS: This format is compatible with Hive and BigQuery deserializers. Note that this option produces an illegal JSON file as soon as there is more than one row in the file.
Example:{ "c1":44, "c2":55 } { "c1":66, "c2":77 }
ROW_ARRAY:
Example:[ { "c1":44, "c2":55 }, { "c1":66, "c2":77 } ]
TABLE_OBJECT (default JSON mode for all location types except Kafka):
Example:{ "tab1":[ { "c1":44, "c2":55 }, { "c1":66, "c2":77 } ] }
TABLE_OBJECT_BSON: This format is the same as TABLE_OBJECT, but in BSON format (binary). Note that a BSON file cannot be bigger than 2GB. This makes this format inapplicable for some tables (e.g., when LOB values are present).
TABLE_ARRAY: This mode is useful if parameter RenameExpression in action Integrate does not contain a substitution that depends on the table name and the location class is not Kafka.
Example:[ { "tab1":[ { "c1":44, "c2":55 }, { "c1":66, "c2":77 } ] }, { "tab2":[ { "c1":88, "c2":99 } ] } ]
SCHEMA_PAYLOAD (default JSON mode for Kafka): This format is compatible with Apache Kafka Connect deserializers. Note that this option produces an illegal JSON file as soon as there is more than one row in the file.
Example:{ "schema": { "type": "struct", "name": "tab1", "fields": [ { "name": "c1", "type": "int" }, { "name": "c2", "type": "int" } ] }, "payload": { "c1": 44, "c2": 55 } } { "schema": { "type": "struct", "name": "tab1", "fields": [ { "name": "c1", "type": "int" }, { "name": "c2", "type": "int" } ] }, "payload": { "c1": 66, "c2": 77 } }
This parameter can be used only if the parameter Json is selected.
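Because ROW_FRAGMENTS and SCHEMA_PAYLOAD files are concatenations of JSON objects rather than a single JSON document, consumers typically decode them object by object. A minimal Python sketch of that approach (the function name is hypothetical, not part of HVR):

import json

def read_row_fragments(text: str):
    # Decode concatenated JSON objects one at a time.
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        obj, end = decoder.raw_decode(text, pos)
        yield obj
        while end < len(text) and text[end].isspace():   # skip whitespace between fragments
            end += 1
        pos = end

sample = '{ "c1":44, "c2":55 } { "c1":66, "c2":77 }'
print(list(read_row_fragments(sample)))   # [{'c1': 44, 'c2': 55}, {'c1': 66, 'c2': 77}]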
BlockCompress
Argument: algorithm
Description: Compression algorithm for Avro and Parquet. This parameter sets the native compression algorithm supported by these file formats.
Available options for algorithm are:
- DEFLATE: This option may be used only with Avro. Also, this is the default compression method for Avro if BlockCompress is not defined.
- GZIP: This option may be used only with Parquet.
- LZ4: This option may be used only with Parquet. Also, this is the default compression method for Parquet if BlockCompress is not defined.
This parameter can be used only if the parameter Avro or Parquet is selected.
AvroVersion
Argument: version
Description: Version of Avro format. Available options for version are:
v1_6: This version supports only the following basic types: Boolean, int (32-bit size), long (64-bit size), float, double, bytes, and string.
v1_7: This version supports only the following basic types: Boolean, int (32-bit size), long (64-bit size), float, double, bytes, and string.
v1_8 (default): This version supports the above-mentioned basic types and the following logical types: decimal, date, time and timestamp (with micro and millisecond resolutions), and duration.
This parameter can be used only if the parameter Avro is selected.
PageSize
Description: Parquet page size in bytes.
Default value is 1MB.
This parameter can be used only if the parameter Parquet is selected.
RowGroupThreshold
Description: Maximum row group size in bytes for Parquet.
This parameter can be used only if the parameter Parquet is selected.
ParquetVersion
Argument: version
Description: Category of data types to represent complex data into Parquet format.
v1: Supports only the basic data types (boolean, int32, int64, int96, float, double, byte_array) to represent any data. The logical data types decimal and date/time are not supported; instead, decimal is encoded as double and date/time types are encoded as int96.
v2 (default): Supports all basic data types and one logical data type (decimal). The date/time types are encoded as int96. This is compatible with Hive, Impala, Spark, and Vertica.
v3: Supports basic data types and the logical data types decimal, date, time_millis, time_micros, timestamp_millis, and timestamp_micros.
For Hive, the date/time data types are encoded as int96. So, for Hive with date/time columns in the source, only v1 or v2 can be used.
For more information about Parquet data types, refer to the Parquet Documentation.
This parameter can be used only if the parameter Parquet is selected.
BeforeUpdateColumns
File/FTP/SFTP
Kafka
Argument: prefix
Description: By default, HVR captures an UPDATE operation as two rows: one for the 'before' version and one for the 'after' version of a row. This parameter merges these rows into a single row during an UPDATE operation and adds the prefix to the 'before' version columns.
For example, consider the following SQL operations:
INSERT INTO tab1 VALUES (1, 1, 1); UPDATE tab1 SET c2 = 2 WHERE c1 = 1;
When this parameter is defined, the output will be as follows:
{"c1": 1, "c2": 2, "c3": 1, "old&c1": 1, "old&c2": 1, "old&c3": 1}
Here, old& is the prefix specified in this parameter, applied to the 'before' version columns.
For File/FTP/SFTP, this parameter can be used only if the parameter Xml, Csv, Avro, Json, or Parquet is selected.
BeforeUpdateColumnsWhenChanged
File/FTP/SFTP
Kafka
Description: Adds the prefix (defined in BeforeUpdateColumns) only to columns where the values have been updated. This is supported only for JSON and XML formats.
For example, consider the following SQL operations:
INSERT INTO tab1 VALUES (1, 1, 1); UPDATE tab1 SET c2 = 2 WHERE c1 = 1;
When this parameter is defined, the output will be as follows:
{"c1": 1, "c2": 2, "c3": 1, "old&c2": 1}
Here, old& is the prefix specified in the parameter BeforeUpdateColumns. Note that the prefix is applied only to c2, as it is the column with an updated value.
This option can be applied only when parameter BeforeUpdateColumns is selected.
For File/FTP/SFTP, this parameter can be used only if the parameter Xml or Json is selected.
ConvertNewlinesTo
Argument: style
Description: Write files with UNIX or DOS style newlines.
CaptureConverter
Argument: path
Description: Run files through converter before reading. Value path can be a script or an executable. Scripts can be shell scripts on Unix and batch scripts on Windows or can be files beginning with a 'magic line' containing the interpreter for the script (e.g., #!perl).
A converter command should read from its stdin and write to stdout. Argument path can be an absolute or a relative pathname. If a relative pathname is supplied, the command should be located in the HVR_CONFIG/plugin/transform directory. For more information about the converter commands and environment, see the Capture and Integrate Converters section below.
This field is displayed only when the action definition is for a file location or if the channel contains a file location.
CaptureConverterArguments
Argument: userarg
Description: Arguments to the capture converter.
This field is displayed only when the action definition is for a file location or if the channel contains a file location.
IntegrateConverter
Argument: path
Description: Run files through converter before writing. Value path can be a script or an executable. Scripts can be shell scripts on Unix and batch scripts on Windows or can be files beginning with a 'magic line' containing the interpreter for the script (e.g., #!perl). For more information about the converter commands and environment, see the Capture and Integrate Converters section below.
A converter command should read from its stdin and write to stdout. Argument path can be an absolute or a relative pathname. If a relative pathname is supplied, the command should be located in the HVR_CONFIG/plugin/transform directory.
IntegrateConverterArguments
Argument: userarg
Description: Arguments to the integrate converter (IntegrateConverter) program.
Context
Argument: context
Description: Action FileFormat is applied only if the specified context matches the context enabled for Compare or Refresh. For more information about using contexts, see our concept page Refresh or Compare context.
The value should be a context name, specified as a lowercase identifier. It can also have the form !context, which means that the action is effective unless the matching context is enabled for Compare or Refresh.
One or more contexts can be enabled for Compare and Refresh.
HVR's XML Format
The XML schema used by HVR can be found in the HVR_HOME/etc/xml/hvr.dtd file.
Simple Example
Following is a simple example of an XML file containing changes which were replicated from a database location.
<?xml version="1.0" encoding="UTF–8" standalone="yes"?> <hvr version="1.0"> <table name="dm01_product"> <row> <column name="prod_id">1</column> <column name="prod_price">30</column> <column name="prod_descrip">DVD</column> </row> <row> <column name="prod_id">2</column> <column name="prod_price">300</column> <column name="prod_descrip" is_null="true"/> </row> </table> </hvr>
Extended Example
Following is an extended example of HVR XML.
Create tables in Oracle:
CREATE TABLE mytab (
  aa NUMBER NOT NULL,
  bb DATE,
  CONSTRAINT mytab_pk PRIMARY KEY (aa)
);

CREATE TABLE tabx (
  a NUMBER NOT NULL,
  b VARCHAR2(10) NOT NULL,
  c BLOB,
  CONSTRAINT tabx_pk PRIMARY KEY (a, b)
);
Switch to a different user to create a new table with the same name tabx.
CREATE TABLE tabx (
  c1 NUMBER,
  c2 CHAR(5),
  CONSTRAINT tabx_pk PRIMARY KEY (c1)
);
Define an HVR channel with the following actions and parameters:
- Group SOURCE, Table *: Capture
- Group TARGET, Table *: Integrate
- Group TARGET, Table *: ColumnProperties with Name=hvr_op_val, IntegrateExpression={hvr_op}, Extra. Causes an extra column named hvr_op_val to be shown, which indicates the operation type (0=delete, 1=insert, 2=update, 3=before key update, 4=before non-key update).
- Group TARGET, Table *: ColumnProperties with Name=hvr_timekey, IntegrateExpression={hvr_integ_key}, Extra, TimeKey. This is needed if the target location is a file or Kafka location to replicate delete operations.

Apply changes to the source database using the following SQL statements:
INSERT INTO tabx (a, b, c)          -- Note: column c contains binary/hex data
  VALUES (1, 'hello', '746f206265206f72206e6f7420746f2062652c007468617420697320746865');
INSERT INTO tabx (a, b, c)
  VALUES (2, '<world>', '7175657374696f6e');
INSERT INTO mytab (aa, bb)
  VALUES (33, SYSDATE);
UPDATE tabx SET c = NULL WHERE a = 1;
COMMIT;
UPDATE mytab SET aa = 5555 WHERE aa = 33;   -- Note: key update
DELETE FROM tabx;                           -- Note: deletes two rows
INSERT INTO user2.tabx (c1, c2)             -- Note: different tables share same name
  VALUES (77, 'seven');
COMMIT;
The above SQL statements would be represented by the following XML output.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <hvr version="1.0"> <table name="tabx"> <row> <column name="hvr_op_val">1</column> <column name="hvr_timekey">5FFEF1E300000001</column> <column name="a">1</column> <-- Note: Hvr_op=1 means insert --> <column name="b">hello</column> <column name="c" format="hex"> <-- Note: Binary shown in hex --> <-- Note: Text after hash (#) is comment --> 746f 2062 6520 6f72 206e 6f74 2074 6f20 # to be or not to 6265 2c00 7468 6174 2069 7320 7468 65 # be,.that is the </column> </row> <row> <column name="hvr_op_val">1</column> <column name="hvr_timekey">5FFEF1E300000002</column> <column name="a">2</column> <column name="b"><world></column> <-- Note: Standard XML escapes used --> <column name="c" format="hex"> 7175 6573 7469 6f6e # question </column> </row> </table> <-- Note: Table tag switches current table --> <table name="mytab"> <row> <column name="hvr_op_val">1</column> <column name="hvr_timekey">5FFEF1E300000003</column> <column name="aa">33</column> <column name="bb">2012-09-17 17:32:27</column> <-- Note: HVR own date format --> </row> </table> <table name="tabx"> <row> <column name="hvr_op_val">4</column> <-- Note: Hvr_op=4 means non-key update before --> <column name="hvr_timekey">5FFEF1E300000004</column> <column name="a">1</column> <column name="b">hello</column> </row> <row> <-- Note: No table tag because no table switch --> <column name="hvr_op_val">2</column> <-- Note: Hvr_op=2 means update-after --> <column name="hvr_timekey">5FFEF1E300000005</column> <column name="a">1</column> <column name="b">hello</column> <column name="c" is_null="true"/> <-- Note: Nulls shown in this way --> </row> </table> <table name="mytab"> <row> <column name="hvr_op_val">3</column> <-- Note: Hvr_op=4 means key update-before --> <column name="hvr_timekey">5FFEF1E300000006</column> <column name="aa">33</column> </row> <row> <column name="hvr_op_val">2</column> <column name="hvr_timekey">5FFEF1E300000007</column> <column name="aa">5555</column> </row> </table> <table name="tabx"> <row> <column name="hvr_op_val">0</column> <-- Note: Hvr_op=0 means delete --> <column name="hvr_timekey">5FFEF1E300000008</column> <column name="a">1</column> <column name="b">hello</column> <column name="c" is_null="true"/> </row> <row> <column name="hvr_op_val">0</column> <-- Note: One SQL statement generated 2 rows --> <column name="hvr_timekey">5FFEF1E300000009</column> <column name="a">2</column> <column name="b"><world></column> <column name="c" format="hex"> 7175 6573 7469 6f6e # question </column> </row> </table> <table name="tabx1"> <-- Note: Name used here is channels name for table. --> <-- Note: This may differ from actual table 'base name' --> <row> <column name="hvr_op">1</column> <column name="hvr_timekey">5FFEF1E300000010</column> <column name="c1">77</column> <column name="c2">seven</column> </row> </table> </hvr> <-- Note: No more changes in the replication cycle -->
Capture and Integrate Converters
Environment
A command specified with parameter CaptureConverter or IntegrateConverter should read from its stdin and write the converted bytes to stdout. If the command encounters a problem, it should write an error to stderr and return with exit code 1, which will cause the replication jobs to fail. The transform command is called with multiple arguments, which should be defined with CaptureConverterArguments or IntegrateConverterArguments.
The output of a capture converter must conform to the format implied by the other parameters of the FileFormat action. Therefore, if parameter Csv is not defined, the command's output should be XML.
A converter command inherits the environment from its parent process. On the hub, the parent of the parent process is the Scheduler. On a remote Unix machine, it is the inetd daemon. On a remote Windows machine it is the HVR Remote Listener service.
The converter's environment differs from that of its parent process as follows:
- Environment variables $HVR_CHN_NAME and $HVR_LOC_NAME are set.
- Environment variable $HVR_TRANSFORM_MODE is set to one of the values cap, integ, cmp, refr_read, or refr_write.
- Environment variable $HVR_CONTEXTS is defined with a comma-separated list of contexts defined with HVR Refresh or Compare (option -Cctx).
- Environment variables $HVR_VAR_XXX are defined for each context variable supplied to HVR Refresh or Compare (option -Vxxx=val).
- For file locations, variables $HVR_FILE_LOC and $HVR_LOC_STATEDIR are set to the file location's top and state directory respectively.
- For an integrate converter in a 'blob' file channel without table information, and for all capture converters, environment variables $HVR_CAP_LOC, $HVR_CAP_TSTAMP, $HVR_CAP_FILENAME, and $HVR_CAP_SUBDIRS are set with details about the current file.
- Environment variable $HVR_FILE_PROPERTIES contains a colon-separated name=value list of other file properties. This includes values set by 'named patterns' (see parameter Pattern in action Capture).
- If a channel contains tables, the environment variable $HVR_TBL_NAMES is set to a colon-separated list of tables for which the job is replicating or refreshing (for example HVR_TBL_NAMES=tbl1:tbl2:tbl3). Also, variable $HVR_BASE_NAMES is set to a colon-separated list of table 'base names', which are prefixed by a schema name if action TableProperties with parameter Schema is defined (for example HVR_BASE_NAMES=base1:sch2.base2:base3). For modes cap_end and integ_end, these variables are restricted to only the tables actually processed. Environment variables $HVR_TBL_KEYS and $HVR_TBL_KEYS_BASE are colon-separated lists of keys for each table specified in $HVR_TBL_NAMES (e.g., k1,k2:k:k3,k4). The column list is specified in $HVR_COL_NAMES and $HVR_COL_NAMES_BASE. Any variable defined by action Environment is also set in the converter's environment.
- The current working directory is HVR_TMP, or HVR_CONFIG/tmp if this is not defined.
- stdin is redirected to a socket (HVR writes the original file contents into this), whereas stdout and stderr are redirected to separate temporary files. HVR replaces the contents of the original file with the bytes that the converter writes to its stdout. Anything that the transform writes to its stderr is printed in the job's log file on the hub machine.
The HVR_HOME/plugin_examples/transform directory contains examples of transform commands written in Perl:
hvrcsv2xml.pl maps CSV files (Comma-separated values) to HVR XML.
hvrxml2csv.pl maps HVR XML back to CSV format.
hvrfile2column.pl maps the contents of a file into an HVR compatible XML file; the output is a single record/row.
Example
The following is a simple example where FileFormat is defined with parameters IntegrateConverter=perl and IntegrateConverterArguments="-e s/a/z/g". This replaces all occurrences of the letter a with z.
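The same substitution could also be implemented as a standalone converter script. The sketch below (hypothetical, not shipped with HVR) follows the converter contract described above: it reads the file contents from stdin, writes the converted bytes to stdout, and on failure writes to stderr and exits with code 1 so that the replication job fails.

#!/usr/bin/env python3
# Hypothetical integrate converter: replaces every 'a' with 'z'.
import os
import sys

def main() -> int:
    try:
        data = sys.stdin.buffer.read()                      # HVR writes the original file to stdin
        sys.stdout.buffer.write(data.replace(b"a", b"z"))   # converted contents go to stdout
        return 0
    except Exception as exc:
        # Channel and location names are available as environment variables.
        chn = os.environ.get("HVR_CHN_NAME", "?")
        loc = os.environ.get("HVR_LOC_NAME", "?")
        sys.stderr.write(f"Converter failed for {chn}/{loc}: {exc}\n")
        return 1

if __name__ == "__main__":
    sys.exit(main())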