Apache Kafka
Fivetran supports Apache Kafka as a destination.
Apache Kafka is an open-source distributed streaming platform for building real-time data pipelines. The platform is based on a persistent, append-only, publish-subscribe log that captures and moves real-time data and events.
Supported implementations
Fivetran supports connecting with the following Kafka implementations:
Type transformation mapping
The data types in your Apache Kafka destination follow Fivetran's standard data type storage.
We use the following data type conversions:
Fivetran Data Type | Destination Data Type |
---|---|
BOOLEAN | BOOLEAN |
INT | INT |
LONG | LONG |
BIGDECIMAL | DECIMAL |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
LOCALDATE | DATE |
LOCALDATETIME | TIMESTAMP-MILLIS |
INSTANT | TIMESTAMP-MILLIS |
STRING | STRING |
XML | STRING |
JSON | STRING |
BINARY | STRING |
Setup guide
Follow our step-by-step setup guides for specific instructions on how to set up Apache Kafka as a destination:
Data load costs
Apache Kafka does not charge you extra when Fivetran loads data into your destination.
Naming convention
The Kafka topic names in the destination are in the schema-table format, where schema and table are the source schema and table names. When we assign topic names, we follow Fivetran's standard naming conventions.
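As a minimal sketch of the schema-table format described above, a consumer could derive the topic to subscribe to from the source names. The function name and the example names `sales` and `orders` are hypothetical, and the sketch does not apply Fivetran's naming conventions (it assumes the names are already normalized).

```python
def topic_name(schema: str, table: str) -> str:
    """Join already-normalized schema and table names as schema-table."""
    return f"{schema}-{table}"


# Hypothetical source schema and table names, for illustration only.
example_topic = topic_name("sales", "orders")
print(example_topic)  # sales-orders
```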
Destination data storage
Events in Kafka are immutable, meaning they cannot be modified or deleted. Consequently, we append all operations (upsert/update/delete) as new records in Kafka.
Example:
Consider the following initial records in your Kafka destination:
ID | Column2 | Column3 | _fivetran_op_type |
---|---|---|---|
1 | a | b | 0 |
2 | x | y | 0 |
3 | p | q | 0 |
Assume the following changes occur in your source:
- A new row with ID = 4 is inserted.
- Column2 of the record with ID = 2 is updated to 'z'.
- The row with ID = 3 is deleted.
After we perform a sync, the records in the Kafka destination appear as follows:
ID | Column2 | Column3 | _fivetran_op_type | _fivetran_updated_columns |
---|---|---|---|---|
1 | a | b | 0 | null |
2 | x | y | 0 | null |
3 | p | q | 0 | null |
4 | k | l | 0 | null |
2 | z | null | 1 | column2 |
3 | p | q | 2 | null |
Here, _fivetran_op_type and _fivetran_updated_columns are Fivetran system columns that indicate the following:
- _fivetran_op_type indicates the operation type: 0 = upsert, 1 = update, 2 = delete.
- _fivetran_updated_columns lists the columns changed by an update operation. If multiple columns are updated, their names are joined into a single string separated by semicolons (';').
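Because every operation is appended as a new record, a downstream consumer that wants the current state of a table has to replay the records in order. The following is a minimal sketch of that replay using the example records above as plain Python dictionaries. It is not Fivetran code: the function name `replay`, the choice of `ID` as the key column, and the case-insensitive matching of names in _fivetran_updated_columns (the example shows `column2` for `Column2`) are all assumptions made for illustration.

```python
from typing import Any, Dict, List

UPSERT, UPDATE, DELETE = 0, 1, 2  # values of _fivetran_op_type


def replay(records: List[Dict[str, Any]], key: str = "ID") -> Dict[Any, Dict[str, Any]]:
    """Fold an ordered list of appended records into the latest row per key."""
    state: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        op = rec["_fivetran_op_type"]
        pk = rec[key]
        # Keep only user columns; drop the Fivetran system columns.
        data = {k: v for k, v in rec.items() if not k.startswith("_fivetran_")}
        if op == UPSERT:
            state[pk] = data
        elif op == UPDATE:
            updated = rec.get("_fivetran_updated_columns") or ""
            # Assumption: column names are compared case-insensitively.
            names = {c.strip().lower() for c in updated.split(";") if c.strip()}
            row = state.setdefault(pk, {key: pk})
            for col, value in data.items():
                if col.lower() in names:
                    row[col] = value
        elif op == DELETE:
            state.pop(pk, None)
    return state


records = [
    {"ID": 1, "Column2": "a", "Column3": "b", "_fivetran_op_type": 0, "_fivetran_updated_columns": None},
    {"ID": 2, "Column2": "x", "Column3": "y", "_fivetran_op_type": 0, "_fivetran_updated_columns": None},
    {"ID": 3, "Column2": "p", "Column3": "q", "_fivetran_op_type": 0, "_fivetran_updated_columns": None},
    {"ID": 4, "Column2": "k", "Column3": "l", "_fivetran_op_type": 0, "_fivetran_updated_columns": None},
    {"ID": 2, "Column2": "z", "Column3": None, "_fivetran_op_type": 1, "_fivetran_updated_columns": "column2"},
    {"ID": 3, "Column2": "p", "Column3": "q", "_fivetran_op_type": 2, "_fivetran_updated_columns": None},
]

state = replay(records)
```

With the example records, the row with ID = 2 ends up with Column2 = 'z' while keeping Column3 = 'y', and the row with ID = 3 is no longer present. Applying only the columns named in _fivetran_updated_columns is a design choice of this sketch: it avoids overwriting Column3 with the null carried by the update record.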
Column data type changes
To change a column's data type, Fivetran updates the schema in the schema registry with the new data type. In some cases, the compatibility type of your topic does not allow us to update the schema. If so, set the compatibility type of your topic to NONE and modify your downstream queries to consume data from the updated schema.
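One way to set the compatibility type to NONE is through the registry's REST interface. The sketch below assumes a Confluent-compatible Schema Registry, where compatibility is configured per subject with `PUT /config/{subject}`. The registry URL, the topic name `sales-orders`, and the `-value` subject suffix (the default topic-based subject naming) are assumptions for illustration; check your own registry for the actual subject name. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(
    registry_url: str, subject: str, level: str = "NONE"
) -> urllib.request.Request:
    """Build (but do not send) a request that sets a subject's compatibility."""
    body = json.dumps({"compatibility": level}).encode("utf-8")
    path = "/config/" + urllib.parse.quote(subject, safe="")
    return urllib.request.Request(
        url=registry_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Hypothetical registry address and subject name.
req = build_compatibility_request("http://localhost:8081", "sales-orders-value")

# To actually apply the change, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

If your registry requires authentication, the request also needs the appropriate credentials, which this sketch omits.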