Apache Kafka

Fivetran supports Apache Kafka as a destination.

Apache Kafka is an open-source distributed streaming platform for building real-time data pipelines. It is built on a persistent, append-only, publish-subscribe log that captures and moves real-time data and events.

Supported implementations

Fivetran supports connecting with the following Kafka implementations:

Supported deployment models

We support the SaaS Deployment model for all Kafka implementations.

Type transformation mapping

The data types in your Apache Kafka destination follow Fivetran's standard data type storage.

We use the following data type conversions:

Fivetran Data Type	Destination Data Type	Notes
BOOLEAN	BOOLEAN
INT	INT
LONG	LONG
BIGDECIMAL	DECIMAL or DOUBLE	If a column has no precision or scale defined, we convert its data type to DOUBLE. Otherwise, we convert it to DECIMAL.
FLOAT	FLOAT
DOUBLE	DOUBLE
LOCALDATE	DATE
LOCALDATETIME	TIMESTAMP-MILLIS
INSTANT	TIMESTAMP-MILLIS
STRING	STRING
XML	STRING
JSON	STRING
BINARY	STRING
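
The mapping above can be expressed as a small lookup, which may help when validating downstream schemas. This is a minimal sketch, not Fivetran's implementation: the function name `destination_type` and its `precision` and `scale` parameters are hypothetical, and it reads "no precision or scale defined" as "both are missing".

```python
from typing import Optional

# One-to-one conversions, copied from the table above.
FIXED_TYPE_MAP = {
    "BOOLEAN": "BOOLEAN",
    "INT": "INT",
    "LONG": "LONG",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "LOCALDATE": "DATE",
    "LOCALDATETIME": "TIMESTAMP-MILLIS",
    "INSTANT": "TIMESTAMP-MILLIS",
    "STRING": "STRING",
    "XML": "STRING",
    "JSON": "STRING",
    "BINARY": "STRING",
}


def destination_type(
    fivetran_type: str,
    precision: Optional[int] = None,
    scale: Optional[int] = None,
) -> str:
    """Return the destination type for a Fivetran type (hypothetical helper)."""
    if fivetran_type == "BIGDECIMAL":
        # Assumption: DOUBLE only when neither precision nor scale is defined.
        if precision is None and scale is None:
            return "DOUBLE"
        return "DECIMAL"
    return FIXED_TYPE_MAP[fivetran_type]
```

For example, `destination_type("BIGDECIMAL", precision=10, scale=2)` yields `DECIMAL`, while `destination_type("BIGDECIMAL")` yields `DOUBLE`.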

Setup guide

Follow our step-by-step setup guides for specific instructions on how to set up Apache Kafka as a destination:

Data load costs

Apache Kafka does not charge you extra when Fivetran loads data into your destination.

Naming convention

Kafka topic names in the destination use the schema-table format, where schema and table are the source schema and table names. When we assign topic names, we follow Fivetran's standard naming conventions.

Destination data storage

Events in Kafka are immutable, meaning they cannot be modified or deleted. Consequently, we append all operations (upsert/update/delete) as new records in Kafka.

Example:

Consider the following initial records in your Kafka destination:

ID	Column2	Column3
1	a	b
2	x	y
3	p	q

Assume the following changes occur in your source:

A new row with ID = 4, Column2 = 'k', and Column3 = 'l' is inserted.
Column2 of the record with ID = 2 is updated to 'z'.
The row with ID = 3 is deleted.

After we perform a sync, the records in the Kafka destination appear as follows:

ID	Column2	Column3	_fivetran_op_type	_fivetran_updated_columns
1	a	b	0	null
2	x	y	0	null
3	p	q	0	null
4	k	l	0	null
2	z	null	1	column2
3	p	q	2	null

Here, _fivetran_op_type and _fivetran_updated_columns are Fivetran system columns which indicate the following:

_fivetran_op_type indicates the operation type: 0 = upsert, 1 = update, 2 = delete.
_fivetran_updated_columns lists the columns changed by an update operation. If multiple columns are updated, the value is a single string of the updated column names separated by semicolons (';').
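
Because every operation is appended, a consumer that needs the current state of a table has to fold the records in order. The following is a minimal sketch of that fold, assuming records arrive as dictionaries, that the field names match the names listed in _fivetran_updated_columns exactly (lowercase here), and that `id` is the primary key. The function name `replay` is hypothetical.

```python
from typing import Any, Dict, Hashable, Iterable

UPSERT, UPDATE, DELETE = 0, 1, 2  # _fivetran_op_type values


def replay(records: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Hashable, Dict[str, Any]]:
    """Fold an append-only stream of records into the latest row per key."""
    state: Dict[Hashable, Dict[str, Any]] = {}
    for record in records:
        pk = record[key]
        op = record["_fivetran_op_type"]
        if op == UPSERT:
            # Keep the data columns and drop the Fivetran system columns.
            state[pk] = {k: v for k, v in record.items() if not k.startswith("_fivetran_")}
        elif op == UPDATE:
            # Only the listed columns carry new values; the others may be null.
            names = (record.get("_fivetran_updated_columns") or "").split(";")
            row = state.setdefault(pk, {key: pk})
            for name in names:
                if name:
                    row[name] = record[name]
        elif op == DELETE:
            state.pop(pk, None)
    return state


def rec(id_, c2, c3, op, updated=None):
    """Build one record shaped like a row of the example table."""
    return {"id": id_, "column2": c2, "column3": c3,
            "_fivetran_op_type": op, "_fivetran_updated_columns": updated}


# The six records from the example, in arrival order.
records = [
    rec(1, "a", "b", 0), rec(2, "x", "y", 0), rec(3, "p", "q", 0),
    rec(4, "k", "l", 0), rec(2, "z", None, 1, "column2"), rec(3, "p", "q", 2),
]
current = replay(records)
```

Applied to the example, `current` holds rows 1, 2, and 4. Row 2 keeps `column3 = 'y'` because the update record only overwrites the columns it names.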

We sync each row of data into Kafka individually rather than in a batch. If there are 100 incoming rows and we successfully sync the first 99 rows but fail on the 100th row, we consider the entire sync a failure. In the next sync, we again start syncing all 100 rows, including the 99 that were already synced. This can result in duplicate records in your Kafka cluster, potentially increasing storage usage and costs.
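
The effect of a failed sync followed by a retry can be illustrated with a small in-memory simulation. This is only a sketch of the behavior described above, not Fivetran's code: the `sync_rows` function and the list standing in for a topic are hypothetical.

```python
from typing import Any, Dict, List, Optional


def sync_rows(rows: List[Dict[str, Any]], topic: List[Dict[str, Any]],
              fail_at: Optional[int] = None) -> bool:
    """Append rows one at a time; stop and report failure at index fail_at."""
    for index, row in enumerate(rows):
        if index == fail_at:
            return False  # rows appended before this point stay in the topic
        topic.append(row)
    return True


rows = [{"id": i} for i in range(1, 101)]  # 100 incoming rows
topic: List[Dict[str, Any]] = []           # stands in for a Kafka topic

first_ok = sync_rows(rows, topic, fail_at=99)  # the 100th row fails
second_ok = sync_rows(rows, topic)             # the next sync resends all 100

total_records = len(topic)                     # 99 + 100 = 199
unique_ids = {row["id"] for row in topic}      # 100 distinct rows
```

The topic ends up with 199 records for 100 distinct rows, so consumers that are sensitive to duplicates may want to deduplicate on the primary key.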

Column data type changes

To change a column's data type, Fivetran updates the schema in the schema registry with the new data type. In some cases, the compatibility type of your topic does not allow us to update the schema. If that happens, set the compatibility type of your topic to NONE and modify your downstream queries to consume data from the updated schema.
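
If your schema registry exposes the Confluent Schema Registry REST API, the compatibility type can be changed with a PUT request to `/config/{subject}`. The sketch below only builds the request and rests on several assumptions: the registry URL and topic name are placeholders, the subject follows Confluent's default `<topic>-value` naming, and your registry may require authentication that is not shown here.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(registry_url: str, subject: str,
                                level: str = "NONE") -> urllib.request.Request:
    """Build a PUT /config/{subject} request that sets the compatibility level."""
    body = json.dumps({"compatibility": level}).encode("utf-8")
    # Percent-encode the subject so special characters are safe in the path.
    path = "/config/" + urllib.parse.quote(subject, safe="")
    return urllib.request.Request(
        url=registry_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Placeholder registry URL and subject name; replace with your own values.
request = build_compatibility_request("http://localhost:8081", "my_schema-my_table-value")

# To apply the change against a real registry, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Setting the level per subject, rather than globally, leaves the compatibility checks for your other topics unchanged.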

Syncing data into a single topic

Fivetran offers the option to sync all your source data into a single Kafka topic. When you enable this option, Fivetran consolidates data from all source tables and delivers it to a single topic that is named after the source schema.

To help identify the origin of each message within the topic, Fivetran adds the following columns to every message in your destination:

_fivetran_table: Contains the name of the source table the message originated from
_fivetran_pkey: Contains a list of the primary keys for the source table

To enable this option for your destination, set the Sync Data Into Single Topic toggle in the destination setup form to ON.
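
When this option is enabled, a consumer can use _fivetran_table to split the combined stream back into per-table groups. The following is a minimal sketch that assumes messages have already been deserialized into dictionaries. The function name, the sample table names, and the shape of the _fivetran_pkey value (shown here as a list of column names) are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_source_table(messages: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group messages from a single topic by their originating source table."""
    tables: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for message in messages:
        tables[message["_fivetran_table"]].append(message)
    return dict(tables)


# Hypothetical messages from one topic that carries two source tables.
messages = [
    {"id": 1, "name": "Ada", "_fivetran_table": "customers", "_fivetran_pkey": ["id"]},
    {"order_id": 7, "total": 30, "_fivetran_table": "orders", "_fivetran_pkey": ["order_id"]},
    {"id": 2, "name": "Lin", "_fivetran_table": "customers", "_fivetran_pkey": ["id"]},
]

grouped = group_by_source_table(messages)
```

Here `grouped` contains two `customers` messages and one `orders` message, each in arrival order.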