Apache Kafka
Apache Kafka is an open-source distributed streaming platform for building real-time data pipelines and stream processing applications.
Features
| Feature Name | Supported | Notes |
| --- | --- | --- |
| Capture deletes | | |
| History mode | | |
| Custom data | ✓ | |
| Data blocking | ✓ | |
| Column hashing | ✓ | |
| Re-sync | ✓ | |
| API configurable | ✓ | API configuration |
| Priority-first sync | | |
| Fivetran data models | | |
| Private networking | | |
| Authorization via API | ✓ | |
Setup guide
Follow our step-by-step Apache Kafka setup guide to connect Apache Kafka with your destination using Fivetran connectors.
Schema information
Fivetran creates one table for each topic.
IMPORTANT: You can choose which topics to sync on the Schema tab in your Fivetran dashboard.
For each table, Fivetran creates `partition`, `offset`, `timestamp`, `key`, and `headers` columns, where `partition` and `offset` are the primary keys. The `timestamp` column contains either `create_time` or `log_append_time`, depending on the server configuration. The `headers` column stores record metadata as JSON key-value pairs. Kafka records can have zero or more headers.
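For illustration, here is a hypothetical destination row showing the metadata columns described above (all values are made up):

```python
# A made-up example of one synced row; column names match the schema
# described above, values are purely illustrative.
row = {
    "partition": 0,                        # primary key
    "offset": 42,                          # primary key
    "timestamp": "2023-05-01T12:00:00Z",   # create_time or log_append_time
    "key": "order-123",
    "headers": {"traceId": "abc123"},      # record headers as JSON key-value pairs
}
```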
Kafka records can contain multiple headers with the same key. To prevent conflicts, we automatically deduplicate them by appending a numeric suffix to each repeated key. For example, if a Kafka record has:

- 2 headers with key `firstRepeatedKey`
- 2 headers with key `secondRepeatedKey`
- 1 header with key `anotherKey`

The record will be synced with the following keys:

- `firstRepeatedKey_0`
- `firstRepeatedKey_1`
- `secondRepeatedKey_0`
- `secondRepeatedKey_1`
- `anotherKey`
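The deduplication described above can be sketched in Python. This is a hypothetical helper written for illustration, not Fivetran's actual implementation:

```python
from collections import Counter

def deduplicate_headers(headers):
    """Append a numeric suffix (_0, _1, ...) to repeated header keys;
    keys that appear only once are kept unchanged.

    `headers` is a list of (key, value) pairs, mirroring how Kafka
    record headers allow duplicate keys.
    """
    counts = Counter(key for key, _ in headers)
    seen = Counter()
    result = {}
    for key, value in headers:
        if counts[key] > 1:
            result[f"{key}_{seen[key]}"] = value
            seen[key] += 1
        else:
            result[key] = value
    return result

headers = [
    ("firstRepeatedKey", "a"),
    ("firstRepeatedKey", "b"),
    ("secondRepeatedKey", "c"),
    ("secondRepeatedKey", "d"),
    ("anotherKey", "e"),
]
deduplicate_headers(headers)
# keys: firstRepeatedKey_0, firstRepeatedKey_1,
#       secondRepeatedKey_0, secondRepeatedKey_1, anotherKey
```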
You can choose to sync `packed` or `unpacked` messages. For `packed` messages, Fivetran syncs the entire message into the `value` column. `unpacked` messages must be in JSON format; Fivetran creates a separate column for each first-level JSON element, with column names following the `value_<element_name>` format.
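The unpacking of a JSON message into `value_<element_name>` columns can be sketched as follows. This is an illustrative helper under the rules described above, not Fivetran's code:

```python
import json

def unpack_message(raw_value: bytes) -> dict:
    """Flatten each first-level JSON element of an unpacked message
    into a value_<element_name> column."""
    payload = json.loads(raw_value)
    return {f"value_{name}": value for name, value in payload.items()}

unpack_message(b'{"id": 7, "status": "shipped"}')
# -> {"value_id": 7, "value_status": "shipped"}
```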
For the Avro message type, we sync the data as `unpacked` messages by default. For the values, we sync each element with the column name format `value_<element_name>`. If the key is also serialized using an Avro schema, we sync the elements in the key with the column name format `key_<element_name>`. If the key is not serialized using an Avro schema, we sync it as a string.
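The key handling can be sketched as follows, assuming a key decoded from an Avro schema arrives as a Python dict of record fields. This is an illustration of the behavior described above, not Fivetran's implementation:

```python
def flatten_key(decoded_key) -> dict:
    """An Avro-decoded key is a record (dict here), so its elements
    become key_<element_name> columns; any other key is synced as
    a single string."""
    if isinstance(decoded_key, dict):
        return {f"key_{name}": value for name, value in decoded_key.items()}
    return {"key": str(decoded_key)}

flatten_key({"orderId": 7})  # -> {"key_orderId": 7}
flatten_key("raw-key")       # -> {"key": "raw-key"}
```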
Historical sync
After the connection is established, Fivetran syncs all available messages from the Kafka topics. It starts at the earliest available offset for each partition of a topic, consumes the messages, and loads them into the destination. Once the retention period elapses, messages are deleted from the Kafka topics. Deleted messages cannot be synced, so if you re-sync the connector, it fetches only the currently available messages.
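The offset-based scan can be simulated in pure Python. This is a simplified model (not Fivetran's implementation) showing why messages deleted by retention never reach the destination:

```python
# Each partition maps retained offsets to messages; on partition 0,
# offsets 0-2 have already aged out of the retention window.
partitions = {
    0: {3: "m3", 4: "m4"},
    1: {0: "a0", 1: "a1"},
}

def historical_sync(partitions):
    """Consume every retained message per partition, starting from the
    earliest *available* offset; aged-out offsets no longer exist."""
    synced = []
    for pid in sorted(partitions):
        for offset in sorted(partitions[pid]):
            synced.append((pid, offset, partitions[pid][offset]))
    return synced

historical_sync(partitions)
# -> [(0, 3, "m3"), (0, 4, "m4"), (1, 0, "a0"), (1, 1, "a1")]
# A re-sync repeats the same scan, so it can only ever return the
# messages still within the retention window.
```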