AWS MSK

AWS MSK (Amazon Managed Streaming for Apache Kafka) is a managed distributed streaming platform.

Features

Feature Name	Supported	Notes
Capture deletes
History mode
Custom data		All tables and fields
Data blocking		Column level
Column hashing
Re-sync		Connection level. If there is a retention period set for records, we will not be able to fetch records beyond the retention period.
Row filtering
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking
Authorization via API

Supported deployment models

We support the SaaS and Hybrid deployment models for the connector.

You must have an Enterprise or Business Critical plan to use the Hybrid Deployment model.

Setup guide

Follow our step-by-step AWS MSK setup guide to connect AWS MSK with your destination using Fivetran connectors.

Schema information

Fivetran creates one table for each topic.

You can choose which topics to sync on the Schema tab in your Fivetran dashboard.

For each table, Fivetran creates the partition, offset, timestamp, key, and headers columns. The partition and offset columns together form the primary key. Depending on the server configuration, the timestamp column contains either the create_time or the log_append_time value. The headers column stores record metadata as JSON key-value pairs. A Kafka record can have zero or more headers.

Kafka records can contain multiple headers with the same key. To prevent conflicts, we automatically deduplicate them by appending a numeric suffix to each repeated key. For example, if a Kafka record has:

2 headers with key firstRepeatedKey
2 headers with key secondRepeatedKey
1 header with key anotherKey

The record will be synced with the following keys:

firstRepeatedKey_0
firstRepeatedKey_1
secondRepeatedKey_0
secondRepeatedKey_1
anotherKey
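
The renaming rule above can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not Fivetran's actual implementation. The function name dedupe_header_keys and the sample header values are hypothetical, and the sketch assumes headers arrive as an ordered list of (key, value) pairs.

```python
from collections import Counter


def dedupe_header_keys(headers):
    """Return a dict of headers where repeated keys get _0, _1, ... suffixes.

    `headers` is a list of (key, value) pairs in record order (an assumption
    made for this sketch). Keys that occur only once are kept unchanged.
    """
    totals = Counter(key for key, _ in headers)  # how often each key occurs
    seen = Counter()                             # occurrences handled so far
    result = {}
    for key, value in headers:
        if totals[key] > 1:
            # Repeated key: append the zero-based occurrence index.
            result[f"{key}_{seen[key]}"] = value
            seen[key] += 1
        else:
            result[key] = value
    return result


# Hypothetical sample record matching the example above.
headers = [
    ("firstRepeatedKey", "a"),
    ("secondRepeatedKey", "b"),
    ("firstRepeatedKey", "c"),
    ("anotherKey", "d"),
    ("secondRepeatedKey", "e"),
]
deduped = dedupe_header_keys(headers)
print(sorted(deduped))
```

The sketch does not handle a record that already contains a literal key such as firstRepeatedKey_0. How that case is resolved is not described here.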

You can choose to sync messages as packed or unpacked. For packed messages, Fivetran syncs the entire message in the value column. Unpacked messages must be in JSON format. Fivetran creates a separate column for each first-level JSON element and names it using the value_<element_name> format.
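
As a rough illustration of the difference between the two modes, the following Python sketch maps one message to its columns. The function names and the sample message are hypothetical. Keeping nested elements as parsed objects and rejecting messages that are not JSON objects are assumptions of this sketch, not documented connector behavior.

```python
import json


def pack_value(raw_value):
    """Packed mode: the whole message goes into a single value column."""
    return {"value": raw_value}


def unpack_value(raw_value):
    """Unpacked mode: one value_<element_name> column per first-level element."""
    document = json.loads(raw_value)
    if not isinstance(document, dict):
        # Assumption: only JSON objects have named first-level elements.
        raise ValueError("unpacked messages must be JSON objects")
    # Nested elements are left as they are (an assumption of this sketch).
    return {f"value_{name}": element for name, element in document.items()}


# Hypothetical sample message.
message = '{"order_id": 17, "customer": {"id": 3, "tier": "gold"}}'
packed_row = pack_value(message)
unpacked_row = unpack_value(message)
print(sorted(unpacked_row))  # column names: value_customer, value_order_id
```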

For the Avro message type, we sync the data as unpacked messages by default. For the values, we sync each element with the column name format value_<element_name>. If the key is also serialised using an Avro schema, we sync the elements in the key with the column name format key_<element_name>. If the key is not serialised using an Avro schema, we sync it as a string.

Historical sync

After establishing the connection with AWS MSK, Fivetran starts syncing all available messages from the Kafka topics. For each partition of a topic, it starts at the earliest available offset, consumes the messages, and loads them into the destination. Kafka deletes messages from a topic once their retention period expires, and deleted messages can't be synced. As a result, if you re-sync your connection, Fivetran fetches only the messages that are still available at that time.
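
The effect of retention on a re-sync can be illustrated with a toy in-memory model of a single partition. This is a simplified sketch and uses no Kafka client. The PartitionLog class, its methods, and the integer timestamps are all hypothetical, and real Kafka applies retention to log segments rather than to individual messages.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition: offsets only ever increase."""
    messages: dict = field(default_factory=dict)  # offset -> (timestamp, value)
    next_offset: int = 0

    def append(self, timestamp, value):
        self.messages[self.next_offset] = (timestamp, value)
        self.next_offset += 1

    def expire(self, now, retention):
        """Drop messages older than the retention period (per message, simplified)."""
        self.messages = {
            offset: (ts, value)
            for offset, (ts, value) in self.messages.items()
            if now - ts <= retention
        }

    def read_from_earliest(self):
        """Return (offset, value) pairs starting at the earliest available offset."""
        return [(offset, self.messages[offset][1]) for offset in sorted(self.messages)]


log = PartitionLog()
for ts, value in [(0, "a"), (10, "b"), (20, "c")]:
    log.append(ts, value)

first_sync = log.read_from_earliest()  # the historical sync sees every message
log.expire(now=25, retention=10)       # the two oldest messages age out
resync = log.read_from_earliest()      # a re-sync sees only what is left
print(first_sync, resync)
```

In this model the re-sync starts at offset 2 because offsets 0 and 1 no longer exist. Rows already loaded into the destination are outside the scope of the sketch.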