Schema Management
Fivetran automatically builds and updates destination tables as new data arrives, so explicitly declaring a schema in your Connector SDK code is optional; connectors work out of the box without it.
We recommend defining primary keys for each table in the schema so Fivetran can de-duplicate and apply updates to specific rows. If you don’t declare a primary key for a table, Fivetran generates an internal unique surrogate primary key. Choose the approach that best fits your connector’s row identity and update behavior, as well as your downstream analytics requirements.
You may also declare a schema to control data types for downstream analytics, or to ensure columns appear in your destination immediately, even if they are sparsely populated. By default, Fivetran creates a column only when it receives a non-null value, which means that columns with sparse data take longer to appear because data must first arrive.
This guide explains what schema management is and how to implement it in your connector.
Understanding schema
In the context of the Connector SDK, a schema is a combination of the table structure you declare directly or retrieve dynamically in your Connector SDK code, and any tables and columns your connector actually sends at runtime. The table structure includes:
- Tables: Collections of rows, such as `USERS` or `ORDERS`.
- Columns and their data types: Fields inside each row, like `id`, `created_at`, or `email`, along with their data types, such as STRING, INTEGER, or TIMESTAMP.
- Keys: Columns that identify and relate records.
The term schema also refers to the destination namespace that holds your tables (for example, a BigQuery dataset or Snowflake schema). A Connector SDK connection writes data to a single logical destination schema, named after the connection.
Schema design
Think of schema design as a contract between your connector and Fivetran about how data should land in the destination based on your analytics requirements. In your Connector SDK code, you can define this contract in schema() by declaring primary keys for deterministic row identity, and only the column types that require fixed typing or immediate visibility. Implement the contract by structuring your rows to match the declared tables and columns, and sending the rows to Fivetran.
Before you design your schema, fetch and inspect the source data or metadata, using tools such as Postman for API responses or database introspection. Identify tables, primary keys, fields that are often empty, and nested types to inform your schema decisions.
You don’t need to list every column in your schema declaration — Fivetran can infer columns and types from the data you send. Schema inference is especially valuable for sources with dynamic or evolving schemas, and is a great fit for prototypes or connectors syncing unknown or rapidly growing data models.
In your schema design, consider table relationships as well. Foreign keys make joins and downstream lineage easier, but the Connector SDK doesn't support declaring or enforcing them. To simplify downstream joins, include relationship columns, such as user_id, in child tables and document how tables relate.
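As a sketch of this pattern, the helper below splits a nested source record into a parent row and child rows that carry the parent's key. The field names (`user_id`, `addresses`) are hypothetical examples, not part of any specific source API:

```python
# Sketch: flatten a nested source record into a USERS row and ADDRESSES rows.
# Each child row repeats the parent's key (user_id) so downstream joins are
# easy, even though the SDK can't declare or enforce the foreign key itself.

def flatten_user(record: dict) -> tuple[dict, list[dict]]:
    """Return a parent row and child rows that reference it via user_id."""
    parent = {"user_id": record["id"], "email": record["email"]}
    children = [
        {"user_id": record["id"], "address_id": a["id"], "city": a["city"]}
        for a in record.get("addresses", [])
    ]
    return parent, children

parent, children = flatten_user({
    "id": "u1",
    "email": "alice@example.com",
    "addresses": [{"id": "a1", "city": "Austin"}, {"id": "a2", "city": "Denver"}],
})
```

Documenting this relationship alongside your schema makes it clear to analysts that `ADDRESSES.user_id` joins back to `USERS`.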
If your source provides metadata, such as an endpoint listing fields, database metadata, or third-party schema-discovery methods, you can generate your schema dynamically at runtime. This lets your connector handle fully dynamic schemas while still specifying table names and primary keys.
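For example, a minimal sketch of generating the schema at runtime, assuming a hypothetical metadata call and type names. Replace `fetch_field_metadata()` and `TYPE_MAP` with your source's real discovery method and a mapping to the SDK's supported types:

```python
# Sketch: build the schema() return value from source metadata at runtime.
# fetch_field_metadata() and the source type names below are hypothetical.

TYPE_MAP = {"text": "STRING", "number": "LONG", "datetime": "UTC_DATETIME"}

def fetch_field_metadata(configuration: dict) -> list[dict]:
    # Stand-in for a real metadata call (e.g., a fields endpoint or
    # database introspection query).
    return [
        {"name": "id", "type": "text"},
        {"name": "created_at", "type": "datetime"},
    ]

def schema(configuration: dict):
    fields = fetch_field_metadata(configuration)
    return [{
        "table": "users",
        "primary_key": ["id"],  # table name and key are still fixed here
        "columns": {f["name"]: TYPE_MAP.get(f["type"], "STRING") for f in fields},
    }]
```

Note that even with fully dynamic columns, the table name and primary key stay under your control.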
If the names and fields your connector sends don't match your schema, such as different table names, new fields, or conflicting data types, Fivetran treats the mismatch as intentional and creates the corresponding inferred columns or new tables, which are often mostly empty.
Implementing schema
Implement your schema design in the connector.py code:
- In `schema()`, declare tables, their primary keys, and the column data types you need precise control over for downstream analytics. If your data extraction uses a metadata endpoint, database introspection, or other schema discovery methods, you can fetch the schema dynamically.
- In `update()`, shape records to match your schema and send them to Fivetran using the supported operations.
Here's an example of the schema() function that sets the primary key for the USERS table and specifies data types for four of its columns:
```python
def schema(configuration: dict):
    return [{
        "table": "users",
        "primary_key": ["id"],
        "columns": {
            "id": "STRING",
            "email": "STRING",
            "created_at": "UTC_DATETIME",
            "custom_attribute": "STRING"
        }
    }]
```
Define the columns you need to control, for example, primary keys, columns where type matters, or sparse columns. Other columns can be inferred from the rows you send.
Here's an example of a matching upsert() operation that sends a row to the same table. Notice that the row includes two columns, city and name, that aren't declared in schema(). Fivetran infers these columns and their data types from the row data and adds them to the destination table automatically.
```python
def update(configuration: dict, state: dict):
    op.upsert(
        table="users",
        data={
            "id": "1",
            "email": "alice@example.com",
            "created_at": "2025-01-10T12:34:56Z",
            "custom_attribute": "pro",
            "city": "Austin",
            "name": "Alice"
        }
    )
```
Defining primary keys in your schema
Primary keys are fields or a combination of fields that uniquely identify each row in a table.
If a single field isn't sufficient, for example, when a row is uniquely defined by both a user and a timestamp, you can use a composite primary key that combines multiple fields into one unique identifier.
Here's an example of a composite primary key declaration:
"primary_key": ["user_id", "event_timestamp"]
If you don't declare a primary key for a table, Fivetran generates a surrogate primary key named _fivetran_id, which is a unique internal ID we create by hashing the full record. Changing a column name changes the hash, requiring a re-import of data.
Defining data types
Fivetran recommends reviewing your source fields for inconsistent data types. Explicitly declaring data types in your Connector SDK schema ensures consistent values reach your destination. If a column can contain inconsistent values, for example, sometimes a string, sometimes a number, and you want to avoid passing that variation downstream, declare its data type in schema().
The Connector SDK supports specific data types, including STRING, UTC_DATETIME, LONG, and DECIMAL. For more information, see the full list of supported types.
The specified types example demonstrates how to declare column data types in schema().
Once you declare a column data type, make sure the rows you send to Fivetran conform to that type. If they don't, the sync fails and Fivetran returns an error.
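One defensive pattern is to coerce values to the declared type before sending the row. The sketch below uses a hypothetical `account_code` field that the source sometimes sends as a number and sometimes as a string, while the column is declared as STRING in `schema()`:

```python
# Sketch: normalize an inconsistently typed source value to match a
# declared STRING column before the row is sent to Fivetran.
# "account_code" is a hypothetical field name.

def normalize_row(row: dict) -> dict:
    out = dict(row)
    # Declared as STRING in schema(), so stringify whatever the source sent.
    if out.get("account_code") is not None:
        out["account_code"] = str(out["account_code"])
    return out
```

Centralizing this kind of coercion in one helper keeps `update()` readable and makes type failures easier to debug.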
Tips
- Begin by creating a fully working connector with a small schema, such as one to three tables. Building in smaller iterations lets you deliver value early and makes testing your connector easier.
- Avoid using reserved system column names.
Validating schema
The tables and columns that appear in your destination are determined by both your schema declaration and how you send data in your update() function. If you send data to a table name not defined in your schema, Fivetran creates a new table in your destination. Similarly, if you make a typo in a table or column name when sending operations, your connector doesn't break, but the data may end up in an unexpected table or column.
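One way to guard against such typos is to check each operation's table name against your declared schema before sending it. A minimal sketch, assuming `SCHEMA` holds the same list your `schema()` function returns:

```python
# Sketch: validate a table name against the declared schema before sending
# an operation, so a typo raises locally instead of silently creating an
# unintended destination table.

def declared_tables(schema_list: list[dict]) -> set[str]:
    return {t["table"] for t in schema_list}

def check_table(name: str, schema_list: list[dict]) -> str:
    if name not in declared_tables(schema_list):
        raise ValueError(f"Unknown table '{name}'; declared: {declared_tables(schema_list)}")
    return name

# Hypothetical declared schema, mirroring what schema() returns.
SCHEMA = [{"table": "users", "primary_key": ["id"], "columns": {}}]
```

A check like this is most useful during development; you can remove or relax it once the connector's table names are stable.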
Because of this, it’s important to test your schema and data loading early and often in development. After writing your initial logic, run a small sync using fivetran debug, then inspect the debug destination (files/warehouse.db) to verify that the tables and columns match your expectations. If the results aren't what you intended, review your schema() definition, and the table and column names used in the operations inside update().
Early validation ensures your data structure is correct and helps catch issues before they reach production.
Handling an evolving schema
Schemas often evolve. For example, if you have a working connector, your source schema may grow — new tables and columns might be added, data types could change, or tables and columns may be removed or renamed. When your connector passes these changes through to Fivetran, we automatically manage these changes based on the destination. The following table describes what Fivetran does in each scenario when we first receive data in a new schema.
You can also design your connector code to respond differently to schema changes — for example, by ignoring new tables or columns, logging a message, or throwing a runtime error to stop the sync. If your analytics requirements call for it, you can also selectively control what data you send to Fivetran.
| Scenario | What Fivetran does |
|---|---|
| Source adds a new table | If your connector fetches tables dynamically, Fivetran can detect the new table automatically. Otherwise, update your code to include the new table. Once your code sends data from the new table, Fivetran creates it in the destination and infers the column types. |
| Source deletes a table | When your connector stops sending data for the table, Fivetran keeps the table in your destination unchanged. The table remains in the state it was in at the last sync. |
| Source renames a table | When Fivetran receives data under the new table name, it treats the rename as two operations: the old table is left in your destination unchanged (no new data syncs to it), and the new table is created with the new name. Fivetran does not migrate data from the original table automatically. |
| Source sends a new column | When your code sends the new column, Fivetran creates the new column and infers the column type from the values passed to us. |
| Source changes a column's data type | When Fivetran receives data for the column with a changed type, widening changes (for example, INTEGER to LONG) are applied automatically. Narrowing changes (for example, LONG to INTEGER) are not applied; your destination retains the wider type. If values don't match the type declared in schema(), the sync fails. |
| Source removes a column | When your connector stops sending data for the column, Fivetran keeps the column in the table. New rows have NULL for the removed column. |
| Source renames a column | When your connector sends data using the new column name, Fivetran treats it as a new column and populates data to it. The old column remains in the table, and new rows have NULL for it. Fivetran does not migrate data from the original column automatically. |
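As a sketch of the selective approach described above, the helper below restricts each row to an allowlist of known columns so that unexpected new source fields can be logged (or used to raise an error) rather than synced. `KNOWN_COLUMNS` is a hypothetical allowlist you would keep in step with your `schema()` declaration:

```python
# Sketch: split a row into allowed columns and unexpected ones, so new
# source fields don't silently become inferred destination columns.
# KNOWN_COLUMNS is a hypothetical allowlist, not an SDK feature.

KNOWN_COLUMNS = {"id", "email", "created_at"}

def filter_row(row: dict) -> tuple[dict, set]:
    """Return (columns to sync, unexpected column names)."""
    unexpected = set(row) - KNOWN_COLUMNS
    allowed = {k: v for k, v in row.items() if k in KNOWN_COLUMNS}
    return allowed, unexpected

row, extra = filter_row({"id": "1", "email": "a@b.c", "new_field": "x"})
# 'extra' could be logged, or you could raise here to stop the sync instead.
```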
Primary key changes can cause data integrity issues such as duplicate or missing rows, and require your intervention. To handle a primary key change, do the following:
1. Update your code and test the change locally: modify `schema()` and verify the operations in `update()`.
2. Back up the affected table in your destination, then drop it.
3. Redeploy your connector and re-sync the connection or, at minimum, the affected table.
Fivetran recreates the table with the updated primary key structure.
Schema design examples
You can find practical schema design patterns directly in the official Fivetran Connector SDK repository.
- Static schema: The template connector is the best starting point for beginners. It shows a static schema defined in `schema()`, with explicit table, primary key, and column declarations. Use this pattern when your source schema is stable and known in advance.
- Parent and child tables: The Data handling patterns example illustrates handling nested or repeating data. The parent table holds the main entity, while child tables represent repeated fields. Child tables reference the parent using the parent's primary key, such as `company_id`. Use this approach for sources with lists or breakdowns. Other examples that demonstrate this pattern include Toast source and Multiple tables with cursors.
- Dynamic schema: Database connectors (for example, Hive or JDBC) discover tables and columns at runtime. The schema is generated dynamically, making this pattern ideal for sources with account-specific or user-defined schemas. Examples to explore include Schema from database, Tracking tables, and Yugabyte.
- Minimal or inferred schema: Some simple API or file connectors rely on schema inference, omitting or minimizing `schema()`. This is useful for prototypes, but beware of duplicates and unstable columns if the record shape changes. Examples to explore include Unspecified types and SQL Server.
These open-source examples are the fastest way to build intuition for how schema decisions affect real data in the destination. Experiment with small samples, review the resulting tables, and adjust your schema to fit your needs.