Redshift

Amazon Redshift is a fast, fully managed, scalable, and cost-effective cloud data warehouse that enables you to analyze exabytes of data and run complex analytical queries for business insights. With Redshift, you can run and scale analytics on your data within seconds without having to manage your data warehouse infrastructure. Fivetran supports both Redshift provisioned and Redshift Serverless data warehouses as destinations for lower time to value from your data.

Supported deployment models

We support the SaaS and Hybrid deployment models for the destination.

You must have an Enterprise or Business Critical plan to use the Hybrid Deployment model.

Setup guide

Follow our step-by-step Redshift setup guide to connect your Redshift data warehouse with Fivetran.

Type transformation mapping

The data types in your Redshift data warehouse follow Fivetran's standard data type storage.

We use the following data type conversions:

Fivetran Data Type	Destination Data Type	Notes
BOOLEAN	BOOLEAN
SHORT	SMALLINT
INT	INTEGER
LONG	BIGINT
BIGDECIMAL	DECIMAL or DOUBLE PRECISION	We convert BIGDECIMAL to DOUBLE PRECISION only if: the column's precision exceeds `38` the column's scale exceeds `37` the column has no precision or scale defined
FLOAT	REAL
DOUBLE	DOUBLEPRECISION
LOCALDATE	DATE
LOCALDATETIME	TIMESTAMP
INSTANT	TIMESTAMP_TZ	For source connections created before August 30, 2021, we convert INSTANT to TIMESTAMP.
STRING	VARCHAR or TEXT	VARCHAR if `bytelength`is present, else TEXT.
JSON	SUPER	For source connections created before June 12, 2025, we convert JSON to VARCHAR.
BINARY	VARBYTE	For source connections of accounts created before September 1, 2022, we convert BINARY to VARCHAR as the Base64 encoding of the raw bytes.

During UPDATE operations, if a SUPER column exceeds the 65535-byte limit, Fivetran inserts an error message in the column to indicate the issue.

Parse JSON data

Depending on whether your destination column is of type VARCHAR or SUPER, you can parse your JSON data using different queries.

For example, consider a destination table named TEST.JSON_PARSING_EXAMPLE (id INT64, json STRING) with the following nested JSON structure:

{
    "id":1,
     "name": {
     "first_name": "john",
     "last_name": "smith",
     "full_name": "john_smith"
     }, 
     "addresses":[
       {
         "name": "home",
         "city": "oakland",
         "state": "california",
         "country": "usa"
       }, 
       {
         "name": "work",
         "city": "denver",
         "state": "colorado",
         "country": "usa"
       }  
     ]
 }

To parse JSON data from VARCHAR columns using Redshift JSON functions, do the following:

Run the following commands to query the data:

   select 
        json_extract_path_text (json, 'id', true) as json_id,
        json_extract_path_text (json, 'name', 'first_name', true) as first_name
   from test.json_parsing_example;

json_id	first_name
1	john

Run the following commands to flatten the data:

   with json_data as (
     select 
       json_parse(json) as json_col
     from test.json_parsing_example
   ) select
     json_col.id, 
     json_col.name.first_name::varchar,
     element.name::varchar as address_name,
     element.city::varchar as address_city,
     element.state::varchar as address_state,
     element.country::varchar as address_country
   from json_data as example, example.json_col.addresses AS element AT index

json_id	first_name	address_name	address_city	address_state	address_country
1	john	home	oakland	california	usa
1	john	work	denver	colorado	usa

To parse your JSON data from SUPER columns, do the following:

Run the following commands to query the data:

   select 
        json.id as json_id,
        json.name.first_name::varchar as first_name
   from test.json_parsing_example;

json_id	first_name
1	john

Run the following commands to flatten the data:

 select
   json.id, 
   json.name.first_name::varchar,
   element.name::varchar as address_name,
   element.city::varchar as address_city,
   element.state::varchar as address_state,
   element.country::varchar as address_country
 from test.json_parsing_example as example, example.json.addresses AS element AT index

json_id	first_name	address_name	address_city	address_state	address_country
1	john	home	oakland	california	usa
1	john	work	denver	colorado	usa

Optimize Redshift

Add sort and distribution keys

Fivetran automatically infers the primary, foreign, sort, and distribution (dist) keys whenever possible. The data from cloud application connections should automatically have all four keys. For database connections, we add the primary and foreign keys (if they are present in the source), but not the dist and sort keys.

You can add Redshift's sort key and dist key to optimize query performance and improve sync speeds. For example, if you set the primary key as a sort key, it will result in the following:

Faster delete operations of the common records
Improved sync speeds during upsert operations
Quicker analysis queries

The following steps apply to all data sources. In the example below, we'll use the EVENT table from the Snowplow connection. Suppose you want to apply sort and dist keys to the snowplow.event table. You have to create a new table that's the same as the original table, but with sort and dist keys:

Pause the connection (Snowplow in our example) from the Fivetran dashboard. On the connection dashboard, set the connection status toggle to PAUSED, ensuring that a sync isn't taking place.

Log in to your database and get the schema of the original table using the \d snowplow.event command. This command will render the schema of the table. For example,

Column	Type
event_id	TEXT
app_id	TEXT
br_colordepth	INTEGER
br_cookies	BOOLEAN
br_features_director	BOOLEAN
br_features_flash	BOOLEAN
br_features_gears	BOOLEAN
br_features_java	BOOLEAN
br_features_pdf	BOOLEAN
br_features_quicktime	BOOLEAN
br_features_realplayer	BOOLEAN

Write a create table command to re-create the table. Add the PRIMARY KEY and FOREIGN KEY constraints:

create table snowplow.event_copy (
  app_id                   CHARACTER VARYING(256) Primary Key,
  br_colordepth            INTEGER references another_table(id),
  br_cookies               BOOLEAN,
   br_features_director     BOOLEAN,
   br_features_flash        BOOLEAN,
   br_features_gears        BOOLEAN,
   br_features_java         BOOLEAN,
   br_features_pdf          BOOLEAN,
   br_features_quicktime    BOOLEAN,
   br_features_realplayer   BOOLEAN,
);

Get the compression encodings using the analyze compression command. For example, analyze compression snowplow.events;. The command will run for a little while and result in the following:

Table	Column	Encoding
event	app_id	lzo
event	br_colordepth	lzo
event	br_cookies	raw
event	br_features_director	raw
event	br_features_flash	raw
event	br_features_gears	raw
event	br_features_java	raw
event	br_features_pdf	raw
event	br_features_quicktime	raw
event	br_features_realplayer	raw

Update the create table command with the compression encodings and the sort and dist keys. Execute the create table command:

create table snowplow.event_copy (
event_id                 CHARACTER VARYING(256) primary key distkey encode lzo, -- we join custom contexts on this, so it should be the dist key
dvce_sent_tstamp         TIMESTAMP WITHOUT TIME ZONE sortkey encode lzo, -- we often select on this, so it should be the sort key
dvce_created_tstamp      TIMESTAMP WITHOUT TIME ZONE encode lzo,

app_id                   CHARACTER VARYING(256) encode lzo Primary Key,
br_colordepth            INTEGER encode lzo references another_table(id),
br_cookies               BOOLEAN,
br_features_director     BOOLEAN,
br_features_flash        BOOLEAN,
br_features_gears        BOOLEAN,
br_features_java         BOOLEAN,
br_features_pdf          BOOLEAN,
br_features_quicktime    BOOLEAN,
br_features_realplayer   BOOLEAN,
);

Copy the data from master table to the copy table:

insert into snowplow.event_copy (select * from snowplow.event);
drop table snowplow.event;
alter table snowplow.event_copy rename to event;
commit transaction;

Verify that the sort and dist keys are in place:

SET search_path to 'snowplow';
SELECT "column", type, encoding, distkey, sortkey FROM 
pg_table_def 
WHERE tablename = 'event';

column	type	encoding	distkey
app_id	TEXT	lzo	f
br_colordepth	INTEGER	lzo	f
br_cookies	BOOLEAN	none	f
br_features_director	BOOLEAN	none	f
br_features_flash	BOOLEAN	none	f
br_features_gears	BOOLEAN	none	f
br_features_java	BOOLEAN	none	f
br_features_pdf	BOOLEAN	none	f
br_features_quicktime	BOOLEAN	none	f
br_features_realplayer	BOOLEAN	none	f

Shrink large schemas

You can check if you have a large Redshift schema using the following query:

SELECT
  schema,
  sum(size) AS size_mb,
  sum(tbl_rows) AS rows,
  sum(size) / sum(tbl_rows) AS avg_row_size_mb
FROM svv_table_info
GROUP BY 1
ORDER BY 2 DESC

If you used the VACUUM command but still observe large (size_mb > 10,000) and inefficient (avg_row_size_mb > 0.010) schemas, you should fix these schemas by making a deep copy of every table in the schema.

The following process is time-consuming, but it should significantly reduce the size of your tables. Once you make a deep copy, your schema sizes will not exceed the standard levels. As an added benefit, this process ensures that all of your columns have the best compression encoding, making your queries execute faster.

Use Amazon's columnar compression utility to generate SQL that will perform the deep copy. The exact command is:

./analyze-schema-compression.py 
--db [your database] --db-user [your user] --db-host [your host] --db-port [your port, usually 5439] --analyze-schema [schema to compress] --analyze-table [table to compress] --comprows 100000 --slot-count 1 --force true --drop-old-data true --output-file deep_copy.sql

The command will generate a file named deep\_copy.sql that contains a bunch of SQL statements.

Run the statements in deep_copy.sql to perform the deep copy and drop the old data.
The commands will take a long time to execute. On completion, any views that reference these tables are dropped. You have to re-create them.
The column compression script makes you the owner of all the tables that get compressed.
Assign ownership of the table back to Fivetran. For each table in the schema, run:
```
ALTER TABLE [schema].[table] OWNER TO 'fivetran'
```
If the Fivetran user is something other than 'fivetran', replace the default 'fivetran' value.

Connect as master or limited user

There are two ways to allow Fivetran to access your Redshift data warehouse:

Connect as a Master user: Fivetran will have access to your data at all times. The benefit of this is that it's effortless to set up - there's no additional configuration required. The con of connecting this way is that this user cannot be restricted. The implication of this is that you will not be able to apply workload management settings on that user. Workload management enables users to flexibly manage priorities within workloads so that short, fast-running queries won't get stuck in queues behind long-running queries.
Connect as a Limited User: Fivetran will adhere to the privileges that you set. To do this, you'll need to create a separate, specific Fivetran user in your Redshift data warehouse. This user must have CREATE permissions. The benefit of this is that Fivetran will only have access to the schemas that we deliver to in Redshift, and it will allow us to read and write to those schemas exclusively.

See our setup instructions for more information.

Migrate destinations

To migrate your Redshift cluster and database that is configured with Fivetran to another database on a different cluster, you must first decide whether you want to start from scratch or migrate the data that Fivetran has uploaded on the old cluster.

If the data sources are fast to re-sync, and the amount of data in the old database is relatively small, then we suggest starting over and re-syncing everything:

Pause all your connections from the Fivetran dashboard. On the connection dashboard, set the connection status toggle to PAUSED.
Update the destination connection settings to specify the new cluster details. In the destination dashboard, click Actions and then click Edit connection details. Enter the new database name and provide the authentication details.
Enable your connections. On the connection dashboard, set the connection status toggle to ENABLED.
For all your connections, go to the Details tab and click Resync All Historical Data.

If you have large amounts of data or your connections are highly API limited, the re-sync process will take a lot of time. In such cases, we suggest that you migrate the data from the old cluster to the new cluster using Redshift backups:

Pause all your connections from the Fivetran dashboard. On the connection dashboard, set the connection status toggle to PAUSED.
Restore your new Redshift cluster from a backup of your old Redshift cluster. This is critical because Fivetran needs to have the same metadata available to continue syncing the data from the last sync point and not have to restart the entire sync again.
Update the destination connection settings to specify the new cluster details. In the destination dashboard, click Actions and then click Edit connection details. Enter the new database name and provide the authentication details.
Enable your connections. On the connection dashboard, set the connection status toggle to ENABLED.

Data load costs

Redshift does not charge you extra when Fivetran loads data into your data warehouse.

Table limitations

Amazon Redshift limits the maximum number of tables that you can create. If you exceed the maximum limit, Fivetran can't create any further tables in the destination. As a result, the integration will fail, causing errors or data loss. See Redshift's documentation for more information about the maximum table limits.

System column name management

Redshift does not allow to create user-defined columns with the following names:

oid
tableoid
xmin
cmin
xmax
cmax
ctid

To avoid naming conflicts with the Redshift system rules, we prefix these column names with # before writing them to the destination.

If your database connection is configured to preserve source naming, we do not add # to the reserved column names and skip such columns from the sync.