Staging for Redshift
While running Burst Integrate or Bulk Refresh, Fivetran HVR can put data into a temporary directory (‘staging file') before loading data into a target database.
For best performance, HVR performs Burst Integrate and Bulk Refresh into Redshift using staging files. HVR implements the file staging in Redshift as follows:
- HVR first writes data into the staging directory on Amazon S3.
- The cURL library interface is used by HVR for staging data on Amazon S3.
- The staging files (inside the Staging directory) are written in CSV format on Amazon S3.
- HVR then uses the Redshift SQL command
copy...from...s3
to ingest data into the Redshift target tables from the staging directory.
Staging on AWS
HVR must be configured to stage the data on Amazon S3 before loading it into Redshift. For staging the data on Amazon S3 and perform Burst Integrate and Bulk Refresh into Redshift, the following are required:
An AWS S3 location (bucket) to store temporary data to be loaded into Redshift. For more information about creating and configuring an S3 bucket, refer to the AWS Documentation.
An AWS user with AmazonS3FullAccess permission policy to access the S3 bucket. Alternatively, an AWS user with minimal set of permission can also be used:
Click here for minimal set of permissions.Sample JSON with a user role permission policy for S3 locationSample JSONSample JSON
- s3:GetBucketLocation
- s3:ListBucket
- s3:ListBucketMultipartUploads
- s3:AbortMultipartUpload
- s3:GetObject
- s3:PutObject
- s3:DeleteObject
Sample JSON with a user role permission policy for S3 location
{ "Statement": [ { "Sid": <identifier>, "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<account_id>:<user>/<username>", }, "Action": [ "s3:GetObject", "s3:GetObjectVersion", "s3:PutObject", "s3:DeleteObject", "s3:DeleteObjectVersion", "s3:AbortMultipartUpload" ], "Resource": "arn:aws:s3:::<bucket_name>/*" }, { "Sid": <identifier>, "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<account_id>:<user>/<username>" }, "Action": [ "s3:ListBucket", "s3:GetBucketLocation", "s3:ListBucketMultipartUploads" ], "Resource": "arn:aws:s3:::<bucket_name>" } ] }
For minimal permission, HVR also supports the AWS temporary security credentials in IAM. There are two ways to request for the AWS Security Token Service (STS) temporary credentials:
Using a combination of AWS STS Role ARN, AWS Access Key Id, and AWS Secret Access Key
Sample JSON
{ "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<account_id>:<user>/<username>" }, "Action": "sts:AssumeRole" } ] }
Using a combination of AWS STS Role ARN and AWS IAM Role (a role that has access to an EC2 machine)
Sample JSON
{ "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::<account_id>:<user>/<username>", "arn:aws:iam::<account_id>:<role>/<username>" ] }, "Action": "sts:AssumeRole" } ] }
For more information on the Amazon S3 permissions policy, refer to the AWS S3 documentation.
For more information, refer to the following AWS documentation:
Define the following location properties (while creating a location or by editing the existing location's source and target properties) for the Redshift location:
- The Location Property equivalent to the UI field is shown inside (bracket) below.
- If the managed secrets feature is enabled, option USE TOKEN INSTEAD is displayed in the fields designated for entering secrets.
STORAGE (File_Scheme): protocol for connecting HVR to S3.
Select either of the following options:
- Amazon S3 (HTTPS)
- Amazon S3 (HTTP)
BUCKET (S3_Bucket): Name or IP address of the Amazon S3 bucket.
Credentials or Instance Profile: AWS security credentials. For more information about getting your AWS credentials or Instance Profile Role, refer to the AWS Documentation.
- If Credentials is selected, the location property KEY ID (AWS_Access_Key_Id) and SECRET KEY (AWS_Secret_Access_Key) must be supplied.
KEY ID: Access key ID of IAM user for connecting HVR to Amazon S3.
SECRET KEY: Secret access key of IAM user for connecting HVR to Amazon S3.
- If Instance Profile is selected, the location property IAM ROLE (AWS_IAM_Role) must be supplied.
- IAM ROLE: AWS IAM role name for connecting HVR to Amazon S3.
- If Credentials is selected, the location property KEY ID (AWS_Access_Key_Id) and SECRET KEY (AWS_Secret_Access_Key) must be supplied.
Optionally, S3 Encryption can be enabled. For more information about S3 data encryption, refer to the AWS Documentation.
The available options are:
Client-Side with Master Key: Enables client-side encryption using a master symmetric key for AES. The MASTER SYMMETRIC KEY (S3_Encryption_Master_Symmetric_Key) must be supplied for this encryption method.
Server-Side (S3_Encryption_SSE): Enables server-side encryption with Amazon S3 managed keys.
Server-Side with KMS (S3_Encryption_SSE_KMS): Enables server-side encryption with customer master keys (CMKs) stored in AWS key management service (KMS). If CUSTOMER MASTER KEY ID (S3_Encryption_KMS_Customer_Master_Key_Id) is not specified, a KMS managed CMK is used.
STAGING DIRECTORY (Staging_Directory): location where HVR will create the temporary staging files inside the S3 bucket (e.g. /my_staging_dir).
STAGING DIRECTORY AS VISIBLE FROM THE DATABASE (Staging_Directory_Database): directory path from where Redshift will access the temporary staging files.