Azure Blob Storage Setup Guide
Follow our setup guide to connect Azure Blob Storage to Fivetran.
Prerequisites
To connect Azure Blob Storage to Fivetran, you need:
- An Azure Blob Storage container holding files with supported file types and encodings
- Permission to grant Fivetran read and list access to files in this container
Setup instructions
Select connection method
IMPORTANT: The Connection Method option is only available for Business Critical accounts.
First decide whether to connect Fivetran to your Azure Blob Storage container directly, using an SSH tunnel, or using Azure Private Link.
Connect directly
Fivetran connects directly to your Azure Blob Storage container. This is the simplest connection method.
If you have a firewall enabled, create a firewall rule to allow access to Fivetran's IPs.
If you have a firewall enabled and your Fivetran instance runs in the same region as your Azure storage account, configure virtual network rules and add Fivetran's internal virtual network subnets to the list of allowed virtual networks. Learn more in Microsoft's Azure Blob Storage documentation. Contact Fivetran Support to get the list of region-specific fully qualified subnet IDs.
Connect using Private Link
IMPORTANT: You must have a Business Critical plan to use Azure Private Link.
Azure Private Link allows Virtual Networks (VNets) and Azure-hosted or on-premises services to communicate with one another without exposing traffic to the public internet. Learn more in Microsoft's Azure Private Link documentation.
Follow our Azure Private Link setup guide to configure Private Link for your storage container.
Connect using SSH (TLS optional)
IMPORTANT: Do not perform this step if you want to use the Hybrid Deployment model for your data pipeline.
Fivetran connects to a separate server in your network that provides an SSH tunnel to your Azure Blob Storage container. You must connect through SSH if your container is in an inaccessible subnet on a virtual network.
To connect using SSH, create a firewall rule to allow access to your SSH tunnel server's IP address.
Before you proceed to the next step, you must follow our SSH connection instructions to give Fivetran access to your SSH tunnel. If you want Fivetran to tunnel SSH over TLS, follow Azure's TLS setup instructions to enforce a minimum required TLS version on your namespace.
Create shared access signature in Azure
IMPORTANT: You can re-use the Shared Access Signature (SAS) across multiple Fivetran connectors.
Open the Azure Portal.
Select your storage account and click Shared access signature.
Select Blob from the Allowed services options.
Select Container and Object from the Allowed resource types options.
Select Read and List from the Allowed permissions options.
Choose the appropriate start and expiry dates of your SAS.
IMPORTANT: When the SAS expires, you will have to update your Azure Blob Storage connector to resume syncing files.
(Optional) To enhance security, safelist Fivetran's IP address range under Allowed IP addresses. Azure only allows one IP range per SAS token. Skip this step if you have already configured Fivetran's internal virtual network subnets.
IMPORTANT: Use the IP range format to safelist the IP addresses, for example, 35.234.176.144-35.234.176.151. Azure Portal does not support the CIDR format, for example, 35.234.176.144/29.
Select HTTPS only from the Allowed protocols options. We recommend that you select HTTPS only to ensure the security of your files.
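To turn a CIDR block into the range format that Azure Portal expects under Allowed IP addresses, you can use a small helper like the one below. This is a minimal sketch built on Python's standard ipaddress module; the function name cidr_to_azure_range is hypothetical, and the CIDR block is the example from the IMPORTANT note.

```python
import ipaddress


def cidr_to_azure_range(cidr: str) -> str:
    """Convert a CIDR block to the start-end range format used by Azure Portal."""
    # strict=True rejects blocks with host bits set, such as 35.234.176.145/29.
    network = ipaddress.ip_network(cidr, strict=True)
    # For IPv4, network_address and broadcast_address are the first and last
    # addresses in the block.
    return f"{network.network_address}-{network.broadcast_address}"


print(cidr_to_azure_range("35.234.176.144/29"))  # 35.234.176.144-35.234.176.151
```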
Click Generate SAS and connection string.
Make a note of the Connection string value. You need it to configure Fivetran.
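Before you paste the connection string into Fivetran, you can sanity-check that the SAS it contains matches the settings above. The sketch below is an illustration, not a Fivetran tool: it assumes the Portal-generated connection string contains a SharedAccessSignature=... segment, and it checks Azure's standard account SAS query parameters (ss for allowed services, srt for resource types, sp for permissions, spr for protocols). The function names and the example string are hypothetical placeholders.

```python
from urllib.parse import parse_qs


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs. Values may themselves contain '='."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


def sas_problems(conn_str: str) -> list[str]:
    """Compare the SAS in a connection string with the settings in this guide."""
    sas = parse_qs(parse_connection_string(conn_str).get("SharedAccessSignature", ""))

    def param(name: str) -> str:
        return sas.get(name, [""])[0]

    problems = []
    if "b" not in param("ss"):
        problems.append("Allowed services should include Blob (ss=b)")
    if not {"c", "o"} <= set(param("srt")):
        problems.append("Allowed resource types should include Container and Object (srt=co)")
    if not {"r", "l"} <= set(param("sp")):
        problems.append("Allowed permissions should include Read and List (sp=rl)")
    if param("spr") != "https":
        problems.append("Allowed protocols should be HTTPS only (spr=https)")
    return problems


# Hypothetical connection string; the account name, dates, and signature are placeholders.
example = (
    "BlobEndpoint=https://myaccount.blob.core.windows.net/;"
    "SharedAccessSignature=sv=2022-11-02&ss=b&srt=co&sp=rl"
    "&se=2030-01-01T00:00:00Z&spr=https&sig=placeholder"
)
print(sas_problems(example))  # []
```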
Finish Fivetran configuration
In the connector setup form, enter your Destination schema name.
Enter your Destination table and your Container Name.
Enter the Connection String you found in Azure.
Choose your configuration options. Using these configuration options, you can select subsets of your folders, specific types of files, and more to sync only the files you need in your destination. In addition, setting up multiple connectors with different options allows you to slice and dice your data any way you'd like.
You can use the following configuration options:
(Beta) Primary Key used for file process and load - Use this option to tell Fivetran how to update files in your destination. When you modify a previously synced file, the option you select determines whether we replace the rows in the destination table or append new rows to the table:
- If you select Upsert file using file name and line number, we upsert your data using the surrogate primary keys _file and _line. If a file has a unique name, we sync the data for that file as new data.
- If you select Append file using file modified time, we upsert your files using the surrogate primary keys _file, _line, and _modified. Select this option if you want to track the complete history of a file or a set of files and your files contain a mix of old and new data or data that is updated periodically.
- If you select Upsert file using custom primary key, we keep the most recent version of every record. Select this option if your files contain a combination of old and new data or data that is updated periodically. You can choose the primary keys you want to use after you save and test. For more information, see our documentation.
NOTE: To ensure data integrity, we recommend that you don't change the primary keys once the initial sync is complete.
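To make the difference between the surrogate key options concrete, the following sketch models a destination table as a Python dict keyed by primary key. It is an illustration under simplifying assumptions (hypothetical function and column layout, line numbers starting at 1, one value per row), not Fivetran's loader: with _file and _line as the key, a re-synced file overwrites its earlier rows, while adding _modified keeps each version as separate rows.

```python
def load_file(table: dict, file_name: str, modified: str, lines: list[str], mode: str) -> None:
    """Merge one file's lines into `table` using surrogate primary keys."""
    for line_no, value in enumerate(lines, start=1):
        if mode == "upsert":
            key = (file_name, line_no)  # (_file, _line)
        elif mode == "append":
            key = (file_name, line_no, modified)  # (_file, _line, _modified)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Writing to an existing key replaces the row; a new key adds a row.
        table[key] = {"_file": file_name, "_line": line_no, "_modified": modified, "value": value}


upserted: dict = {}
load_file(upserted, "orders.csv", "2024-01-01T00:00:00Z", ["a", "b"], "upsert")
load_file(upserted, "orders.csv", "2024-02-01T00:00:00Z", ["a2", "b2"], "upsert")
print(len(upserted))  # 2: the second version replaced the first

appended: dict = {}
load_file(appended, "orders.csv", "2024-01-01T00:00:00Z", ["a", "b"], "append")
load_file(appended, "orders.csv", "2024-02-01T00:00:00Z", ["a2", "b2"], "append")
print(len(appended))  # 4: both versions are kept
```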
(Optional) Folder Path - Use the folder path to specify a portion of the container in which you'd like Fivetran to look for files. We examine files under the specified folder and all of its nested subfolders for files we can sync. If you don't provide a prefix, we'll look through the entire container for files to sync.
(Optional) File Pattern - Use a regular expression as the file pattern to decide whether or not to sync specific files. The pattern applies to everything under the prefix (folder path). If you're unsure what regular expression to use, you can leave this field blank, and we'll sync everything under the prefix.
For example, let's say that you have three folders, 2017, 2016, and errors, under the prefix logs. Using the pattern \d\d\d\d/.*, you can exclude all the files in the errors folder because:
- \d\d\d\d only applies to the folders whose names consist of four consecutive digits, and
- .* after / applies to any files in these folders.
TIP: You can learn to write your regex and test it out.
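The folder path and file pattern together decide which files we examine. The sketch below mimics that selection over a list of blob names, using the logs example above. It assumes the pattern is searched for in each blob's path relative to the folder path (Python re.search); Fivetran's exact matching rules and regex dialect may differ, so treat this only as a way to try out a pattern. The function name is hypothetical.

```python
import re
from typing import Iterable, Optional


def files_to_sync(blob_names: Iterable[str], folder_path: str = "",
                  file_pattern: Optional[str] = None) -> list[str]:
    """Return blobs under `folder_path` (including nested subfolders) that match `file_pattern`."""
    regex = re.compile(file_pattern) if file_pattern else None
    selected = []
    for name in blob_names:
        if not name.startswith(folder_path):
            continue  # outside the folder path
        relative = name[len(folder_path):].lstrip("/")
        # Assumption: the pattern is searched for in the path relative to the folder path.
        if regex is None or regex.search(relative):
            selected.append(name)
    return selected


blobs = ["logs/2017/a.csv", "logs/2016/b.csv", "logs/errors/c.csv", "archive/2017/d.csv"]
print(files_to_sync(blobs, "logs/", r"\d\d\d\d/.*"))  # ['logs/2017/a.csv', 'logs/2016/b.csv']
```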
File Type - Use the file type to choose the parsing strategy for files without file extensions. If you save your files with improper extensions, you can force them to be synced as the selected file type.
If you select infer, we infer the type from a file's extension (.csv, .tsv, .json, .avro, or .log).
NOTE: If you have XML files, don't select infer. We sync XML files only when you select xml as the file type. For more information about the file size, see our documentation.
NOTE: If you have PGP encrypted files, do not select infer.
If you choose a file type, we interpret every file we examine as the file type you select, so make sure everything we sync has the same file type.
For example, if you have an automated CSV output system that saves files without a .csv extension, you can specify the type as csv, and we will sync them correctly as CSVs.
If you select xml, we load your XML data into the _data column without flattening it.
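The following sketch shows how the infer setting resolves a parser from a file's extension, while any other setting forces one type for every file. The function and table names are hypothetical; the extension list comes from the infer description above, and skipping compression extensions (such as .gz) when looking for the file type is an assumption about how the File Type and Compression settings combine.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions listed for the infer setting in this guide.
INFERABLE_TYPES = {".csv": "csv", ".tsv": "tsv", ".json": "json", ".avro": "avro", ".log": "log"}
# Compression extensions listed under Compression; skipped when inferring the type.
COMPRESSION_EXTENSIONS = {".bz2", ".gz", ".gzip", ".tar", ".zip"}


def resolve_file_type(blob_name: str, file_type: str) -> Optional[str]:
    """Return the parser to use for a blob, or None if infer finds no known extension."""
    if file_type != "infer":
        return file_type  # a forced type applies to every file examined
    suffixes = [s.lower() for s in PurePosixPath(blob_name).suffixes]
    while suffixes and suffixes[-1] in COMPRESSION_EXTENSIONS:
        suffixes.pop()
    return INFERABLE_TYPES.get(suffixes[-1]) if suffixes else None


print(resolve_file_type("exports/orders.json.gz", "infer"))  # json
print(resolve_file_type("exports/orders", "csv"))  # csv
```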
(Optional) JSON Delivery Mode - Available when JSON or JSONL is selected in File Type. Use this option to choose how Fivetran should handle your JSON data.
- If you select Packed, we load all your JSON data into the _data column without flattening it.
- If you select Unpacked, we flatten one level of columns and infer their data types.
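To illustrate the two JSON delivery modes, here is a minimal sketch for a single JSON record. Packed keeps the whole object in one _data value; Unpacked turns top-level keys into columns and leaves nested objects and arrays as JSON values, which is our reading of "flatten one level". The function name is hypothetical, and destination type inference is not shown.

```python
import json


def json_record_to_row(record_text: str, delivery_mode: str) -> dict:
    """Turn one JSON object into a destination row for the given delivery mode."""
    record = json.loads(record_text)
    if delivery_mode == "packed":
        return {"_data": record}  # whole object, not flattened
    if delivery_mode == "unpacked":
        if not isinstance(record, dict):
            raise ValueError("Unpacked mode needs a JSON object at the top level")
        # One level of flattening: top-level keys become columns; nested
        # objects and arrays stay as JSON values.
        return dict(record)
    raise ValueError(f"unknown delivery mode: {delivery_mode!r}")


line = '{"id": 7, "customer": {"name": "Ada"}, "tags": ["new"]}'
print(json_record_to_row(line, "packed"))
print(json_record_to_row(line, "unpacked"))
```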
Compression - Use the compression option to choose how we decompress files that don't have compression extensions. If your files are compressed but do not have extensions indicating the compression method, we decompress them using the compression algorithm you select.
If all of your compressed files are correctly marked with a matching compression extension (.bz2, .gz, .gzip, .tar, or .zip), you can select infer.
If you select uncompressed, we sync the files as-is without decompressing them.
If you choose a compression format, we decompress every file using the format you select.
For example, if you have an automated CSV output system that GZIPs files to save space but saves them without a .gzip extension, you can set this field to GZIP. We will decompress every file that we examine using GZIP.
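The sketch below shows the three kinds of compression settings with Python's standard gzip and bz2 modules. It is illustrative only: the function and mapping names are hypothetical, and .tar and .zip are left out because they are archive formats that can hold several files rather than single compressed streams.

```python
import bz2
import gzip
from pathlib import PurePosixPath

DECOMPRESSORS = {"gzip": gzip.decompress, "bz2": bz2.decompress}
EXTENSION_TO_FORMAT = {".gz": "gzip", ".gzip": "gzip", ".bz2": "bz2"}


def decompress_blob(blob_name: str, data: bytes, compression: str) -> bytes:
    """Decompress one blob according to the Compression setting."""
    if compression == "uncompressed":
        return data  # synced as-is
    if compression == "infer":
        fmt = EXTENSION_TO_FORMAT.get(PurePosixPath(blob_name).suffix.lower())
        return DECOMPRESSORS[fmt](data) if fmt else data
    # A forced format is applied to every file, whatever its extension.
    return DECOMPRESSORS[compression](data)


payload = b"id,name\n1,Ada\n"
print(decompress_blob("daily_export", gzip.compress(payload), "gzip") == payload)  # True
```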
Error Handling - Use the error handling option to choose how to handle errors in your files. If you know that your files contain some errors, you can choose to skip poorly formatted lines.
If you select skip, we ignore improperly formatted data within a file, allowing you to sync only valid data.
If you select fail, we fail the sync with an error on finding any improperly formatted data.
TIP: We recommend that you select fail unless you are sure that you have undesirable, malformed data.
You will receive a notification on your Fivetran dashboard if we encounter errors.
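As a rough illustration of skip versus fail, the following sketch treats a CSV record as malformed when its field count differs from the header's. That definition of "improperly formatted" is an assumption for the example, and the function name is hypothetical; Fivetran may detect other kinds of errors.

```python
import csv
import io


def parse_csv(text: str, error_handling: str) -> list[list[str]]:
    """Parse CSV text, skipping or failing on records whose field count is wrong."""
    if error_handling not in ("skip", "fail"):
        raise ValueError(f"unknown error handling: {error_handling!r}")
    records = list(csv.reader(io.StringIO(text)))
    if not records:
        return []
    header, body = records[0], records[1:]
    valid = []
    for record_no, record in enumerate(body, start=2):
        if len(record) == len(header):
            valid.append(record)
        elif error_handling == "fail":
            raise ValueError(f"record {record_no}: expected {len(header)} fields, got {len(record)}")
        # With "skip", the malformed record is dropped and parsing continues.
    return valid


sample = "id,name\n1,Ada\n2\n3,Grace\n"
print(parse_csv(sample, "skip"))  # [['1', 'Ada'], ['3', 'Grace']]
```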
(Optional) To use the advanced configuration options, set the Enable Advanced Options toggle to ON.
You can use the following configuration options for specific use cases:
Modified File Merge - Use this option to let Fivetran know how to update files in the destination. When you modify a previously synced file, this setting tells us whether we should replace the rows in the destination table or append the new rows to the table:
upsert_file replaces records in the destination, using the filename and line number as the primary key.
append_file appends records.
(Optional) Archive Folder Pattern - Use a regular expression to filter and sync files from archived folders. We sync the files in compressed archives with filenames matching the specified pattern. If there are multiple files within the archive (TAR or ZIP) folders, you can use the archive folder pattern to filter file types.
For example, if you specify the archive folder pattern as .*json, we sync only the files that end in a .json file extension from the archive folder.
NOTE: This is only used to filter the files within the archived folder.
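The following sketch applies an archive folder pattern to the entries of a ZIP archive using Python's zipfile module. It assumes the pattern must match an entry's whole path inside the archive (re.fullmatch), so .*json selects entries ending in json; the function name is hypothetical and Fivetran's matching rules may differ.

```python
import io
import re
import zipfile


def files_from_archive(archive_bytes: bytes, archive_pattern: str) -> dict[str, bytes]:
    """Return the contents of ZIP entries whose path matches `archive_pattern`."""
    regex = re.compile(archive_pattern)
    selected = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for entry in archive.infolist():
            # Assumption: the whole entry path must match the pattern.
            if not entry.is_dir() and regex.fullmatch(entry.filename):
                selected[entry.filename] = archive.read(entry)
    return selected


# Build a small archive in memory for the example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("export/orders.json", '{"id": 1}')
    archive.writestr("export/orders.csv", "id\n1\n")
print(list(files_from_archive(buffer.getvalue(), r".*json")))  # ['export/orders.json']
```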
(Optional) Null Sequence - Specify the value indicating null if your CSVs use a special value indicating null.
Only use this field if you are sure your CSVs have a null sequence. CSVs have no native notion of a null character. However, some CSV generators have created one, using characters such as \N to represent null.
(Optional) Delimiter - Specify the delimiter. The delimiter is a character used in files to separate one field from the next. Fivetran tries to infer the delimiter, but in some cases, this is impossible. If your files sync with the wrong number of columns or use a unique delimiter, consider setting this value. For example, if you have tab-delimited files, you must enter \t, and if you have pipe-delimited files, enter |.
If you leave this field blank, we infer the delimiter for each file. You can store files with many different delimiters in the same folder with no problems. For more information on delimiter inference, see our documentation.
If you specify a delimiter, we parse all the CSV, TSV, and TXT files in your folder path with this delimiter.
NOTE: You can also specify a multi-character delimiter in this field. Use a custom multi-character delimiter (other than "\t" and "\s") only if the source contains only CSV files; otherwise, it might lead to data integrity issues for other files. A custom multi-character delimiter must not exceed 15 characters.
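The Null Sequence and Delimiter options can be sketched together with Python's csv module. This is a minimal illustration, not Fivetran's parser: delimiter inference uses csv.Sniffer restricted to a few common candidates (an assumption; our inference logic may differ), and because the csv module only accepts one-character delimiters, multi-character delimiters are split naively without quote handling. The function names are hypothetical.

```python
import csv
import io
from typing import Optional

MAX_DELIMITER_LENGTH = 15  # limit stated in this guide


def infer_delimiter(sample: str) -> str:
    # csv.Sniffer is Python's heuristic, used here as a stand-in for our inference.
    return csv.Sniffer().sniff(sample, delimiters=",\t|;").delimiter


def read_rows(text: str, delimiter: Optional[str] = None,
              null_sequence: Optional[str] = None) -> list[list[Optional[str]]]:
    """Parse delimited text, turning fields equal to `null_sequence` into None."""
    delimiter = delimiter or infer_delimiter(text)
    if len(delimiter) > MAX_DELIMITER_LENGTH:
        raise ValueError(f"delimiter longer than {MAX_DELIMITER_LENGTH} characters")
    if len(delimiter) == 1:
        rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    else:
        # csv.reader only accepts one-character delimiters, so split naively.
        rows = [line.split(delimiter) for line in text.splitlines() if line]
    return [[None if field == null_sequence else field for field in row] for row in rows]


print(read_rows("id|note\n1|\\N\n", null_sequence="\\N"))  # [['id', 'note'], ['1', None]]
print(read_rows("id::note\n1::ok\n", delimiter="::"))  # [['id', 'note'], ['1', 'ok']]
```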
(Optional) Escape Character - Set the escape character if your CSV generator follows non-standard rules for escaping quotation marks.
Only use this field if you are sure your CSVs have a different escape character. CSVs have a special rule for escaping quotation marks compared to other characters; they require two consecutive double quotes to represent an escaped double quote. However, some CSV generators do not follow this rule and use different characters like backslash for escaping.
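Python's csv module exposes both escaping conventions, which makes the difference easy to see. In the sketch below (function name hypothetical), the default reader treats two consecutive double quotes as one literal quote, while doublequote=False with escapechar set reads backslash-escaped quotes instead.

```python
import csv
import io
from typing import Optional


def first_row(text: str, escape_character: Optional[str] = None) -> list[str]:
    """Read the first CSV row, optionally with a non-standard escape character."""
    if escape_character is None:
        # Standard rule: two consecutive double quotes stand for one literal quote.
        reader = csv.reader(io.StringIO(text))
    else:
        # Non-standard rule: the escape character precedes a literal quote.
        reader = csv.reader(io.StringIO(text), doublequote=False, escapechar=escape_character)
    return next(reader)


print(first_row('"say ""hi""",2\n'))  # ['say "hi"', '2']
print(first_row('"say \\"hi\\"",2\n', escape_character="\\"))  # ['say "hi"', '2']
```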
(Optional) Skip Header Lines - Use this option to skip over fixed-length headers at the beginning of your CSV files. Set the toggle to ON, and then in the Number of skipped header lines field, specify the number of header lines you want to skip.
Some CSV-generating programs include additional header lines or empty lines at the top of the file. The header consists of a few lines that do not match the format of the rest of the rows in the file. These header rows can cause undesired behavior because we attempt to parse them as if they were records in your CSV.
(Optional) Skip Footer Lines - Use this option to skip over fixed-length footers at the end of your CSV files. Set the toggle to ON, and then in the Number of skipped footer lines field, specify the number of footer lines you want to skip.
Some CSV-generating programs include a footer at the bottom of the file. The footer consists of a few lines that do not match the format of the rest of the rows in the file. These footer rows can cause undesired behavior because we attempt to parse them as if they were records in your CSV.
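Skipping fixed-length headers and footers amounts to dropping a fixed number of lines from each end of the file before parsing. The sketch below assumes line-based slicing (so it would not handle quoted fields that span lines) and uses a hypothetical function name and sample report.

```python
def strip_fixed_lines(text: str, header_lines: int = 0, footer_lines: int = 0) -> list[str]:
    """Drop a fixed number of lines from the start and end of a file."""
    if header_lines < 0 or footer_lines < 0:
        raise ValueError("line counts must be non-negative")
    lines = text.splitlines()
    return lines[header_lines:len(lines) - footer_lines]


report = "Daily export\n\nid,name\n1,Ada\n2,Grace\nrows: 2\n"
print(strip_fixed_lines(report, header_lines=2, footer_lines=1))  # ['id,name', '1,Ada', '2,Grace']
```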
(Optional) Headerless Files - Set the toggle to ON if your CSV-generating software doesn't provide a header line for the documents. Fivetran then generates generic column names and syncs the data rows with them.
Some CSV-generating programs do not include column name headers for the files; they only contain data rows. When you set the toggle to ON, we generate generic column names following the convention of column_0, column_1, ... column_n to map the rows.
(Optional) Line Separator - Specify the custom line separator for your CSV files. The line separator is used in files to separate one row from the next.
If you leave this field blank, we use the new line character \n as the line separator by default.
If you specify a line separator, we parse all the CSV files in your folder path with this line separator.
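The following sketch combines the Headerless Files and Line Separator options: it splits the file on the configured separator (defaulting to \n) and names columns column_0 through column_n. The function name is hypothetical, and splitting on the separator before CSV parsing assumes the separator never appears inside quoted fields.

```python
import csv
from typing import Optional


def read_headerless(text: str, line_separator: Optional[str] = None) -> list[dict[str, str]]:
    """Split rows on the line separator and name columns column_0 ... column_n."""
    separator = line_separator or "\n"  # blank field: default new line character
    records = [record for record in text.split(separator) if record]
    rows = [next(csv.reader([record])) for record in records]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    columns = [f"column_{i}" for i in range(width)]
    return [dict(zip(columns, row)) for row in rows]


print(read_headerless("1,Ada;2,Grace;", line_separator=";"))
# [{'column_0': '1', 'column_1': 'Ada'}, {'column_0': '2', 'column_1': 'Grace'}]
```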
(Optional) PGP Encryption Options - Use this option to sync PGP encrypted files. Set the toggle to ON and specify the following:
- PGP Private Key - Upload the PGP secret key as an attachment.
- (Optional) Passphrase - Enter the passphrase you used to generate the key.
- (Optional) Signer's Public Key - Upload the signer's public key as an attachment. This key is used for verifying the files.
NOTE: For PGP decryption processes, we strictly comply with the RFC4880 standard. We support syncing only base64 encoded files.
Connection Method: If you're on a Business Critical plan, choose how Fivetran should connect to your Azure storage container. You can choose to:
- Connect directly
- Connect via Private Link
- (Not applicable to Hybrid Deployment) Connect via SSH tunnel
- If you chose Connect via SSH Tunnel, do the following:
  - Enter the IP address (or domain address) of your storage container.
  - Enter the IP address of your SSH tunnel host machine.
  - Enter the username of the account on the tunnel host machine.
  - Copy the Public Key from the connector setup form and paste it into the .ssh/authorized_keys file inside that user's home folder on the tunnel machine.
NOTE: For connectors configured for Hybrid Deployment, Connect directly is pre-selected in the Connection Method field.
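If you prefer to script the authorized_keys step on the tunnel machine, a minimal Python sketch follows. It assumes it runs as the tunnel user, that the public key copied from the setup form is available as a string, and that standard OpenSSH file permissions apply; the function name is hypothetical.

```python
import os
from pathlib import Path


def add_authorized_key(home: Path, public_key: str) -> bool:
    """Append `public_key` to ~/.ssh/authorized_keys. Return False if already present."""
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    auth_file = ssh_dir / "authorized_keys"
    key = public_key.strip()
    content = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in content.splitlines()):
        return False  # avoid duplicate entries on repeated runs
    if content and not content.endswith("\n"):
        content += "\n"
    auth_file.write_text(content + key + "\n")
    # With StrictModes (the OpenSSH default), sshd ignores key files that other
    # users can write to; 0600 is the conventional mode.
    os.chmod(auth_file, 0o600)
    return True
```

For example, running add_authorized_key(Path.home(), key) as the tunnel user, where key holds the Public Key text from the setup form, adds the key once and leaves the file unchanged on later runs.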
(Hybrid Deployment only) If your destination is configured for Hybrid Deployment, the Hybrid Deployment Agent associated with your destination is pre-selected in the Select an existing agent drop-down menu. To use a different agent, select the agent of your choice, and then select the same agent for your destination.
Click Save & Test. Fivetran will take it from here and sync your data from your Azure Blob storage account.
Fivetran tests and validates the Azure Blob Storage connection. On successful completion of the setup tests, you can sync your Azure Blob Storage data to your destination.
Setup tests
Fivetran performs the following Azure Blob Storage connection tests:
The Connectivity test validates your Azure Blob Storage credentials and checks whether Fivetran can connect to your Azure Blob Storage container via Private Link. We perform this test only if you have configured your connector using Connect via Private Link.
The Connecting to Container test validates the container name you specified in the setup form and checks the accessibility of your storage container.
The Validating File Pattern Regex test validates the file pattern regex you specified in the setup form. We perform this test only if you specify a regex in the File Pattern field.
The Validating Archive Pattern test validates the archive pattern regex you specified in the setup form. We perform this test only if you specify a regex in the Archive Folder Pattern field.
The Validating EscapeChar test validates the escape character you specified for your CSV files and checks that it is no more than one character long. We perform this test only if you set the Non-standard character escaping? toggle to ON and specify an escape character in the Escape Character field.
The Multi-Character Delimiter Support test checks that the delimiter you specified is no longer than 15 characters. We perform this test only if you specify the delimiter for your CSV files in the Delimiter field.
The PGP Support test validates whether the connector can successfully retrieve a minimum of one sample file and a maximum of ten sample files from Azure Blob Storage and decrypt them using the PGP keys you uploaded. We perform this test only if you set the PGP Encryption Options toggle to ON.
NOTE: The tests may take a couple of minutes to complete.
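Some of the setup tests above are simple input checks that you can approximate locally before you click Save & Test. The sketch below mirrors the regex, escape character, and delimiter length checks as described in this section; it is not Fivetran's test code, the function name is hypothetical, and it compiles patterns with Python's re module, whose regex syntax may not match the dialect Fivetran uses.

```python
import re
from typing import Optional

MAX_DELIMITER_LENGTH = 15


def precheck_settings(file_pattern: Optional[str] = None, archive_pattern: Optional[str] = None,
                      escape_character: Optional[str] = None,
                      delimiter: Optional[str] = None) -> list[str]:
    """Approximate the input checks described in the setup tests. Returns error messages."""
    errors = []
    for field, pattern in (("File Pattern", file_pattern), ("Archive Folder Pattern", archive_pattern)):
        if pattern:
            try:
                re.compile(pattern)
            except re.error as exc:
                errors.append(f"{field}: invalid regular expression ({exc})")
    if escape_character is not None and len(escape_character) > 1:
        errors.append("Escape Character: must not be more than one character")
    if delimiter is not None and len(delimiter) > MAX_DELIMITER_LENGTH:
        errors.append(f"Delimiter: must be within {MAX_DELIMITER_LENGTH} characters")
    return errors


print(precheck_settings(file_pattern=r"\d\d\d\d/.*", delimiter="|"))  # []
print(precheck_settings(file_pattern="logs/(2017", escape_character="\\\\"))
```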
Related articles
Connector Overview
API Connector Configuration