Azure Blob Storage Setup Guide
Follow our setup guide to connect Azure Blob Storage to Fivetran.
To connect Azure Blob Storage to Fivetran, you need:
- An Azure Blob Storage container holding files with supported file types and encodings
- Permission to grant Fivetran read and list access to the files in this container
Create a Shared Access Signature in Azure
IMPORTANT: You can re-use the Shared Access Signature (SAS) across multiple Fivetran connectors.
Open the Azure Portal.
Select your storage account and click Shared access signature.
Select Blob from the Allowed services options.
Select Container from the Allowed resource types options.
Select Read and List from the Allowed permissions options, so that Fivetran can both read and list the files in the container.
Choose the appropriate start and expiry dates of your SAS.
When the SAS expires, you will have to update your Azure Blob Storage connector to resume syncing files.
(Optional) To enhance security, whitelist Fivetran's IP address under Allowed IP addresses.
Select HTTPS only from the Allowed protocols options. We recommend the HTTPS only option to ensure the security of your files.
Click Generate SAS and connection string.
Make a note of the Connection string value. You need to enter this value in the Connection String field in the connector setup form.
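Before pasting the connection string into Fivetran, it can help to confirm that the SAS inside it matches the choices made in the steps above. The following is a minimal sketch, not part of Fivetran: it assumes the connection string contains a SharedAccessSignature entry whose value is a standard account SAS query string (fields ss, sp, spr, and se). The account name, signature, and the helper names parse_connection_string and check_sas are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

# Hypothetical connection string; the account name and signature are placeholders.
EXAMPLE = (
    "BlobEndpoint=https://example.blob.core.windows.net/;"
    "SharedAccessSignature=sv=2022-11-02&ss=b&srt=c&sp=r"
    "&se=2030-01-01T00:00:00Z&spr=https&sig=PLACEHOLDER"
)


def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs; values may themselves contain '='."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


def check_sas(conn_str, now=None):
    """Return a list of problems found in the SAS token (empty if none)."""
    now = now or datetime.now(timezone.utc)
    token = parse_connection_string(conn_str).get("SharedAccessSignature", "")
    fields = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    problems = []
    if "b" not in fields.get("ss", ""):
        problems.append("Blob service is not allowed (ss)")
    if "r" not in fields.get("sp", ""):
        problems.append("Read permission is missing (sp)")
    if fields.get("spr") != "https":
        problems.append("Protocol is not HTTPS only (spr)")
    expiry = fields.get("se")
    if expiry is None:
        problems.append("No expiry time found (se)")
    else:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            # Date-only expiry values carry no offset; treat them as UTC.
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at <= now:
            problems.append("SAS has expired (se)")
    return problems


print(check_sas(EXAMPLE))
```

An empty list means the token passed these basic checks. It does not prove that the signature itself is valid; only Azure can verify that.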
Finish Fivetran Configuration
In the connector setup form, enter your Destination schema name.
Enter your Destination table and your Container Name.
Enter the Connection String you found in Azure.
Choose your configuration options. Using these configuration options, you can select subsets of your folders, certain types of files, and more to sync only the files you need in your destination. Setting up multiple connectors targeted at the same container, but with different options, can allow you to slice and dice a container any way you'd like.
Folder Path This folder path is used to specify a portion of the container in which you'd like Fivetran to look for files. Any files under the specified folder and all of its nested subfolders will be examined for files we can upload. If no prefix is supplied, we'll look through the entire container for files to sync.
File Pattern The file pattern is a regular expression that we use to decide whether or not to sync certain files. It applies to everything under the prefix. For instance, suppose that under the prefix logs you had three folders, one of them named errors. Using the pattern \d\d\d\d/.*, you could exclude all the files in the errors folder, because \d\d\d\d applies to the folders and .* applies to the files under them. If you're not sure what regular expression to use, you can leave this field blank, and we'll sync everything under the prefix. If you're feeling particularly bold, you can write your own regular expression.
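To try out a pattern before saving the connector, you can test it locally against a list of file names. This is a minimal sketch under two assumptions that the guide does not spell out: that the pattern is matched against the path relative to the folder path, and that it must match the whole relative path. The blob names and the function select_files are hypothetical.

```python
import re


def select_files(blob_names, folder_path="", file_pattern=None):
    """Keep blobs under folder_path whose relative path matches file_pattern."""
    prefix = (folder_path.rstrip("/") + "/") if folder_path else ""
    regex = re.compile(file_pattern) if file_pattern else None
    selected = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue  # outside the folder path
        relative = name[len(prefix):]
        # A blank pattern syncs everything under the prefix.
        if regex is None or regex.fullmatch(relative):
            selected.append(name)
    return selected


# Hypothetical container contents.
blobs = [
    "logs/2023/app.log",
    "logs/2024/app.log",
    "logs/errors/crash.log",
    "other/2024/app.log",
]

print(select_files(blobs, "logs", r"\d\d\d\d/.*"))
print(select_files(blobs, "logs"))
```

With the pattern \d\d\d\d/.*, the file in the errors folder is left out because its folder name is not four digits; with no pattern, every file under logs is kept.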
Archive Folder Pattern If there are multiple files within archive (TAR or ZIP) folders, you can use the archive folder pattern to filter those as well. For example, the archive folder pattern .*json will sync from an archive folder only those files that end in a .json file extension.
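The effect of an archive folder pattern can be previewed locally with Python's zipfile module. This sketch assumes the pattern must match each member name in full; the archive contents and the function matching_archive_members are hypothetical, and the code is not Fivetran's implementation.

```python
import io
import re
import zipfile


def matching_archive_members(zip_bytes, archive_pattern):
    """List the member names in a ZIP archive that fully match the pattern."""
    regex = re.compile(archive_pattern)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if regex.fullmatch(name)]


# Build a small in-memory archive to test against.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("events.json", "{}")
    archive.writestr("notes.txt", "hello")
    archive.writestr("nested/users.json", "{}")

print(matching_archive_members(buffer.getvalue(), r".*json"))
```

Note that .*json also matches names that merely end in the letters json; a stricter pattern such as .*\.json requires the dot as well.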
File Type The file type tells Fivetran to parse files as the selected type even when they lack a file extension. For example, if you have an automated CSV output system that saves files without a .csv extension, you can specify the CSV type and we will sync them correctly as CSVs. Selecting "infer" lets Fivetran infer from a file's extension (for example, .log) what to sync. If you do choose a file type, every file we examine will be interpreted as the file type you select, so make sure everything Fivetran syncs has the same file type.
Compression The compression format tells Fivetran to decompress files using the selected format even when they lack a compression extension. For example, if you have an automated CSV output system that GZIPs files to save space, but saves them without a .gzip extension, you can set this field to GZIP. The integration will then decompress every file that it examines using GZIP. If all of your compressed files are correctly marked with a matching compression extension (for example, .zip), you can select "infer".
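The difference between an explicit compression setting and "infer" can be illustrated with a short sketch. It assumes, as a simplification, that inference looks only at the file extension; the function read_payload and the file name export_001 are hypothetical.

```python
import gzip


def read_payload(data, compression="infer", filename=""):
    """Return file contents, decompressing according to the chosen setting."""
    if compression == "gzip":
        # Explicit setting: every file is treated as GZIP, whatever its name.
        return gzip.decompress(data)
    if compression == "infer" and filename.endswith((".gz", ".gzip")):
        # Inference relies on the extension alone in this sketch.
        return gzip.decompress(data)
    return data


original = b"id,name\n1,Ada\n"
compressed = gzip.compress(original)

# A GZIP file saved without an extension stays compressed under "infer"...
print(read_payload(compressed, "infer", "export_001") == original)
# ...but is decoded correctly once the format is set explicitly.
print(read_payload(compressed, "gzip", "export_001") == original)
```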
Error Handling Selecting skip ignores any improperly formatted data within a file, allowing you to sync only valid data. Choosing fail enables you to prevent a file from syncing if any improperly formatted data is detected. With either option you will receive a notification on your dashboard if errors are encountered.
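The two behaviors can be sketched as follows. The guide does not define "improperly formatted data", so this example assumes it means a row whose field count differs from the header; the function parse_rows is hypothetical and is not Fivetran's parser.

```python
import csv
import io


def parse_rows(text, on_error="skip"):
    """Parse CSV text; skip bad rows or fail on the first one."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows, bad_lines = [], []
    for row in reader:
        if len(row) != len(header):
            if on_error == "fail":
                raise ValueError(
                    f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
                )
            bad_lines.append(reader.line_num)  # remember it, then move on
            continue
        rows.append(dict(zip(header, row)))
    return rows, bad_lines


sample = "id,name\n1,Ada\n2\n3,Grace\n"

rows, bad_lines = parse_rows(sample, on_error="skip")
print(rows)
print(bad_lines)
```

In skip mode the valid rows are kept and the bad line numbers are reported; in fail mode the same input raises an error and nothing is returned.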
Modified File Merge When a previously synced file is modified, this option determines whether the rows in the destination are replaced or the new rows are appended to the table.
The upsert_file option replaces records in the destination, using the filename and line number as the primary key.
The append_file option appends records.
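The practical difference shows up when a file is synced a second time after being edited. This is an illustrative sketch using in-memory structures in place of a destination table; the functions upsert_file and append_file mirror the option names but are hypothetical, and the sketch does not cover lines deleted from the file.

```python
def upsert_file(table, filename, lines):
    """table is a dict keyed by (filename, line_number); matching keys are replaced."""
    for line_number, value in enumerate(lines, start=1):
        table[(filename, line_number)] = value
    return table


def append_file(table, filename, lines):
    """table is a list; every sync adds new rows, even for unchanged lines."""
    for line_number, value in enumerate(lines, start=1):
        table.append((filename, line_number, value))
    return table


first_version = ["a", "b"]
second_version = ["a", "B", "c"]  # line 2 edited, line 3 added

upserted = {}
upsert_file(upserted, "data.csv", first_version)
upsert_file(upserted, "data.csv", second_version)

appended = []
append_file(appended, "data.csv", first_version)
append_file(appended, "data.csv", second_version)

print(len(upserted), upserted[("data.csv", 2)])
print(len(appended))
```

After both syncs, the upserted table holds three rows with the edited value on line 2, while the appended table holds five rows, including both versions of lines 1 and 2.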
Escape Character (Optional) CSVs have a special rule for escaping quotation marks as opposed to other characters - they require two consecutive double quotes to represent an escaped double quote. However, some CSV generators do not follow this rule and use other characters like backslash for escaping. Only use this field if you are sure your CSVs have a different escape character.
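Python's csv module makes the two escaping conventions easy to compare. In this sketch the sample data and the function read_csv are hypothetical; it illustrates the difference between the conventions and says nothing about how Fivetran parses files internally.

```python
import csv
import io

# Standard CSV escaping: a double quote inside a field is written twice.
standard = 'id,quote\n1,"She said ""hi"""\n'
# Non-standard escaping: a backslash precedes the embedded double quote.
backslash = 'id,quote\n1,"She said \\"hi\\""\n'


def read_csv(text, escapechar=None):
    """Parse CSV text, optionally with a custom escape character."""
    if escapechar is None:
        reader = csv.reader(io.StringIO(text))
    else:
        reader = csv.reader(
            io.StringIO(text), doublequote=False, escapechar=escapechar
        )
    return list(reader)


print(read_csv(standard))
print(read_csv(backslash, escapechar="\\"))
```

Both calls recover the same field, She said "hi". Reading the backslash-escaped file without the escape character set produces a mangled value, which is why the field should be set only when your files really use a different escape character.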
Null Sequence (Optional) CSVs have no native notion of a null character. However, some CSV generators have created one, using characters such as \N to represent null. Note: text is un-escaped before the null sequence is matched, so don't use the escape character in your null sequence. Only use this field if you are sure your CSVs have a null sequence.
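A null sequence lets a parser tell a missing value apart from an empty string. The sketch below shows the idea with Python's csv module; the sample data and the function read_with_nulls are hypothetical, and no escape character is configured, so \N reaches the comparison unchanged.

```python
import csv
import io


def read_with_nulls(text, null_sequence=None):
    """Parse CSV text, turning fields equal to null_sequence into None."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        rows.append(
            [
                None if null_sequence is not None and field == null_sequence else field
                for field in row
            ]
        )
    return rows


sample = "id,name\n1,\\N\n2,Ada\n3,\n"

print(read_with_nulls(sample, null_sequence="\\N"))
```

Row 1 ends up with a null name, while row 3 keeps an empty string. Without a null sequence, the two characters \N would be loaded as literal text.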
Delimiter (Optional) The delimiter is a character used in CSV files to separate one field from the next. If this is left blank, Fivetran will infer the delimiter for each file, and files of many different types of delimiters can be stored in the same folder with no problems. If this is not left blank, then all CSV files in your search path will be parsed with this delimiter.
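If you are unsure which delimiter your files use, you can inspect a sample locally before deciding whether to leave the field blank. This sketch uses csv.Sniffer from Python's standard library, which is a stand-in for illustration and not the method Fivetran uses; the candidate list and the function infer_delimiter are assumptions.

```python
import csv


def infer_delimiter(sample, candidates=",;\t|"):
    """Guess the delimiter of a CSV sample, considering only the candidates."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


semicolon_sample = "id;name;city\n1;Ada;London\n2;Grace;New York\n"
tab_sample = "id\tname\n1\tAda\n2\tGrace\n"

print(repr(infer_delimiter(semicolon_sample)))
print(repr(infer_delimiter(tab_sample)))
```

Per-file inference like this is what allows files with different delimiters to sit in the same folder; setting the field explicitly applies one delimiter to every CSV file instead.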
Click Save & Test. Fivetran will take it from here and sync your data from your Azure Blob Storage account.