Use Fivetran Transformations to prepare your data for reporting.
With the release of Fivetran Transformations, Fivetran is officially a full-fledged ELT tool. With Fivetran Transformations, you can extract, load and transform in the same environment, allowing you to leverage scheduling and version control without having to jury-rig or configure other tools and technologies.
There are two main purposes for transforming data. One is for legibility and organizational coherence, similar to the approach Looker takes to models. The other is to enhance the performance and cost characteristics of your data warehouse.
Likewise, there are two methods for transforming data. The first method, the view, is not stored on disk. Every time someone accesses the view, the system runs the corresponding query to create it. In an ideal world with zero latency and unlimited computational resources, all transformations would simply be views.
Materialized views, on the other hand, are stored on disk because views generated on the fly from a large table or complex query can be extremely non-performant. In fact, the materialized views approach is sometimes necessary to get any performance at all. For instance, Resources exceeded
errors on BigQuery can easily crop up if you try to surface tables with hundreds of millions of rows.
The Fivetran transformation layer is an implementation of the materialized views approach. It constructs new tables and updates them on a schedule of your choosing. The interface looks like this:
Note that your SQL script should contain the create or replace table
syntax, as well as the name of the schema and the table that the transformation will populate.
The transformations you will perform can be sorted into the following categories:
Revising
Cleaning: altering values for formatting compatibility
Format revision/conversion: replacing incompatible characters, converting units, converting date formatting
Key restructuring: altering values so that they can be used as keys across tables
Deduplication: identifying and removing duplicate records.
Data validation: evaluate the validity of a record by the completeness of the data
Computing
Derivation: performing simple cross-column calculations
Summarization: using aggregate functions to get summary values
Pivoting: turning row values into columns
Sorting or ordering and indexing to improve search performance
Separating
Splitting: splitting a single column into multiple columns — used for fields containing delimited values, or for converting a column with multiple possible categorical values into dummy variables for regression analysis
Filtering: selecting only certain rows or columns
Combining
Joining: linking data across sources
Merging/appending/union: combining records from multiple sources
Integration: reconcile names and values for the same data element across different sources
Suppose you had a table original.activity_table
with the following columns:
Multiple people can have the same first name and each person can participate in multiple activities.
What if you wanted to clean up activity values and convert the relevant figures to metric?
CREATE or REPLACE table transformed.activity_table AS
SELECT
-- 1. Cleaning: let’s remove leading and trailing white spaces and inconsistent capitalization:
LOWER(TRIM(activity)) as activity,
-- 2. Format conversion: let’s change height from inches to cm and weight from lbs to kg
height * 2.54 as height,
weight / 2.205 as weigh
FROM
original.test_table
What if you wanted to identify and extract unique participants and create unique identifiers for them? What if you wanted to exclude records that can’t be verified because they lack emails?
CREATE OR REPLACE table transformed.user_table AS
WITH users_grouped AS
(SELECT
first_name,
email
FROM
original.test_table
WHERE
-- 5. Exclude records with null values in the “email” column
-- 11. This is an example of filtering, too
email is not NULL
-- 4. Deduplicate by grouping
GROUP BY
first_name,
email)
SELECT
first_name,
email,
-- 3. Key restructuring: let’s create an identifier by using first_name and assigning an index number based on sorting by email address
CONCAT(first_name, '-' , CAST(row_number() over (PARTITION by first_name ORDER BY email) as string)) as identifier
FROM
users_grouped
Suppose we use the same table from the last example, only without any duplicate persons and with all units in metric:
What if we want to compute BMI?
CREATE OR REPLACE table transformed.bmi_table AS
SELECT
first_name,
activity,
height,
weight,
-- 6. Perform a simple calculation involving more than one column
weight / POWER(height, 2) as bmi,
score
FROM
original.activity_table
What about getting some summary statistics for the numeric values?
CREATE OR REPLACE table transformed.summary_table AS
-- 7. Get min, max, and arithmetic mean. Note that you can use GROUP BY here to segment the data, too
-- 13. Merging records, too!
SELECT
‘score’ as attribute,
MIN(score) as minimum,
AVG(score) as mean,
MAX(score) as maximum,
STDDEV(score) as standard_deviation
FROM
original.activity_table
UNION
SELECT
‘height’ as attribute,
MIN(height) as minimum,
AVG(height) as mean,
MAX(height) as maximum,
STDDEV(score) as standard_deviation
FROM
original.activity_table
UNION
SELECT
‘weight’ as attribute,
MIN(weight) as minimum,
AVG(weight) as mean,
MAX(weight) as maximum,
STDDEV(score) as standard_deviation
FROM
original.activity_table
Percentiles are slightly trickier:
-- 7. More summary stats
CREATE OR REPLACE table transformed.percentile_table AS
SELECT
‘score’ as attribute,
PERCENTILE_CONT(score, 0.25) OVER() as first_quartile,
PERCENTILE_CONT(score, 0.50) OVER() as median,
PERCENTILE_CONT(score, 0.75) OVER() as third_quartile
FROM
salesforce.opportunity
LIMIT 1
-- ...
-- From here on, you would simply do what you did in the last example in terms of UNION
How about joining the percentile tables to the rest of the summary tables after you have created the two from above? You could do this:
CREATE OR REPLACE table transformed.full_summary_table AS
SELECT
attribute,
minimum,
first_quartile,
mean,
median,
third_quartile,
maximum,
standard_deviation,
-- 6. Another example here of derivation, using arithmetic to get the interquartile range
third_quartile - first_quartile as iqr
FROM
transformed.summary_table
-- 12. Join it!
LEFT JOIN
transformed.percentile_table on percentile_table.attribute = summary_table.attribute
Note that you can trigger this to activate after the two prerequisite tables are populated.
What if you wanted to turn some of your row values into columns? Going from “long” to “wide” is a common need in data analysis.
You may want to see the different types of activities as columns. For this, you will have to enumerate each value you want to turn into a column. Let’s say the only possible values are tennis, boxing, soccer, and hockey:
-- 8. Pivoting
CREATE OR REPLACE table transformed.scores_pivoted AS
SELECT
first_name,
activity,
score
FROM
original.activity_table
PIVOT(SUM(score) for activity in (‘tennis’, ‘boxing’, ‘soccer’, ‘hockey’)) as p
If your platform doesn’t have PIVOT, you could try something like this instead:
-- 8. Pivoting the hard way
CREATE OR REPLACE table transformed.scores_pivoted AS
SELECT
first_name,
SUM(IF(activity = ‘tennis’, score, NULL)) tennis,
SUM(IF(activity = ‘boxing’, score, NULL)) boxing,
SUM(IF(activity = ‘soccer’, score, NULL)) soccer,
SUM(IF(activity = ‘hockey’, score, NULL)) hockey
FROM
original.activity_table
GROUP BY
First_name
What if you have a column that contains multiple values? Let’s say you have a bunch of personal profiles with nicknames in a table called original.names
:
The nicknames are strings, and the values are comma-separated. You want to create a new table with a one-to-many mapping with the following columns (note that this only works in Snowflake at the time of writing this):
-- 10. Split a column containing delimited values
CREATE OR REPLACE table transformed.nicknames AS
SELECT
Id,
n.value::string as name
FROM
original.names LATERAL FLATTEN(INPUT=>SPLIT(nicknames, ‘,’)) n;
With a little creativity, there is plenty you can do to more easily organize and optimize your data for reporting and dashboards. To learn more about Fivetran Transformations, sign up for a personalized demo or get started today by signing up for a free trial.
Launch any Fivetran connector instantly.