Snowplow

Snowplow Analytics is an event tracking platform that allows you to track events across all channels and action the data in real-time. Snowplow uses trackers to track your information in whichever language your project is built.

Features

Feature Name	Supported	Notes
Capture deletes
History mode
Custom data
Data blocking		Column level
Column hashing
Re-sync		Connection level. Complete historical data.
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking
Authorization via API

Supported trackers

We pull information about your events from the code that tracks them. We officially support tracking in JavaScript, but we can also integrate tracking events from other languages, such as Ruby and Python.

Supported enrichments

We support the following Snowplow enrichments:

Enrichment	Version	Notes
IP Lookup	2-0-0	Data is enriched using the GeoIP2 City database.
UA Parser	1-0-0
Referer Parser	1-0-0
Campaign Attribution	1-0-1	If there is no configuration file, we consider the default Snowplow fields for populating the marketing columns. Name the configuration file `campaign_attribution.json`.
IP Anonymization	1-0-1	IPv6 anonymization changes are backward compatible with schema version 1-0-0. If the `anonSegments` parameter is missing, we use the `anonOctets` parameter for IPv6 addresses. Name the configuration file `anon_ip.json`.

Your JSON configuration file must follow the Snowplow schema format. You must use the enrichment-specific name (see the Notes section of the table above) for your configuration file.

To disable an enrichment, in the JSON configuration file, set the enabled field to false.

Supported deployment models

We support the SaaS Deployment model for the connector.

Setup guide

Follow our step-by-step Snowplow setup guide to connect Snowplow with your destination using Fivetran connectors.

Sync overview

After instrumenting the Snowplow tracking code on your website, server, or mobile application, the Fivetran event pipeline collects, enriches, and loads all of this data into your destination in near real-time:

Snowplow events are sent to webhooks.fivetran.com.
We collect the events.
We queue the events.
We process and enrich the events.
We prepare (transform and clean) the events and write them to a CSV file.
We load the events into your destination.

Snowplow does not allow Fivetran to sync your historical data. We can only capture data from your connection date forward. We re-import all tables during every sync.

Connection mapping

Each connection that is created within Fivetran (illustrated below in the connection icon circle) is given a specific URL to post data to. You can use Snowplow tracking services and send all of the events to the specific Fivetran URL. Any data posted to that URL is automatically normalized into a set of destination tables within the same schema.

home fivetran event tracking overview

Naming

You can name the destination schema in the Fivetran dashboard while creating the connector. Within your destination schema, we create a set of default tables for you in your destination.

Schema information

Snowplow has standard and custom fields. Each integration creates a single schema, shown below. If you choose to use custom events, each custom event type has its own table.

To zoom, open the ERD in a new window.

EVENT

Snowplow has a primary standard table, EVENT.

Fivetran de-duplicates events based on the event_id and collector_tstamp columns. If two events have the same event_id and collector_tstamp, we overwrite the first event.

The EVENT table contains the following columns:

Column name	Data Type
event_id 🔑	TEXT
collector_tstamp 🔑	TIMESTAMP
app_id	TEXT
br_colordepth	INTEGER
br_cookies	BOOLEAN
br_family	TEXT
br_features_director	BOOLEAN
br_features_flash	BOOLEAN
br_features_gears	BOOLEAN
br_features_java	BOOLEAN
br_features_pdf	BOOLEAN
br_features_quicktime	BOOLEAN
br_features_realplayer	BOOLEAN
br_features_silverlight	BOOLEAN
br_features_windowsmedia	BOOLEAN
br_lang	TEXT
br_name	TEXT
br_renderengine	TEXT
br_type	TEXT
br_version	TEXT
br_viewheight	TEXT
doc_charset	TEXT
doc_height	TEXT
doc_width	INTEGER
domain_sessionid	TEXT
domain_sessionidx	INTEGER
domain_userid	TEXT
dvce_ismobile	BOOLEAN
dvce_screenheight	TEXT
dvce_sent_tstamp	TIMESTAMP
dvce_created_tstamp	TIMESTAMP
dvce_type	TEXT
event	TEXT
event_format	TEXT
event_name	TEXT
event_vendor	TEXT
event_version	TEXT
mkt_campaign	TEXT
mkt_content	TEXT
mkt_medium	TEXT
mkt_source	TEXT
mkt_term	TEXT
mkt_clickid	TEXT
mkt_network	TEXT
name_tracker	TEXT
network_userid	TEXT
os_family	TEXT
os_manufacturer	TEXT
os_name	TEXT
os_timezone	TEXT
page_referrer	TEXT
page_title	TEXT
page_url	TEXT
page_urlfragment	TEXT
page_urlhost	TEXT
page_urlpath	TEXT
page_urlport	INTEGER
page_urlquery	TEXT
page_urlscheme	TEXT
platform	TEXT
pp_xoffset_max	INTEGER
pp_xoffset_min	INTEGER
pp_yoffset_max	INTEGER
pp_yoffset_min	INTEGER
refr_medium	TEXT
refr_source	TEXT
refr_term	TEXT
refr_urlfragment	TEXT
refr_urlhost	TEXT
refr_urlpath	TEXT
refr_urlport	INTEGER
refr_urlquery	TEXT
refr_urlscheme	TEXT
refr_domain_user_id	TEXT
refr_dvce_tstamp	TIMESTAMP
se_action	TEXT
se_category	TEXT
se_label	TEXT
se_property	TEXT
se_value	DOUBLE
sequence_number	TEXT
ti_category	TEXT
ti_currency	TEXT
ti_name	TEXT
ti_orderid	TEXT
ti_price	DOUBLE
ti_quantity	INTEGER
ti_sku	TEXT
tr_affiliation	TEXT
tr_city	TEXT
tr_country	TEXT
tr_currency	TEXT
tr_orderid	TEXT
tr_shipping	DOUBLE
tr_state	TEXT
tr_tax	DOUBLE
tr_total	DOUBLE
txn_id	TEXT
user_fingerprint	BIGINT

UA_PARSER_CONTEXT

Fivetran follows the Snowplow UA parser enrichment to parse the user agents into a separate context. The user agent parser contexts (UA_PARSER_CONTEXT table) contains the following columns:

Column Name	Data Type
event_id 🔑	TEXT
collector_tstamp 🔑	TIMESTAMP
useragent_family	TEXT
useragent_major	TEXT
useragent_minor	TEXT
useragent_patch	TEXT
useragent_version	TEXT
os_family	TEXT
os_major	TEXT
os_minor	TEXT
os_patch	TEXT
os_patch_minor	TEXT
os_version	TEXT
device_family	TEXT

BAD_EVENT

If Fivetran detects Snowplow events that don't conform to the Snowplow Canonical Event format, we add the BAD_EVENT table to your destination schema to store these events.

The BAD_EVENT table contains the following columns:

Column name	Data Type
id 🔑	TEXT
event	TEXT
reason	TEXT

Load custom fields

If you implement the Snowplow custom click feature, all the custom clicks are loaded into the EVENT table in your destination Snowplow schema.

Fivetran supports unstructured events. You can track your custom objects.

The custom event contexts (CUSTOM_CONTEXT table) contains the following columns:

Column Name	Data Type
event_id 🔑	TEXT
collector_tstamp 🔑	TIMESTAMP
_custom fields	_custom fields datatype

To load custom fields into your destination, do the following:

Write a JSON schema describing your custom attributes known as an iglu file in the Snowplow documentation. Iglu files are JSON schemas, and the Snowplow integration is configured to work with JSON schemas. For more information, you can see an explanation of how JSON schemas work.
Validate your schema before you start sending your data to us.
We don't have a repository analogous to Snowplow "iglu", so put your schema at a regular URL such as a public GitHub repository, such as: https://github.com/fivetran/snowplow-schemas/blob/master/hello_world.json or https://github.com/fivetran/snowplow-schemas/blob/master/test_table.json.

Use the URL that you established in Step 3 as the schema URL when you call Snowplow:

window[snowplowName]('trackPageView', null, [{
schema: 'https://raw.githubusercontent.com/fivetran/snowplow-schemas/master/hello_world.json',
data: {
hello: 'Hello world!',
hello_array: ['Hello', 'world!']
}
}]);

Make sure to have a "title" attribute for the schema, as the title attribute becomes the name of the table where the data gets stored. For example, the following creates a table named example (id int):
```
{
"title": "example",
"properties": {
"id": { 
    "type": "integer"
    }
}
}
```
If you want to create a decimal type (for example, for storing currency values), just use the "multipleOf" property:
```
{
"title": "example",
"properties": {
    "value": { 
    "type": "number",
    "multipleOf": 0.01
    }
}
}
```
This creates a destination table example (value decimal (18, 2)).
If you want to store an enum value, add a data type associated with it. All the enum values should be of the same data type:
```
{
"properties": {
    "value": { 
    "type": "string",
    "enum": ["value1", "value2"]
    }
}
}
```

We only support values and homogenous arrays. Nested objects and tuple-style arrays are stored in but not extracted into your destination since they don't have an obvious representation.

Exclude source data

You cannot exclude source data directly in Fivetran. If you want to prevent specific data from being sent to Fivetran, you need to exclude it independently.

Data retention

We retain data from your connector so that it can be re-synced if needed. Data retention period: Persistent.

Sync limitations

We only support publicly-hosted schemas in Iglu or GitHub.