Snowplow
Snowplow Analytics is an event tracking platform that allows you to track events across all channels and action the data in real-time. Snowplow uses trackers to track your information in whichever language your project is built.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | ||
History mode | ||
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | ||
Authorization via API |
Supported trackers
We pull information about your events from the code that tracks your events. We officially support tracking in JavaScript, but can integrate tracking events from other languages such as Ruby and Python.
Enrichments
We support the following Snowplow enrichments:
Enrichment | Version | Custom Configuration | Notes |
---|---|---|---|
IP Lookup | 2-0-0 | Data is enriched using the GeoIP2 City database. | |
UA Parser | 1-0-0 | ||
Referer Parser | 1-0-0 | ||
Campaign Attribution | 1-0-1 | check | If there is no configuration file, we consider the default Snowplow fields for populating the marketing columns. Name the configuration file campaign_attribution.json . |
IP Anonymization | 1-0-1 | check | IPv6 anonymization changes are backward compatible with schema version 1-0-0. If the anonSegments parameter is missing, we use the anonOctets parameter for IPv6 addresses. Name the configuration file anon_ip.json . |
NOTE: Your JSON configuration file must follow the Snowplow schema format. You must use the enrichment-specific name (see the Notes section of the table above) for your configuration file.
To disable an enrichment, in the JSON configuration file, set the enabled
field to false.
Setup guide
Follow our step-by-step Snowplow setup guide to connect Snowplow with your destination using Fivetran connectors.
Sync overview
After instrumenting the Snowplow tracking code on your website, server, or mobile application, Fivetran's robust event pipeline will collect, enrich, and load all of this data into your destination in near real-time:
Snowplow events sent to
webhooks.fivetran.com
Events collected
Events queued
Events processed and enriched
Events prepared (transformed and cleaned) and written to CSV file
Loaded into your destination
Snowplow does not allow Fivetran to sync your historical data. We can only capture data from your connection date forward.
Connection mapping
Each connection that is created within Fivetran (illustrated below in the connection icon circle) is given a specific URL to post data to. You can use Snowplow tracking services and send all of the events to the specific Fivetran URL. Any data posted to that URL is automatically normalized into a set of destination tables within the same schema.
Naming
You can name the destination schema in the Fivetran dashboard while creating the connector. Within your destination schema, we create a set of default tables for you in your destination.
Schema information
Snowplow has standard and custom fields. Each integration creates a single schema, shown below. If you choose to use custom events, each custom event type will have its own table.
To zoom, open the ERD in a new window.EVENT
Snowplow has a primary standard table, EVENT
. Fivetran deduplicates events based on the event_id
and collector_tstamp
columns. If two events have the same event_id
and collector_tstamp
, we overwrite the first event.
The columns in the EVENT
table have the following data types:
Column | Data Type |
---|---|
event_id 🔑 | TEXT |
collector_tstamp 🔑 | TIMESTAMP |
app_id | TEXT |
br_colordepth | INTEGER |
br_cookies | BOOLEAN |
br_family | TEXT |
br_features_director | BOOLEAN |
br_features_flash | BOOLEAN |
br_features_gears | BOOLEAN |
br_features_java | BOOLEAN |
br_features_pdf | BOOLEAN |
br_features_quicktime | BOOLEAN |
br_features_realplayer | BOOLEAN |
br_features_silverlight | BOOLEAN |
br_features_windowsmedia | BOOLEAN |
br_lang | TEXT |
br_name | TEXT |
br_renderengine | TEXT |
br_type | TEXT |
br_version | TEXT |
br_viewheight | TEXT |
doc_charset | TEXT |
doc_height | TEXT |
doc_width | INTEGER |
domain_sessionid | TEXT |
domain_sessionidx | INTEGER |
domain_userid | TEXT |
dvce_ismobile | BOOLEAN |
dvce_screenheight | TEXT |
dvce_sent_tstamp | TIMESTAMP |
dvce_created_tstamp | TIMESTAMP |
dvce_type | TEXT |
event | TEXT |
event_format | TEXT |
event_name | TEXT |
event_vendor | TEXT |
event_version | TEXT |
mkt_campaign | TEXT |
mkt_content | TEXT |
mkt_medium | TEXT |
mkt_source | TEXT |
mkt_term | TEXT |
mkt_clickid | TEXT |
mkt_network | TEXT |
name_tracker | TEXT |
network_userid | TEXT |
os_family | TEXT |
os_manufacturer | TEXT |
os_name | TEXT |
os_timezone | TEXT |
page_referrer | TEXT |
page_title | TEXT |
page_url | TEXT |
page_urlfragment | TEXT |
page_urlhost | TEXT |
page_urlpath | TEXT |
page_urlport | INTEGER |
page_urlquery | TEXT |
page_urlscheme | TEXT |
platform | TEXT |
pp_xoffset_max | INTEGER |
pp_xoffset_min | INTEGER |
pp_yoffset_max | INTEGER |
pp_yoffset_min | INTEGER |
refr_medium | TEXT |
refr_source | TEXT |
refr_term | TEXT |
refr_urlfragment | TEXT |
refr_urlhost | TEXT |
refr_urlpath | TEXT |
refr_urlport | INTEGER |
refr_urlquery | TEXT |
refr_urlscheme | TEXT |
refr_domain_user_id | TEXT |
refr_dvce_tstamp | TIMESTAMP |
se_action | TEXT |
se_category | TEXT |
se_label | TEXT |
se_property | TEXT |
se_value | DOUBLE |
sequence_number | TEXT |
ti_category | TEXT |
ti_currency | TEXT |
ti_name | TEXT |
ti_orderid | TEXT |
ti_price | DOUBLE |
ti_quantity | INTEGER |
ti_sku | TEXT |
tr_affiliation | TEXT |
tr_city | TEXT |
tr_country | TEXT |
tr_currency | TEXT |
tr_orderid | TEXT |
tr_shipping | DOUBLE |
tr_state | TEXT |
tr_tax | DOUBLE |
tr_total | DOUBLE |
txn_id | TEXT |
user_fingerprint | BIGINT |
UA_PARSER_CONTEXT
Fivetran follows Snowplow's UA parser enrichment to parse User agents into a separate context.
The user agent parser contexts (UA_PARSER_CONTEXT
table) have the following columns:
Column | Type |
---|---|
event_id 🔑 | TEXT |
collector_tstamp 🔑 | TIMESTAMP |
useragent_family | TEXT |
useragent_major | TEXT |
useragent_minor | TEXT |
useragent_patch | TEXT |
useragent_version | TEXT |
os_family | TEXT |
os_major | TEXT |
os_minor | TEXT |
os_patch | TEXT |
os_patch_minor | TEXT |
os_version | TEXT |
device_family | TEXT |
BAD_EVENT
If Fivetran detects Snowplow events that don't conform to the Snowplow Canonical Event format, we add the BAD_EVENT
table to your destination schema to store these events. The columns in the BAD_EVENT
table have the following data types:
Column | Data Type |
---|---|
id 🔑 | TEXT |
event | TEXT |
reason | TEXT |
Load custom fields
If you implement Snowplow's custom click feature, all the custom clicks will be loaded into the EVENT
table in your destination's Snowplow schema.
Fivetran supports unstructured events. You can track your custom objects.
To load custom fields into your destination, do the following:
Write a JSON schema describing your custom attributes known as an iglu file in the Snowplow documentation. Iglu files are JSON schemas and Snowplow's integration is setup to work with JSON schemas. For more information you can see an explanation of how JSON schemas work.
Validate your schema before you start sending your data to us.
We don't have a repository that's analogous to Snowplow's "iglu", so just put your schema at a regular URL such as a public github repository, such as: https://github.com/fivetran/snowplow-schemas/blob/master/hello_world.json or https://github.com/fivetran/snowplow-schemas/blob/master/test_table.json
Use the URL that you established in step 3 as the schema URL when you call Snowplow:
window[snowplowName]('trackPageView', null, [{ schema: 'https://raw.githubusercontent.com/fivetran/snowplow-schemas/master/hello_world.json', data: { hello: 'Hello world!', hello_array: ['Hello', 'world!'] } }]);
Make sure to have a "title" attribute for the schema, which will become the name of the table where the data gets stored. For example, the following will create a table named
example
(id int):{ "title": "example", "properties": { "id": { "type": "integer" } } }
If you want to create a decimal type (for example, for storing currency values), just use the multipleOf property:
{ "title": "example", "properties": { "value": { "type": "number", "multipleOf": 0.01 } } }
This will create a destination table example (value decimal (18, 2)).
If you want to store an enum value, add a type associated with it. All the enum values should be of same data type:
{ "properties": { "value": { "type": "string", "enum": ["value1", "value2"] } } }
NOTE: We only support values and homogenous arrays. Nested objects and tuple-style arrays are stored in but not extracted into your destination since they don't have an obvious representation.
Exclude source data
We don't yet have a way to exclude source data. You can independently, however, limit the data you send to Fivetran.
Sync limitations
We only support publicly-hosted schemas in Iglu or GitHub.