Snowplow
Snowplow Analytics is an event tracking platform that allows you to track events across all channels and action the data in real-time. Snowplow uses trackers to track your information in whichever language your project is built.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | ||
History mode | ||
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | ||
Authorization via API |
Supported trackers
We pull information about your events from the code that tracks them. We officially support tracking in JavaScript, but we can also integrate tracking events from other languages, such as Ruby and Python.
Supported enrichments
We support the following Snowplow enrichments:
Enrichment | Version | Custom Configuration | Notes |
---|---|---|---|
IP Lookup | 2-0-0 | Data is enriched using the GeoIP2 City database. | |
UA Parser | 1-0-0 | ||
Referer Parser | 1-0-0 | ||
Campaign Attribution | 1-0-1 | check | If there is no configuration file, we consider the default Snowplow fields for populating the marketing columns. Name the configuration file campaign_attribution.json . |
IP Anonymization | 1-0-1 | check | IPv6 anonymization changes are backward compatible with schema version 1-0-0. If the anonSegments parameter is missing, we use the anonOctets parameter for IPv6 addresses. Name the configuration file anon_ip.json . |
NOTE: Your JSON configuration file must follow the Snowplow schema format. You must use the enrichment-specific name (see the Notes section of the table above) for your configuration file.
TIP: To disable an enrichment, in the JSON configuration file, set the
enabled
field to false.
Setup guide
Follow our step-by-step Snowplow setup guide to connect Snowplow with your destination using Fivetran connectors.
Sync overview
After instrumenting the Snowplow tracking code on your website, server, or mobile application, the Fivetran event pipeline collects, enriches, and loads all of this data into your destination in near real-time:
Snowplow events are sent to
webhooks.fivetran.com
.We collect the events.
We queue the events.
We process and enrich the events.
We prepare (transform and clean) the events and write them to a CSV file.
We load the events into your destination.
Snowplow does not allow Fivetran to sync your historical data. We can only capture data from your connection date forward. We re-import all tables during every sync.
Connection mapping
Each connection that is created within Fivetran (illustrated below in the connection icon circle) is given a specific URL to post data to. You can use Snowplow tracking services and send all of the events to the specific Fivetran URL. Any data posted to that URL is automatically normalized into a set of destination tables within the same schema.
Naming
You can name the destination schema in the Fivetran dashboard while creating the connector. Within your destination schema, we create a set of default tables for you in your destination.
Schema information
Snowplow has standard and custom fields. Each integration creates a single schema, shown below. If you choose to use custom events, each custom event type has its own table.
To zoom, open the ERD in a new window.EVENT
Snowplow has a primary standard table, EVENT
.
Fivetran de-duplicates events based on the event_id
and collector_tstamp
columns. If two events have the same event_id
and collector_tstamp
, we overwrite the first event.
The EVENT
table contains the following columns:
Column name | Data Type |
---|---|
event_id 🔑 | TEXT |
collector_tstamp 🔑 | TIMESTAMP |
app_id | TEXT |
br_colordepth | INTEGER |
br_cookies | BOOLEAN |
br_family | TEXT |
br_features_director | BOOLEAN |
br_features_flash | BOOLEAN |
br_features_gears | BOOLEAN |
br_features_java | BOOLEAN |
br_features_pdf | BOOLEAN |
br_features_quicktime | BOOLEAN |
br_features_realplayer | BOOLEAN |
br_features_silverlight | BOOLEAN |
br_features_windowsmedia | BOOLEAN |
br_lang | TEXT |
br_name | TEXT |
br_renderengine | TEXT |
br_type | TEXT |
br_version | TEXT |
br_viewheight | TEXT |
doc_charset | TEXT |
doc_height | TEXT |
doc_width | INTEGER |
domain_sessionid | TEXT |
domain_sessionidx | INTEGER |
domain_userid | TEXT |
dvce_ismobile | BOOLEAN |
dvce_screenheight | TEXT |
dvce_sent_tstamp | TIMESTAMP |
dvce_created_tstamp | TIMESTAMP |
dvce_type | TEXT |
event | TEXT |
event_format | TEXT |
event_name | TEXT |
event_vendor | TEXT |
event_version | TEXT |
mkt_campaign | TEXT |
mkt_content | TEXT |
mkt_medium | TEXT |
mkt_source | TEXT |
mkt_term | TEXT |
mkt_clickid | TEXT |
mkt_network | TEXT |
name_tracker | TEXT |
network_userid | TEXT |
os_family | TEXT |
os_manufacturer | TEXT |
os_name | TEXT |
os_timezone | TEXT |
page_referrer | TEXT |
page_title | TEXT |
page_url | TEXT |
page_urlfragment | TEXT |
page_urlhost | TEXT |
page_urlpath | TEXT |
page_urlport | INTEGER |
page_urlquery | TEXT |
page_urlscheme | TEXT |
platform | TEXT |
pp_xoffset_max | INTEGER |
pp_xoffset_min | INTEGER |
pp_yoffset_max | INTEGER |
pp_yoffset_min | INTEGER |
refr_medium | TEXT |
refr_source | TEXT |
refr_term | TEXT |
refr_urlfragment | TEXT |
refr_urlhost | TEXT |
refr_urlpath | TEXT |
refr_urlport | INTEGER |
refr_urlquery | TEXT |
refr_urlscheme | TEXT |
refr_domain_user_id | TEXT |
refr_dvce_tstamp | TIMESTAMP |
se_action | TEXT |
se_category | TEXT |
se_label | TEXT |
se_property | TEXT |
se_value | DOUBLE |
sequence_number | TEXT |
ti_category | TEXT |
ti_currency | TEXT |
ti_name | TEXT |
ti_orderid | TEXT |
ti_price | DOUBLE |
ti_quantity | INTEGER |
ti_sku | TEXT |
tr_affiliation | TEXT |
tr_city | TEXT |
tr_country | TEXT |
tr_currency | TEXT |
tr_orderid | TEXT |
tr_shipping | DOUBLE |
tr_state | TEXT |
tr_tax | DOUBLE |
tr_total | DOUBLE |
txn_id | TEXT |
user_fingerprint | BIGINT |
UA_PARSER_CONTEXT
Fivetran follows the Snowplow UA parser enrichment to parse the user agents into a separate context. The user agent parser contexts (UA_PARSER_CONTEXT
table) contains the following columns:
Column Name | Data Type |
---|---|
event_id 🔑 | TEXT |
collector_tstamp 🔑 | TIMESTAMP |
useragent_family | TEXT |
useragent_major | TEXT |
useragent_minor | TEXT |
useragent_patch | TEXT |
useragent_version | TEXT |
os_family | TEXT |
os_major | TEXT |
os_minor | TEXT |
os_patch | TEXT |
os_patch_minor | TEXT |
os_version | TEXT |
device_family | TEXT |
BAD_EVENT
If Fivetran detects Snowplow events that don't conform to the Snowplow Canonical Event format, we add the BAD_EVENT
table to your destination schema to store these events.
The BAD_EVENT
table contains the following columns:
Column name | Data Type |
---|---|
id 🔑 | TEXT |
event | TEXT |
reason | TEXT |
Load custom fields
If you implement the Snowplow custom click feature, all the custom clicks are loaded into the EVENT
table in your destination Snowplow schema.
Fivetran supports unstructured events. You can track your custom objects.
The custom event contexts (CUSTOM_CONTEXT
table) contains the following columns:
Column Name | Data Type |
---|---|
event_id 🔑 | TEXT |
collector_tstamp 🔑 | TIMESTAMP |
_custom fields | _custom fields datatype |
To load custom fields into your destination, do the following:
Write a JSON schema describing your custom attributes known as an iglu file in the Snowplow documentation. Iglu files are JSON schemas, and the Snowplow integration is configured to work with JSON schemas. For more information, you can see an explanation of how JSON schemas work.
Validate your schema before you start sending your data to us.
We don't have a repository analogous to Snowplow "iglu", so put your schema at a regular URL such as a public GitHub repository, such as: https://github.com/fivetran/snowplow-schemas/blob/master/hello_world.json or https://github.com/fivetran/snowplow-schemas/blob/master/test_table.json.
Use the URL that you established in Step 3 as the schema URL when you call Snowplow:
window[snowplowName]('trackPageView', null, [{ schema: 'https://raw.githubusercontent.com/fivetran/snowplow-schemas/master/hello_world.json', data: { hello: 'Hello world!', hello_array: ['Hello', 'world!'] } }]);
Make sure to have a "title" attribute for the schema, as the title attribute becomes the name of the table where the data gets stored. For example, the following creates a table named
example
(id int):{ "title": "example", "properties": { "id": { "type": "integer" } } }
If you want to create a decimal type (for example, for storing currency values), just use the "multipleOf" property:
{ "title": "example", "properties": { "value": { "type": "number", "multipleOf": 0.01 } } }
This creates a destination table example (value decimal (18, 2)).
If you want to store an enum value, add a data type associated with it. All the enum values should be of the same data type:
{ "properties": { "value": { "type": "string", "enum": ["value1", "value2"] } } }
NOTE: We only support values and homogenous arrays. Nested objects and tuple-style arrays are stored in but not extracted into your destination since they don't have an obvious representation.
Exclude source data
You cannot exclude source data directly in Fivetran. If you want to prevent specific data from being sent to Fivetran, you need to exclude it independently.
Data retention
We retain data from your connector so that it can be re-synced if needed. Data retention period: Persistent.
Sync limitations
We only support publicly-hosted schemas in Iglu or GitHub.