Segment
Segment is an event tracking library.
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | DELETE table. |
History mode | | |
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | | |
Fivetran data models | | |
Private networking | | |
Authorization via API | check | |
NOTE: The DELETE table contains the message_id, project_id, received_at, sent_at, timestamp, type, and user_id columns as received from the webhook event.
Supported services
You can send events directly to your destination using:
Setup guide
Follow our step-by-step Segment setup guide to connect Segment with your destination using Fivetran connectors.
Sync overview
The Segment connector can sync data using webhooks or S3 buckets:
For webhooks, once events are sent directly to the webhook endpoint, we do the following:
- We process these events using our Webhook Pull Service (WPS) collector.
- We store the events in a Google Cloud Storage (GCS) bucket.
- We read these events from the GCS bucket using the WPSClient service.
For S3 buckets, once Segment exports the events into your AWS S3 bucket, we do the following:
- We read these events using the AWS S3 client.
- We process the data.
- We write the data to standard tables, such as USERS, DELETE, and TRACK, and dynamically create additional tables for custom events.
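The last step above, writing to standard tables and creating extra tables for custom events, can be pictured with a small sketch. Everything in it is an assumption made for illustration: the mapping from event type to table name, the helper names `custom_table_name` and `tables_for_event`, and the snake_case rule for custom table names are hypothetical and are not taken from the connector.

```python
import re

# Assumed mapping from event type to a standard table (illustrative only).
STANDARD_TABLES = {
    "identify": "identifies",
    "group": "groups",
    "page": "pages",
    "screen": "screens",
    "track": "tracks",
}


def custom_table_name(event_name: str) -> str:
    """Turn a custom event name into a snake_case table name (assumed rule)."""
    return re.sub(r"[^0-9a-z]+", "_", event_name.lower()).strip("_")


def tables_for_event(event: dict) -> list:
    """Return the table names a single event would be written to."""
    tables = [STANDARD_TABLES[event["type"]]]
    # A named track event also gets its own, dynamically created table.
    if event["type"] == "track" and event.get("event"):
        tables.append(custom_table_name(event["event"]))
    return tables


track_tables = tables_for_event({"type": "track", "event": "Order Completed"})
page_tables = tables_for_event({"type": "page"})
```

Under these assumptions, a named track event lands in both the standard table and a table named after the event, while a page event lands only in its standard table.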
Connection mapping
Each connection created within Fivetran is given a specific URL to post data to. You can use Segment tracking services to send all of your events to that URL. Any data posted to the URL is automatically normalized into a set of destination tables within the same schema.
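To make the idea of posting data to a connection URL concrete, here is a minimal sketch that builds a JSON POST request without sending it. The URL is a placeholder, the function name `build_event_request` is hypothetical, and the sample event fields are assumptions. In practice Segment delivers the events for you, so this only shows the general shape of such a request.

```python
import json
import urllib.request

# Placeholder only: substitute the URL shown for your own connection.
CONNECTION_URL = "https://example.com/your-connection-url"


def build_event_request(event: dict, url: str = CONNECTION_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical sample event.
request = build_event_request(
    {"type": "track", "event": "Order Completed", "userId": "user-42"}
)
# To actually send it, you would call urllib.request.urlopen(request).
```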
Naming
You can name the destination schema in the Fivetran dashboard while creating the connector. Within your destination schema, we create a set of default tables.
Schema information
The Segment schema follows the Fivetran standard API schema rules. We follow the Segment schema as closely as possible and support all of its tables, except for those in the testing stage.
We further normalize the tables to ensure there is as little duplication as possible across all tables.
We centralize all the data in a star schema. The central table of the star schema is TRACKS, which maintains the standard properties for every track event. For example, all event tables have a device column, which we pull into the central table along with the other columns common to all tables.
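The star-schema idea can be sketched as splitting one flattened track event into a central row and an event-specific row. This is an illustration under stated assumptions, not the connector's implementation: the function name `split_track_event`, the choice of `context_` as the marker for shared columns, and the small set of shared identifiers are all hypothetical.

```python
from typing import Dict, Tuple

# Assumptions for illustration: which columns count as "common" to all events.
COMMON_PREFIX = "context_"
COMMON_COLUMNS = {"message_id", "anonymous_id", "user_id"}


def split_track_event(row: Dict[str, object]) -> Tuple[dict, dict]:
    """Return (central TRACKS row, event-specific row) for one flattened event."""
    central, specific = {}, {}
    for column, value in row.items():
        if column in COMMON_COLUMNS or column.startswith(COMMON_PREFIX):
            central[column] = value
        else:
            specific[column] = value
    # Keep the key on the event-specific side so the two rows can be joined.
    specific["message_id"] = row.get("message_id")
    return central, specific


central_row, event_row = split_track_event(
    {"message_id": "m1", "context_device_id": "d1", "properties_plan": "pro"}
)
```

Keeping `message_id` on both sides is the design choice that lets the event-specific table join back to the central table.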
We sync the following standard tables:
groups
The table contains the following columns:
Column Name | Data Type |
---|---|
message_id 🔑 | STRING |
_fivetran_synced | TIMESTAMP |
address_city | STRING |
address_country | STRING |
address_postal_code | STRING |
address_state | STRING |
address_street | STRING |
anonymous_id | STRING |
avatar | STRING |
context_app_build | STRING |
context_app_name | STRING |
context_app_version | STRING |
context_campaign_content | STRING |
context_campaign_medium | STRING |
context_campaign_name | STRING |
context_campaign_source | STRING |
context_campaign_term | STRING |
context_device_id | STRING |
context_device_manufacturer | STRING |
context_device_model | STRING |
context_device_name | STRING |
context_device_type | STRING |
context_device_version | STRING |
context_ip | STRING |
context_library_name | STRING |
context_library_version | STRING |
context_locale | STRING |
context_location_city | STRING |
context_location_country | STRING |
context_location_latitude | FLOAT |
context_location_longitude | FLOAT |
context_location_region | STRING |
context_location_speed | INTEGER |
context_network_bluetooth | BOOLEAN |
context_network_carrier | STRING |
context_network_cellular | BOOLEAN |
context_network_wifi | BOOLEAN |
context_os_name | STRING |
context_os_version | STRING |
context_page_hash | STRING |
context_page_path | STRING |
context_page_referrer | STRING |
context_page_search | STRING |
context_page_title | STRING |
context_page_url | STRING |
context_referrer_id | STRING |
context_referrer_link | STRING |
context_referrer_name | STRING |
context_referrer_type | STRING |
context_referrer_url | STRING |
context_screen_density | INTEGER |
context_screen_height | INTEGER |
context_screen_width | INTEGER |
context_timezone | STRING |
context_user_agent | STRING |
created_at | TIMESTAMP |
description | STRING |
STRING | |
employees | STRING |
group_id | STRING |
id | STRING |
industry | STRING |
name | STRING |
phone | STRING |
project_id | STRING |
received_at | TIMESTAMP |
sent_at | TIMESTAMP |
timestamp | TIMESTAMP |
type | STRING |
user_id | STRING |
version | STRING |
website | STRING |
identifies
The table contains the following columns:
Column Name | Data Type |
---|---|
message_id 🔑 | TEXT |
address_city | TEXT |
address_country | TEXT |
address_postal_code | TEXT |
address_state | TEXT |
address_street | TEXT |
age | BIGINT |
anonymous_id | TEXT |
avatar | TEXT |
birthday | TIMESTAMP |
context_app_build | TEXT |
context_app_name | TEXT |
context_app_version | TEXT |
context_campaign_content | TEXT |
context_campaign_medium | TEXT |
context_campaign_name | TEXT |
context_campaign_source | TEXT |
context_campaign_term | TEXT |
context_device_id | TEXT |
context_device_manufacturer | TEXT |
context_device_model | TEXT |
context_device_name | TEXT |
context_device_type | TEXT |
context_device_version | TEXT |
context_ip | TEXT |
context_library_name | TEXT |
context_library_version | TEXT |
context_locale | TEXT |
context_location_city | TEXT |
context_location_country | TEXT |
context_location_latitude | DOUBLE |
context_location_longitude | DOUBLE |
context_location_region | TEXT |
context_location_speed | BIGINT |
context_network_bluetooth | BOOLEAN |
context_network_carrier | TEXT |
context_network_cellular | BOOLEAN |
context_network_wifi | BOOLEAN |
context_os_name | TEXT |
context_os_version | TEXT |
context_page_hash | TEXT |
context_page_path | TEXT |
context_page_referrer | TEXT |
context_page_search | TEXT |
context_page_title | TEXT |
context_page_url | TEXT |
context_referrer_link | TEXT |
context_referrer_name | TEXT |
context_referrer_type | TEXT |
context_referrer_url | TEXT |
context_screen_density | BIGINT |
context_screen_height | BIGINT |
context_screen_width | BIGINT |
context_timezone | TEXT |
context_user_agent | TEXT |
created_at | TIMESTAMP |
current_country | TEXT |
description | TEXT |
TEXT | |
first_name | TEXT |
gender | TEXT |
given_name | TEXT |
integrations_google_analytics | BOOLEAN |
integrations_mixpanel | BOOLEAN |
last_name | TEXT |
name | TEXT |
phone | TEXT |
project_id | TEXT |
received_at | TIMESTAMP |
sent_at | TIMESTAMP |
surname | TEXT |
timestamp | TIMESTAMP |
title | TEXT |
type | TEXT |
user_id | TEXT |
username | TEXT |
version | TEXT |
website | TEXT |
pages
The table contains the following columns:
Column Name | Data Type |
---|---|
message_id 🔑 | TEXT |
anonymous_id | TEXT |
app_session_id | TEXT |
client_type | TEXT |
context_app_build | TEXT |
context_app_name | TEXT |
context_app_version | TEXT |
context_campaign_content | TEXT |
context_campaign_medium | TEXT |
context_campaign_name | TEXT |
context_campaign_source | TEXT |
context_campaign_term | TEXT |
context_device_id | TEXT |
context_device_manufacturer | TEXT |
context_device_model | TEXT |
context_device_name | TEXT |
context_device_type | TEXT |
context_device_version | TEXT |
context_ip | TEXT |
context_library_name | TEXT |
context_library_version | TEXT |
context_locale | TEXT |
context_location_city | TEXT |
context_location_country | TEXT |
context_location_latitude | DOUBLE |
context_location_longitude | DOUBLE |
context_location_region | TEXT |
context_location_speed | BIGINT |
context_network_bluetooth | BOOLEAN |
context_network_carrier | TEXT |
context_network_cellular | BOOLEAN |
context_network_wifi | BOOLEAN |
context_os_name | TEXT |
context_os_version | TEXT |
context_page_hash | TEXT |
context_page_path | TEXT |
context_page_referrer | TEXT |
context_page_search | TEXT |
context_page_title | TEXT |
context_page_url | TEXT |
context_referrer_link | TEXT |
context_referrer_name | TEXT |
context_referrer_type | TEXT |
context_referrer_url | TEXT |
context_screen_density | BIGINT |
context_screen_height | BIGINT |
context_screen_width | BIGINT |
context_timezone | TEXT |
context_user_agent | TEXT |
environment | TEXT |
name | TEXT |
order_id | TEXT |
order_session_id | TEXT |
path | TEXT |
project_id | TEXT |
received_at | TIMESTAMP |
referrer | TEXT |
search | TEXT |
sent_at | TIMESTAMP |
store_id | TEXT |
timestamp | TIMESTAMP |
title | TEXT |
type | TEXT |
url | TEXT |
user_id | TEXT |
version | TEXT |
screens
The table contains the following columns:
Column Name | Data Type |
---|---|
message_id 🔑 | TEXT |
anonymous_id | TEXT |
brand_id | DOUBLE |
category_id | DOUBLE |
context_app_build | TEXT |
context_app_name | TEXT |
context_app_version | TEXT |
context_campaign_content | TEXT |
context_campaign_medium | TEXT |
context_campaign_name | TEXT |
context_campaign_source | TEXT |
context_campaign_term | TEXT |
context_device_id | TEXT |
context_device_manufacturer | TEXT |
context_device_model | TEXT |
context_device_name | TEXT |
context_device_type | TEXT |
context_device_version | TEXT |
context_ip | TEXT |
context_library_name | TEXT |
context_library_version | TEXT |
context_locale | TEXT |
context_location_city | TEXT |
context_location_country | TEXT |
context_location_latitude | DOUBLE |
context_location_longitude | DOUBLE |
context_location_region | TEXT |
context_location_speed | BIGINT |
context_network_bluetooth | BOOLEAN |
context_network_carrier | TEXT |
context_network_cellular | BOOLEAN |
context_network_wifi | BOOLEAN |
context_os_name | TEXT |
context_os_version | TEXT |
context_page_hash | TEXT |
context_page_path | TEXT |
context_page_referrer | TEXT |
context_page_search | TEXT |
context_page_title | TEXT |
context_page_url | TEXT |
context_referrer_link | TEXT |
context_referrer_name | TEXT |
context_referrer_type | TEXT |
context_referrer_url | TEXT |
context_screen_density | BIGINT |
context_screen_height | BIGINT |
context_screen_width | BIGINT |
context_timezone | TEXT |
context_user_agent | TEXT |
country_name | TEXT |
coutry_name | TEXT |
curated_home | BOOLEAN |
department_id | DOUBLE |
name | TEXT |
old_screen_name | TEXT |
postal_code | TEXT |
project_id | TEXT |
received_at | TIMESTAMP |
sent_at | TIMESTAMP |
service_type | TEXT |
store_id | DOUBLE |
timestamp | TIMESTAMP |
title | TEXT |
type | TEXT |
user_id | TEXT |
version | TEXT |
tracks
The table contains the following columns:
Column Name | Data Type |
---|---|
anonymous_id | TEXT |
context_app_build | TEXT |
context_app_name | TEXT |
context_app_version | TEXT |
context_campaign_content | TEXT |
context_campaign_medium | TEXT |
context_campaign_name | TEXT |
context_campaign_source | TEXT |
context_campaign_term | TEXT |
context_device_id | TEXT |
context_device_manufacturer | TEXT |
context_device_model | TEXT |
context_device_name | TEXT |
context_device_type | TEXT |
context_device_version | TEXT |
context_ip | TEXT |
context_library_name | TEXT |
context_library_version | TEXT |
context_locale | TEXT |
context_location_city | TEXT |
context_location_country | TEXT |
context_location_latitude | DOUBLE |
context_location_longitude | DOUBLE |
context_location_region | TEXT |
context_location_speed | BIGINT |
context_network_bluetooth | BOOLEAN |
context_network_carrier | TEXT |
context_network_cellular | BOOLEAN |
context_network_wifi | BOOLEAN |
context_os_name | TEXT |
context_os_version | TEXT |
context_page_hash | TEXT |
context_page_path | TEXT |
context_page_referrer | TEXT |
context_page_search | TEXT |
context_page_title | TEXT |
context_page_url | TEXT |
context_referrer_link | TEXT |
context_referrer_name | TEXT |
context_referrer_type | TEXT |
context_referrer_url | TEXT |
context_screen_density | BIGINT |
context_screen_height | BIGINT |
context_screen_width | BIGINT |
users
The table contains the following columns:
Column Name | Data Type |
---|---|
message_id 🔑 | STRING |
_fivetran_synced | TIMESTAMP |
address_city | STRING |
address_country | STRING |
address_postal_code | STRING |
address_state | STRING |
address_street | STRING |
age | INTEGER |
anonymous_id | STRING |
avatar | STRING |
birthday | TIMESTAMP |
context_app_build | STRING |
context_app_name | STRING |
context_app_version | STRING |
context_campaign_content | STRING |
context_campaign_medium | STRING |
context_campaign_name | STRING |
context_campaign_source | STRING |
context_campaign_term | STRING |
context_device_id | STRING |
context_device_manufacturer | STRING |
context_device_model | STRING |
context_device_name | STRING |
context_device_type | STRING |
context_device_version | STRING |
context_ip | STRING |
context_library_name | STRING |
context_library_version | STRING |
context_locale | STRING |
context_location_city | STRING |
context_location_country | STRING |
context_location_latitude | FLOAT |
context_location_longitude | FLOAT |
context_location_region | STRING |
context_location_speed | INTEGER |
context_network_bluetooth | BOOLEAN |
context_network_carrier | STRING |
context_network_cellular | BOOLEAN |
context_network_wifi | BOOLEAN |
context_os_name | STRING |
context_os_version | STRING |
context_page_hash | STRING |
context_page_path | STRING |
context_page_referrer | STRING |
context_page_search | STRING |
context_page_title | STRING |
context_page_url | STRING |
context_referrer_id | STRING |
context_referrer_link | STRING |
context_referrer_name | STRING |
context_referrer_type | STRING |
context_referrer_url | STRING |
context_screen_density | INTEGER |
context_screen_height | INTEGER |
context_screen_width | INTEGER |
context_timezone | STRING |
context_user_agent | STRING |
created_at | TIMESTAMP |
description | STRING |
STRING | |
first_name | STRING |
gender | STRING |
go_last_open_date | DATE |
go_signup_date | DATE |
go_user_id | STRING |
last_name | STRING |
name | STRING |
phone | STRING |
project_id | STRING |
received_at | TIMESTAMP |
sent_at | TIMESTAMP |
signup_region_code | STRING |
timestamp | TIMESTAMP |
title | STRING |
traits_first_name | STRING |
traits_last_name | STRING |
traits_user_id | STRING |
type | STRING |
user_id | STRING |
username | STRING |
version | STRING |
website | STRING |
Nested columns support
The tables support nested dynamic columns. To avoid potential conflicts between custom column names and schema column names, we unnest and promote nested columns and add a prefix to their names. Depending on the attribute type, the prefix is properties_ or traits_.
For example, if we detect the timestamp nested column:
{
  timestamp : "1970-01-01T00:00:00Z",
  properties :
  {
    timestamp : 16000000000
  }
}
we rename it as properties_timestamp:
{
  timestamp : "1970-01-01T00:00:00Z",
  properties_timestamp : 16000000000
}
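The renaming shown above can be reproduced with a short sketch. The function name `promote_nested` is hypothetical, and the sketch handles only one level of nesting under `properties` or `traits`; how the connector treats deeper nesting is not covered here.

```python
# Attributes whose nested keys are promoted with a prefix (from the text above).
PREFIXED_ATTRIBUTES = ("properties", "traits")


def promote_nested(event: dict) -> dict:
    """Unnest properties/traits, prefixing each nested key with its parent name."""
    flat = {}
    for key, value in event.items():
        if key in PREFIXED_ATTRIBUTES and isinstance(value, dict):
            for nested_key, nested_value in value.items():
                flat[f"{key}_{nested_key}"] = nested_value
        else:
            flat[key] = value
    return flat


event = {
    "timestamp": "1970-01-01T00:00:00Z",
    "properties": {"timestamp": 16000000000},
}
flat_event = promote_nested(event)
```

Because of the prefix, the nested `timestamp` no longer collides with the top-level `timestamp` column.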
Data retention
We retain data from your connector if you choose to use webhooks instead of an S3 bucket. We store this data so that it can be re-synced if needed. Data retention period: Persistent.
Limitations
- To maintain data integrity, we delay processing events from your S3 bucket by skipping the most recent 30 minutes of data.
- It takes us a few minutes to process the events sent to the webhook endpoint through our Webhook Pull Service (WPS).
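The 30-minute delay in the first limitation can be pictured as a cutoff filter. This is a sketch under assumptions: the function name `objects_ready_to_process` is hypothetical, objects are modeled as simple (key, last-modified time) pairs, and using the last-modified time as the basis for the cutoff is a guess rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the limitation above.
SKIP_WINDOW = timedelta(minutes=30)


def objects_ready_to_process(objects, now=None):
    """Return keys of objects older than the skip window.

    `objects` is an iterable of (key, last_modified) pairs with
    timezone-aware datetimes (an assumed, simplified shape).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - SKIP_WINDOW
    return [key for key, last_modified in objects if last_modified <= cutoff]


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ready = objects_ready_to_process(
    [
        ("events/old.gz", fixed_now - timedelta(minutes=45)),
        ("events/recent.gz", fixed_now - timedelta(minutes=5)),
    ],
    now=fixed_now,
)
```

Objects skipped in one run become eligible in a later run once they fall outside the window.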