Salesforce is a cloud-based CRM platform.
We pull all the underlying data from Salesforce (SFDC) into a SQL-enabled environment to allow you to get a more in-depth analysis of your CRM data.
|Capture Deletes||check||Only for replicateable objects|
|Data Blocking||check||Column level, table level, and schema level|
|Re-sync||check||Table level and connector level|
|dbt Package||check||Get the package|
|Service Cloud||Including Salesforce Knowledge|
|Marketing Cloud||Using our Marketing Cloud Connector|
|Engagement||Using our Heroku Postgres or Heroku Kafka connectors|
|Industries||Industry-specific like finance and healthcare|
Follow our step-by-step Salesforce setup guide to connect Salesforce with your destination using Fivetran connectors.
We use both the REST API and the Bulk API to sync your Salesforce data. We select the sync strategy depending on the data size and the number of updated records since the last sync. We use multiple techniques to minimize the total number of API calls, specifically the Bulk API calls, because Salesforce limits the number of Bulk API calls.
The number of API calls that we use varies from company to company. Fivetran has optimized our Salesforce connector to use the fewest possible API calls necessary. In practice, customers find that Fivetran is not a significant API consumer.
Each Salesforce account has a rolling 24 hour API quota. Fivetran automatically reschedules for one hour later any time 90% of the Salesforce API call quota has been consumed. This ensures there is always sufficient API quota for other business-critical applications.
The system table FIVETRAN_API_CALL tracks information about API calls that are made to sync the data into the warehouse.
Number of API calls
The FIVETRAN_QUERY table contains information about the SOQL (Salesforce Object Query Language) queries we use to retrieve the objects that we eventually load into your data warehouse. For every query in this table, we use approximately two requests. By grouping the timestamps by date and multiplying the number of queries by two, you can see approximately how many queries we used over the course of a given day. To find out Fivetran's approximate API usage by day, run this query:
SELECT done::date, count(*)*2 as approx_num_api_requests FROM salesforce._fivetran_query group by done::date;
Fivetran doesn't set the system mod stamp for the Salesforce tables; these are established by Salesforce.
We begin syncing all tables at the same time, so generally, all of the tables are finished syncing at approximately the same time. Any variation is likely to result from some tables being bigger — thus taking longer to sync — than others.
To provide more insight into our process, we've also added a column into each table called
fivetran_synced which displays the time at which that particular record was synced.
Automatic schema replication
For each account that you connect, we create a different schema in your data warehouse. The schema we create maps closely to the native Salesforce schema so that the data is in a familiar format for you to work with. If the structure of the data in the source changes (for example, you add new columns, custom fields, or change a data type) Fivetran automatically detects and pushes these changes into your data warehouse.
Because we pull nearly everything that Salesforce stores, you can recreate any report that Salesforce offers, along with many that it doesn’t, from the raw data.
For example, suppose you wanted to recreate the Salesforce report “Leads with Converted Lead Information,” which shows information about when a lead was converted. You can see that in the raw data, this information is kept in two distinct tables, opportunities and leads. To recreate the report, perform a join between the two tables.
Create reports beyond what Salesforce offers
Because Fivetran never deletes data in your warehouse, you can create reports using records that no longer exist in Salesforce but that Fivetran has synced to your warehouse.
When Salesforce marks a record as
is_deleted, it retains that record for about 30 days before permanently deleting it. That means you can no longer perform analysis using those records in Salesforce. Fortunately, Fivetran doesn’t delete any of the
is_deleted rows, so once you sync your Salesforce data using Fivetran, you will be able to see them, starting from when you integrated with Fivetran.
Field History Tracking
If you decide to use Field History Tracking to track and capture all the changes that occur for selected fields, you need to enable this within Salesforce. Because Fivetran updates incrementally and only captures the value at the time of the update, enabling Field History Tracking in Salesforce will create additional tables inside of Salesforce that track all changes for the field. Fivetran will then import these tables as separate tables during incremental updates. To learn how to do enable field history tracking, see our setup guide.
Syncing empty tables and columns
Fivetran can sync empty tables and columns for your Salesforce connector. For more information, see our Features documentation.
Fivetran follows a one-to-one mapping with all tables in Salesforce. If you can't find something that you're looking for, your Salesforce user may have limited access to the information. Fivetran tries to grab all possible objects, except formula fields.
This schema diagram illustrates how tables relate to one another. It is not a complete Entity Relationship Diagram. It shows relationships for some central entities and workflows. Many standard and all custom entities that are delivered by the Fivetran connector are omitted for brevity.
We also offer you information about the connector from the
fivetran_audit table and the Fivetran
There are some common reasons that you may not be able to find a Salesforce table that you're looking for:
- Fivetran cannot access the table because the connecting user does not have sufficient Salesforce permissions.
- The table is a "formula" or "compound" field.
- The table name is different from what you expected. Fivetran names the tables it delivers to your data warehouse based on the Salesforce "name" field for a given object. This is a field that an organization gives to the object, and it may differ from the display name or internal "label."
Fivetran follows a set of standard naming conventions for schemas, tables, and columns. To learn more about our naming conventions, read the naming conventions section of our Core Concepts document.
If you change a name that gets displayed in Salesforce reports it won't affect your connector.
However, if you change the name of the field itself, our system will:
- keep the old field
- add a new column, and
- backfill this new column with values from the old column, syncing updates to this new column from now on.
If you add a new column in the warehouse, any old values in that column will not get backfilled. In other words, if you hide permissions to a column and then grant permissions, Fivetran will only sync new records, not the old ones.
Infrequently updated tables
When Fivetran finds that a table is not updated in the source for seven days in a row, we switch our strategy to checking that infrequently updated table for new data once a day. When the table is updated in the source again, we return to our usual update strategy.
Because we only check infrequently updated tables once a day, there can be a delay of up to one day when an infrequently updated table is updated in the source for the first time.
We tend not to sync any pre-aggregated tables or views because you already have access to the underlying source tables. We also do not sync archived tables because of Salesforce's limitations.
Fivetran automatically detects tables with the
replicateable = false value as re-import tables. We resync these tables because we can not capture deletes from them. These tables are included in the incremental sync but are re-imported in full at different frequencies.
The re-import frequency of the tables depends on how fast we can ingest data. Salesforce delivers data at varying frequencies and our speed of ingestion is completely dependent on them.
|Table Import Duration||Import Frequency|
|3 seconds or less||Every sync|
|Between 3 seconds and 5 minutes||Twice a day|
|More than 5 minutes||Once a week|
We mark the re-imported tables as
notRecommended in the dashboard and include a message that we will re-import them in full if they are selected. We exclude the tables from incremental syncs by default because they degrade performance. However, you can choose to include them.
We try to be as transparent as possible with our errors so that you know what's going on.
In the configuration section of your Fivetran dashboard, you may see error messages such as, "not queryable". You might see these errors because Fivetran only pulls the underlying tables (for example, not aggregations on top of base tables) of the data that makes sense to keep in a data warehouse (for example, no images, PDFs or attachments), not just all available tables.
If you're testing out Fivetran in a Sandbox instance, that may also result in errors.
Your activity tables will return the error 'not queryable' because activity history (
activity_history) is not directly queryable with the Salesforce API. Activity history is effectively a view on the
Event tables. We suggest you create your own view of these two tables to get the same information.
Formula fields can be derived from both custom and standard fields. We use a system table called fivetran_formula store data related to formula fields. This table contains the object name, formula field name, and formula that is used to compute the resulting data.
Salesforce users can change formulas. To keep track of these changes, we sync another system table named fivetran_formula_history. Using this table, we can capture a history of formula field changes as they occur, starting from the date you first connected to Fivetran.
We disable formula fields by default. However, we can enable them at your request. Be warned — formula fields can cause subtle problems in your data.
Why formula and lookup fields are problematic for data warehouses
Salesforce has two concepts, formula fields and lookup fields.
Leads table looks like this:
|L2||Terrible Lead :(||2017-01-02||NULL|
Suppose we then add a formula field
IsGreat = CONTAINS(Name, 'Great'), and a lookup field
AccountName = SELECT Name FROM Account WHERE Id = Lead.AccountId:
|L2||Terrible lead : (||2017-01-02||NULL||FALSE||NULL|
Suppose we sync this table, including formula and lookup fields, into our data warehouse. This is a bad idea for two reasons:
1. Formula fields don't update when the formula changes
Suppose someone changes the formula field in Salesforce to
IsGreat = CONTAINS(Name, 'Amazing') and adds a new lead:
|L2||Terrible lead : (||2017-01-02||NULL||FALSE||NULL|
Changing the formula doesn't update LastModified, so our data warehouse now contains a mixture of the old definition of IsGreat and the new definition. A better solution is to write your SQL queries to work off the underlying values and keep the formula fields out of your data warehouse, for example:
SELECT name LIKE '%Great%' AS is_great FROM lead.
Salesforce explains this behavior in the official documentation:
...a change to a formula field will not trigger a sync. This is because a formula field is a way of arriving at a value, but does not actually store the value and will not update the last modified timestamp on a record when it changes.
2. Lookup fields don't update when the target object changes
Lookup fields are even more problematic. Suppose we change the account name from
Foo Corp Private Limited. This will change
Account.LastModified but not
Lead.LastModified, so the value in your data warehouse will be wrong:
|L2||Terrible lead : (||2017-01-02||NULL||FALSE||NULL|
It's better to simply do a join in your SQL query, for example:
SELECT *, account.name AS account_name FROM lead JOIN account ON lead.account_id = account.id
There's no more Salesforce lookup-field-magic; anyone reading this query can see where
account_name comes from.
For DOUBLE data type values (for example,
currency_field_typecolumn data), we use the precision and scale from the
SObject DescribeAPI call to sync these values to the destination. You can increase the precision of the DOUBLE data type values from the Salesforce dashboard.
We use the
SObject Get DeletedAPI endpoint to capture deletes. According to Salesforce's
getDeleted()documentation, if a Salesforce admin purges the Recycle Bin before our
getDeleted()call is executed, we should receive an
INVALID_REPLICATION_DATEerror. However, we have observed in our internal tests that when we query the
SObject Get DeletedAPI endpoint after purging data from the Recycle Bin, the Salesforce API does not return the purged records and the
Important: If a Salesforce admin purges the Recycle Bin to permanently remove deleted items before the 15-day period, it can cause data integrity issues in your destination.
Salesforce tracks the history of an object when you configure History Tracking for the object. Salesforce's History tables track the field history of objects. The changes made to an object over time are recorded in the history tables. See Salesforce's Field History Tracking documentation for more information.
Important: We have disabled History mode for native Salesforce history objects associated with standard objects. The standard objects follow the naming pattern
_StandardObjectName_Historyand are listed in Salesforce's StandardObjectNameHistory documentation.
We do not capture hard deletes for History tables but use the
is_deleted field from Salesforce to capture deletes.