Firebase Beta
Updated 10 days ago
Firebase is an application development platform. Firebase Cloud Firestore is a NoSQL document database that does not enforce a fixed schema. Data is stored as key-value pairs in documents, and documents are grouped into collections.
Supported services
Fivetran supports the Firebase Cloud Firestore database.
NOTE: We support only Cloud Firestore databases in Native mode.
Supported configurations
Fivetran supports the following Firebase configurations:
Supportability Category | Supported Values |
---|---|
Connector limit per database | No limit |
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | Yes | All tables and fields |
Custom data | Yes | All tables and fields |
Data blocking | Yes | Column level, table level, and schema level |
Column hashing | Yes | |
Re-sync | Yes | Table level |
History | Yes | Supports history mode. |
API configurable | Yes | API configuration |
Priority-first sync | | |
Fivetran data models | | |
Private networking | | |
Setup guide
For specific instructions on how to set up your Firebase connector, see the Cloud Firestore setup guide.
Sync overview
Once Fivetran is connected to your Firestore database, we pull a complete dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In each subsequent sync, we again pull all the data from the source and compare it with the data from the previous sync to identify what changed.
Pack mode options
Pack mode determines the form in which Fivetran delivers your data. There are two pack modes: packed and unpacked.
Unpacked mode
Fivetran unpacks one layer of nested fields and infers types.
In unpacked mode, the following source document, in which _id holds the Firestore document ID,
{
  "_id": "foo",
  "bar": 2,
  "nested": {
    "baz": 3
  }
}
is delivered to your destination as
_id STRING | bar INTEGER | nested JSON |
---|---|---|
"foo" | 2 | {"baz":3} |
Packed mode
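In packed mode, Fivetran delivers all of a document's fields in a single JSON column. For example, the following source document, in which _id holds the Firestore document ID,
{
  "_id": "foo",
  "bar": 2,
  "nested": {
    "baz": 3
  }
}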
is delivered to your destination as
_id STRING | data JSON |
---|---|
"foo" | {"_id":"foo", "bar":2, "nested":{"baz":3}} |
Switching pack modes
You can switch the pack mode for your connector at any time in your Fivetran dashboard. When you change the pack mode, we automatically perform a full re-sync of the connector.
To change the pack mode for your connector, do the following:
- Go to the Setup tab in the connector dashboard.
- Click Edit connection details.
- In the connector setup form, change the Pack Mode.
- Click Save & Test.
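The following minimal Python sketch reproduces the example rows from the Unpacked mode and Packed mode sections. It only illustrates the two layouts and is not Fivetran's implementation. The function names (infer_type, unpack_document, pack_document) and the small set of inferred types are assumptions, and the sketch assumes each document arrives as a Python dict with its ID under _id.

```python
import json
from typing import Any, Dict, Tuple


def infer_type(value: Any) -> str:
    """Infer a column type for one top-level field (illustrative subset only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    return "JSON"  # maps and arrays are not unpacked further


def unpack_document(doc: Dict[str, Any]) -> Dict[str, Tuple[str, Any]]:
    """Unpacked mode: each top-level field becomes its own (type, value) column.

    Only one layer is unpacked; nested maps and arrays are serialized to JSON.
    """
    columns = {}
    for field, value in doc.items():
        col_type = infer_type(value)
        if col_type == "JSON":
            value = json.dumps(value, separators=(",", ":"))
        columns[field] = (col_type, value)
    return columns


def pack_document(doc: Dict[str, Any]) -> Dict[str, str]:
    """Packed mode: keep _id as a column and put every field into one JSON column."""
    return {
        "_id": doc["_id"],
        "data": json.dumps(doc, separators=(",", ":")),  # includes _id, as in the example
    }


doc = {"_id": "foo", "bar": 2, "nested": {"baz": 3}}
print(unpack_document(doc))
# {'_id': ('STRING', 'foo'), 'bar': ('INTEGER', 2), 'nested': ('JSON', '{"baz":3}')}
print(pack_document(doc))
# {'_id': 'foo', 'data': '{"_id":"foo","bar":2,"nested":{"baz":3}}'}
```

The same document produces different destination columns in each mode.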
Replication speeds
Because we pull all the data from the source in each sync, fetching and processing that data can introduce some delay.
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency, and discrepancies between the format of the data we receive and how the data is stored at rest in the destination. How quickly changes are synced also depends on your configured sync frequency. For data sources with a high rate of data changes, we recommend a higher sync frequency, or a sync interval close to your average sync duration.
Schema information
Fivetran tries to replicate the exact schema and tables from your Firestore database to your destination.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks rows that were deleted in the source database.
- _fivetran_synced (UTC TIMESTAMP) indicates when Fivetran successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs.
Type transformations and mapping
As we extract your data, we match Firestore data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system fails when we encounter columns with data types that we don't accept or transform.
The following table illustrates how we transform your Firestore data types into Fivetran supported types:
Firestore Data Type | Fivetran Data Type | Fivetran Supported |
---|---|---|
Array | JSON | True |
Boolean | BOOLEAN | True |
Date and time | INSTANT | True |
Floating-point number | DOUBLE | True |
Geographical point | STRING | True |
Integer | LONG | True |
Map | JSON | True |
Null | NULL | True |
Reference | STRING | True |
Text string | STRING | True |
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.
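The mapping table can be read as a simple type dispatch. The following Python sketch is an illustration under stated assumptions and is not Fivetran's code. It assumes Firestore values have already been decoded into Python objects. GeoPoint and Reference are hypothetical stand-in classes, used instead of a real client library's types so that the sketch runs without dependencies.

```python
from datetime import datetime
from typing import Any


class GeoPoint:
    """Hypothetical stand-in for a client library's geographical point type."""

    def __init__(self, latitude: float, longitude: float) -> None:
        self.latitude = latitude
        self.longitude = longitude


class Reference:
    """Hypothetical stand-in for a client library's document reference type."""

    def __init__(self, path: str) -> None:
        self.path = path


def fivetran_type(value: Any) -> str:
    """Map a decoded Firestore value to the Fivetran type listed in the table."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, datetime):
        return "INSTANT"
    if isinstance(value, (str, GeoPoint, Reference)):
        return "STRING"
    if isinstance(value, (list, dict)):  # arrays and maps
        return "JSON"
    # Anything else has no mapping; per the text above, such columns can fail the sync.
    raise TypeError(f"unsupported Firestore value type: {type(value).__name__}")


print(fivetran_type(True), fivetran_type(7), fivetran_type(GeoPoint(52.5, 13.4)))
# BOOLEAN LONG STRING
```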
Excluding source data
If you don’t want to sync all the data from your Firestore database, you can exclude schemas or tables from your syncs in your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you want to omit from syncing. For more information, see our Column Blocking documentation.
Alternatively, you can change the permissions to restrict access to particular collections or sub-collections using Firebase Security Rules.
Initial sync
When Fivetran connects to a new Firestore database, we first copy all the data from every collection (except those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We use the db.collection(collection).get() and db.collectionGroup(subcollection) operations to fetch collection and subcollection data from the source, respectively. Instead of pulling all the data at once, we paginate through the results so that the sync is tolerant of failures.
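Cursor-based pagination is what makes a long read resumable. The following Python sketch shows the general pattern and is not Fivetran's code. It uses a hypothetical in-memory COLLECTION and a hypothetical fetch_page helper instead of a real Firestore client. In the Firestore client libraries, the same pattern is usually built from a query's order_by, limit, and start_after methods.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Doc = Tuple[str, Dict[str, int]]

# Hypothetical in-memory stand-in for a Firestore collection: document ID -> fields.
COLLECTION: Dict[str, Dict[str, int]] = {f"doc{i:03d}": {"n": i} for i in range(7)}


def fetch_page(after: Optional[str], limit: int) -> List[Doc]:
    """Return up to `limit` documents ordered by ID, starting after the cursor ID."""
    ids = sorted(doc_id for doc_id in COLLECTION if after is None or doc_id > after)
    return [(doc_id, COLLECTION[doc_id]) for doc_id in ids[:limit]]


def paginate(limit: int, start_after: Optional[str] = None) -> Iterator[Doc]:
    """Yield every document one page at a time.

    The cursor is the last document ID seen, so a failed run can resume from
    the last completed page instead of re-reading the whole collection.
    """
    cursor = start_after
    while True:
        page = fetch_page(cursor, limit)
        if not page:
            return
        yield from page
        cursor = page[-1][0]  # remember where this page ended


docs = list(paginate(limit=3))
print(len(docs), docs[0][0], docs[-1][0])  # 7 doc000 doc006
```

If a run fails after the page ending at doc004, calling paginate(limit=3, start_after="doc004") continues with the remaining documents.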
Updating data
Fivetran pulls all the data from your source in each sync, calculates the difference between the current and previous sync to identify updates and deletes, and syncs them accordingly.
Fivetran maintains collections and subcollections separately.
For collections, we map Firestore's built-in document ID (custom or auto-generated) to the _id column and use it as the primary key for each table. For subcollections, we use the _path column as the primary key. It contains the unique path of each subcollection document, for example, collection/collection_id/sub-collection/sub-collection_id.
The primary key field is used to identify rows to merge the changes in your documents into the corresponding tables in the destination as follows:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
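The three merge rules can be sketched in Python against in-memory tables keyed by primary key (_id or _path). This is only an illustration, not how Fivetran writes to destinations. The dict-based tables and the merge and user_fields names are assumptions. Skipping unchanged rows is a choice made for this sketch.

```python
from datetime import datetime, timezone
from typing import Any, Dict

Row = Dict[str, Any]


def user_fields(row: Row) -> Row:
    """Strip the Fivetran-generated columns, leaving only source fields."""
    return {k: v for k, v in row.items() if not k.startswith("_fivetran_")}


def merge(destination: Dict[str, Row], source: Dict[str, Row], now: datetime) -> None:
    """Apply one full-pull sync to `destination` in place.

    Both dicts are keyed by primary key; `source` holds every document
    pulled in this sync.
    """
    for key, fields in source.items():
        current = destination.get(key)
        if (current is not None and not current["_fivetran_deleted"]
                and user_fields(current) == fields):
            continue  # unchanged row: nothing to write
        # Insert or update: write the row and mark it as live.
        destination[key] = {**fields, "_fivetran_deleted": False, "_fivetran_synced": now}
    for key, row in destination.items():
        if key not in source and not row["_fivetran_deleted"]:
            # Delete: keep the row and only flag it (soft delete).
            row["_fivetran_deleted"] = True
            row["_fivetran_synced"] = now


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
dest: Dict[str, Row] = {}
merge(dest, {"a": {"x": 1}, "b": {"x": 2}}, t1)  # two inserts
merge(dest, {"a": {"x": 10}}, t2)                 # update "a", delete "b"
print(dest["a"]["x"], dest["b"]["_fivetran_deleted"])  # 10 True
```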
Fivetran Teleport Sync
Fivetran Teleport Sync is a proprietary database replication method that offers the completeness of snapshots. For Firestore, it serves as a substitute for Snapshot Listeners. With this sync mechanism, Fivetran can incrementally replicate your database with no additional setup other than connecting to your database.
Fivetran Teleport Sync's queries perform the following operations on your database:
- Pull all the data of each selected table
- Perform calculations on all values in each synced table's rows
- Aggregate a compressed table snapshot in the database's memory
In each sync, Fivetran Teleport Sync captures the following changes:
- New updates to each row
- Deletes
- Data type changes that do not affect existing values (for example, Number to String)
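Teleport Sync is proprietary, so the following is not its algorithm. It is a generic Python sketch of a similar idea: reduce each table to a compact snapshot of per-row hashes, then compare consecutive snapshots to find inserts, updates, and deletes. SHA-256 over key-sorted JSON is an arbitrary choice for illustration, and all function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def row_fingerprint(fields: Dict[str, Any]) -> str:
    """Hash a row's values deterministically (sorted keys, compact JSON)."""
    encoded = json.dumps(fields, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def snapshot(table: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Compress a table to {primary key: fingerprint}."""
    return {key: row_fingerprint(fields) for key, fields in table.items()}


def diff(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two compressed snapshots and classify every changed key."""
    return {
        "inserted": sorted(current.keys() - previous.keys()),
        "deleted": sorted(previous.keys() - current.keys()),
        "updated": sorted(k for k in current.keys() & previous.keys()
                          if current[k] != previous[k]),
    }


before = snapshot({"a": {"x": 1}, "b": {"x": 2}, "c": {"x": 3}})
after = snapshot({"a": {"x": 1}, "b": {"x": 20}, "d": {"x": 4}})
print(diff(before, after))
# {'inserted': ['d'], 'deleted': ['c'], 'updated': ['b']}
```

Comparing small fingerprints instead of full rows keeps the stored snapshot compact, at the cost of hashing every row on each sync.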
Deleted rows
We do not delete rows from your destination. When a row is deleted from the source table, we set the _fivetran_deleted column value of the corresponding row in the destination to TRUE.
Subcollections
We sync nested collections (subcollections) up to level 1. We do not sync any nested data from level 2 onwards. For example, in the following structure, nested_collection (level 1) is synced, but nested_collection_2 (level 2) is not:
collection: (level 0)
  document:
    Id: 1
    name: foo
    nested_collection: (level 1)
      nested_document:
        Id: 2
        name: nested_foo
        nested_collection_2: (level 2)
          nested_document_2:
            Id: nested_2
            name: nested_level_2_foo
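The level numbering in this example can be expressed as a rule on paths. A collection path alternates collection and document IDs, so its level is the number of collection/document pairs above it. The following Python sketch assumes slash-separated paths and uses hypothetical function names. It is not Fivetran's code.

```python
MAX_SYNCED_LEVEL = 1  # levels 0 and 1 are synced; level 2 and deeper are not


def collection_level(collection_path: str) -> int:
    """Return the nesting level of a collection path.

    "collection" is level 0, "collection/1/nested_collection" is level 1,
    and each further document/collection pair adds one level.
    """
    segments = collection_path.split("/")
    if len(segments) % 2 != 1:
        raise ValueError("a collection path must have an odd number of segments")
    return len(segments) // 2


def is_synced(collection_path: str) -> bool:
    """Apply the level cutoff described above."""
    return collection_level(collection_path) <= MAX_SYNCED_LEVEL


print(is_synced("collection/1/nested_collection"))                      # True
print(is_synced("collection/1/nested_collection/2/nested_collection_2"))  # False
```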
To sync subcollections, we follow a parent-child table approach.
In the destination, we maintain a separate table for each unique subcollection name, so each subcollection name maps to exactly one destination table. If two or more subcollections have the same name, they are stored in a single table even if they belong to different parent collections.
Example:
The following source data
Collection | Collection Id | Document | Subcollection | Subcollection Id | Subcollection Document |
---|---|---|---|---|---|
Rooms | Room A | Name: "chat room" | Messages | M1 | From: "alex", Msg: "Hello world" |
Rooms | Room B | Name: "Study room" | Messages | M2 | From: "bob", Msg: "How are you?" |
Rooms | Room C | Name: "Living room" | Furniture | F1 | brand: "eco_fun", size: "king" |
is stored as follows in the destination:
Messages
_path | data |
---|---|
Rooms/Room A/Messages/M1 | {"From": "alex", "Msg": "Hello world"} |
Rooms/Room B/Messages/M2 | {"From": "bob", "Msg": "How are you?"} |
Furniture
_path | data |
---|---|
Rooms/Room C/Furniture/F1 | {"brand": "eco_fun", "size": "king"} |
Here, the _path column links each subcollection row to its parent collection and document.
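The grouping in this example can be sketched in Python. Each document lands in the table named after the subcollection that directly contains it, and its parent document can be recovered from _path. This is an illustration only. The function names are hypothetical, and the sketch assumes slash-separated paths like the _path values above.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def parent_document(path: str) -> str:
    """Return the parent document path encoded in a subcollection _path."""
    return "/".join(path.split("/")[:-2])


def group_subcollections(docs: Dict[str, Dict[str, Any]]) -> Dict[str, List[Tuple[str, str]]]:
    """Build one table per subcollection name with (_path, data) rows.

    `docs` maps full document paths (collection/id/subcollection/id) to fields.
    Subcollections that share a name land in the same table.
    """
    tables: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for path, fields in sorted(docs.items()):
        name = path.split("/")[-2]  # the subcollection that directly contains the document
        tables[name].append((path, json.dumps(fields)))
    return dict(tables)


source = {
    "Rooms/Room A/Messages/M1": {"From": "alex", "Msg": "Hello world"},
    "Rooms/Room B/Messages/M2": {"From": "bob", "Msg": "How are you?"},
    "Rooms/Room C/Furniture/F1": {"brand": "eco_fun", "size": "king"},
}
for table, rows in group_subcollections(source).items():
    print(table, [path for path, _ in rows])
print(parent_document("Rooms/Room A/Messages/M1"))  # Rooms/Room A
```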
Subcollections sync strategy
Fivetran syncs subcollections using collection group queries and captures updates and deletes through Teleport Sync. Detecting newly added subcollections in the source requires additional API calls, which increases the sync time. To optimize this operation, we check for newly added subcollections once every seven days and then sync them to the destination.
As a result, it can take up to seven days for your most recently added subcollections to appear in the destination. To sync them sooner, start a re-sync of all their parent collections.
Once the sync is complete, all the subcollection tables that Fivetran created in the destination are visible in the Schema tab of the dashboard.
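The seven-day delay described in the subcollections sync strategy can be made concrete with a small Python sketch. It assumes that discovery of new subcollections runs on a fixed seven-day cadence, which simplifies the behavior described above. The names DISCOVERY_INTERVAL, discovery_due, and next_discovery are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: discovery of new subcollections runs on a fixed seven-day cadence.
DISCOVERY_INTERVAL = timedelta(days=7)


def discovery_due(last_discovery: datetime, now: datetime) -> bool:
    """True once at least seven days have passed since the last discovery run."""
    return now - last_discovery >= DISCOVERY_INTERVAL


def next_discovery(last_discovery: datetime) -> datetime:
    """Earliest time a subcollection added after `last_discovery` is picked up."""
    return last_discovery + DISCOVERY_INTERVAL


last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
added = last_run + timedelta(hours=1)  # subcollection created just after a run
print(next_discovery(last_run) - added)  # 6 days, 23:00:00
```

A subcollection created just after a discovery run waits almost the full interval, which is why the delay can approach seven days unless you start a re-sync of its parent collections.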