Firebase Beta
Firebase is an application development platform. Firebase Cloud Firestore is a NoSQL document database characterized by a lack of a fixed schema. Data is stored in key-value pairs in documents that form a collection.
Supported services
Fivetran supports the Firebase Cloud Firestore database.
NOTE: We support only the Cloud Firestore databases in Native mode.
Supported configurations
Fivetran supports the following Firebase configurations:
Supportability Category | Supported Values |
---|---|
Connector limit per database | No limit |
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | check | |
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | | |
Fivetran data models | | |
Private networking | | |
Authorization via API | check | |
Setup guide
For specific instructions on how to set up your Firebase connector, see the Cloud Firestore setup guide.
Sync overview
Once Fivetran is connected to your Firestore database, we pull a complete dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In each sync, we pull all the data from the source and find the difference between the syncs to get the updated data.
Pack mode options
Pack mode determines the form in which Fivetran delivers your data. There are two pack modes: packed and unpacked.
NOTE: In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "bar (INTEGER)" means the column name is bar and it stores INTEGER data.
Unpacked mode
Fivetran unpacks one layer of nested fields and infers types.
In unpacked mode, the following source document
{
"_id": "foo", <== document_id
"bar": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_id (STRING) | bar (INTEGER) | nested (JSON) |
---|---|---|
"foo" | 2 | {"baz":3} |
Packed mode
In packed mode, the following source document
{
"_id": "foo", <== document_id
"bar": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_id (STRING) | data (JSON) |
---|---|
"foo" | {"_id":"foo", "bar":2, "nested":{"baz":3}} |
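The difference between the two pack modes can be illustrated with a minimal Python sketch. The function names `to_unpacked` and `to_packed` are hypothetical and are not part of any Fivetran API. The sketch only mirrors the example tables above and assumes that nested maps and arrays are delivered as JSON text.

```python
import json


def to_unpacked(doc):
    """Unpack one layer: top-level scalars become columns, nested values stay JSON."""
    row = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            # Nested maps and arrays are kept as a single JSON column.
            row[key] = json.dumps(value, separators=(",", ":"))
        else:
            row[key] = value
    return row


def to_packed(doc):
    """Keep the document ID as a column and pack the whole document into one JSON column."""
    return {"_id": doc["_id"], "data": json.dumps(doc, separators=(",", ":"))}


doc = {"_id": "foo", "bar": 2, "nested": {"baz": 3}}
print(to_unpacked(doc))  # {'_id': 'foo', 'bar': 2, 'nested': '{"baz":3}'}
print(to_packed(doc))    # {'_id': 'foo', 'data': '{"_id":"foo","bar":2,"nested":{"baz":3}}'}
```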
Switching pack modes
You can switch the pack mode for your connector at any time in your Fivetran dashboard. When you change the pack mode for a table, we automatically perform a full connector re-sync.
To change the pack mode for your connector, do the following:
- Go to the Setup tab in the connector dashboard.
- Click Edit connection details.
- In the connector setup form, change the Pack Mode.
- Click Save & Test.
Replication speeds
We pull all the data from the source in each sync, so fetching and processing this data can take some time.
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies between the format of the data we receive and how the data is stored at rest in the destination. The ability to sync changes quickly also depends on your configured sync frequency. For data sources with a high rate of data changes, we recommend setting a higher sync frequency, or a frequency close to your average sync speed.
Schema information
Fivetran tries to replicate the exact schema and tables from your Firestore database to your destination.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks deleted rows in the source database.
- _fivetran_synced (UTC TIMESTAMP) indicates when Fivetran successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
Type transformations and mapping
As we extract your data, we match Firestore data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system fails when we encounter columns with data types that we don't accept or transform.
The following table shows how we transform your Firestore data types into Fivetran-supported types:
Firestore Data Type | Fivetran Data Type | Fivetran Supported |
---|---|---|
Array | JSON | True |
Boolean | BOOLEAN | True |
Date and time | INSTANT | True |
Floating-point number | DOUBLE | True |
Geographical point | STRING | True |
Integer | LONG | True |
Map | JSON | True |
Null | NULL | True |
Reference | STRING | True |
Text string | STRING | True |
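As a rough illustration, the mapping in the table can be expressed as a type lookup in Python. This is a sketch under stated assumptions, not Fivetran's implementation: `GeoPoint` and `Reference` are hypothetical stand-in classes for the Firestore types, and `fivetran_type` is an illustrative name.

```python
from datetime import datetime, timezone


class GeoPoint:
    """Hypothetical stand-in for a Firestore geographical point."""
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class Reference:
    """Hypothetical stand-in for a Firestore document reference."""
    def __init__(self, path):
        self.path = path


def fivetran_type(value):
    """Return the Fivetran type name from the table for a Python value."""
    if value is None:
        return "NULL"
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, datetime):
        return "INSTANT"
    if isinstance(value, (list, dict)):
        return "JSON"
    if isinstance(value, (str, GeoPoint, Reference)):
        return "STRING"
    raise TypeError(f"no mapping for {type(value).__name__}")


sample = {
    "active": True,
    "count": 3,
    "ratio": 0.5,
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "tags": ["a", "b"],
    "location": GeoPoint(52.5, 13.4),
}
print({name: fivetran_type(value) for name, value in sample.items()})
```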
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.
Excluding source data
If you don’t want to sync all the data from your primary database, you can exclude schemas or tables from your syncs on your Fivetran dashboard. To do so, go to your connector details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can change the permissions to restrict access to particular collections or sub-collections using Firebase Security Rules.
Initial sync
When Fivetran connects to a new Firestore database, we first copy all the data from every collection (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We perform the db.collection(collection).get() and db.collectionGroup(subcollection) operations to fetch the collection and subcollection data from the source, respectively. We do not pull all the data in a single request; we paginate through the results to make the sync tolerant of failures.
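Cursor-based pagination of this kind can be sketched in Python as follows. The sketch runs against an in-memory list rather than Firestore, and `fetch_page` and `paginate` are hypothetical helpers, not Firestore or Fivetran functions. It assumes documents are ordered by ID and that each page resumes after the last ID already read, so a failed sync could restart from the last cursor instead of from the beginning.

```python
def fetch_page(docs, page_size, start_after=None):
    """Hypothetical stand-in for an ordered query that resumes after a cursor."""
    ordered = sorted(docs, key=lambda d: d["_id"])
    if start_after is not None:
        ordered = [d for d in ordered if d["_id"] > start_after]
    return ordered[:page_size]


def paginate(docs, page_size):
    """Yield pages until the source is exhausted, tracking the last ID as the cursor."""
    cursor = None
    while True:
        page = fetch_page(docs, page_size, cursor)
        if not page:
            return
        yield page
        cursor = page[-1]["_id"]


collection = [{"_id": f"doc{i}"} for i in range(5)]
pages = list(paginate(collection, page_size=2))
print([[d["_id"] for d in page] for page in pages])
# [['doc0', 'doc1'], ['doc2', 'doc3'], ['doc4']]
```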
Updating data
Fivetran pulls all the data from your source in each sync, calculates the difference between the current and previous sync to identify updates and deletes, and syncs them accordingly.
Fivetran maintains collections and subcollections separately.
For collections, we map Firestore's built-in document_id (custom or auto-generated) column as the _id column and use it as the primary key for each table. For subcollections, we use the _path column, which contains a unique path for every subcollection, as the primary key. For example, collection/collection_id/sub-collection/sub-collection_id.
The primary key field is used to identify rows to merge the changes in your documents into the corresponding tables in the destination as follows:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
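The three merge rules above can be sketched as a single function. This is a minimal illustration that assumes both the destination table and the current source snapshot are held as dictionaries keyed by primary key. The name `merge_sync` is hypothetical.

```python
def merge_sync(destination, source):
    """Merge the current source snapshot into the destination, keyed by primary key."""
    merged = {pk: dict(row) for pk, row in destination.items()}
    # Inserted and updated rows: write the current data and mark the row as live.
    for pk, data in source.items():
        merged[pk] = {**data, "_fivetran_deleted": False}
    # Deleted rows: keep the row, but flag it instead of removing it.
    for pk in merged:
        if pk not in source:
            merged[pk]["_fivetran_deleted"] = True
    return merged


destination = {
    "a": {"bar": 1, "_fivetran_deleted": False},
    "b": {"bar": 2, "_fivetran_deleted": False},
}
source = {"a": {"bar": 10}, "c": {"bar": 3}}
result = merge_sync(destination, source)
for pk in sorted(result):
    print(pk, result[pk])
# a {'bar': 10, '_fivetran_deleted': False}
# b {'bar': 2, '_fivetran_deleted': True}
# c {'bar': 3, '_fivetran_deleted': False}
```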
Fivetran Teleport Sync
Fivetran Teleport Sync is a proprietary incremental sync method that offers the completeness of snapshots and serves as a substitute for Snapshot Listener. With this sync mechanism, Fivetran can incrementally replicate your database with no additional setup other than connecting to your database.
Fivetran Teleport Sync's queries perform the following operations on your database:
- Pull all the data of each selected table
- Perform calculations on all values in each synced table's rows
- Aggregate a compressed table snapshot in the database's memory
Fivetran Teleport Sync captures the following in each sync:
- New updates on each row
- Deletes
- Data type changes that do not affect existing values (for example, Number to String)
We do not support syncing more than 400 tables or tables with more than 15 million rows with Fivetran Teleport Sync for Firebase.
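Fivetran Teleport Sync is proprietary, so the following is not its implementation. It is a generic sketch of snapshot differencing, assuming each row is reduced to a hash and two consecutive snapshots are compared by primary key. The names `row_hash`, `snapshot`, and `diff` are illustrative.

```python
import hashlib
import json


def row_hash(row):
    """Hash a row's values in a canonical (key-sorted) JSON form."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(rows):
    """Build a compact snapshot: primary key -> row hash."""
    return {pk: row_hash(row) for pk, row in rows.items()}


def diff(previous, current):
    """Compare two snapshots and return (inserted, updated, deleted) primary keys."""
    inserted = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(
        pk for pk in current.keys() & previous.keys() if current[pk] != previous[pk]
    )
    return inserted, updated, deleted


previous = snapshot({"a": {"x": 1}, "b": {"x": 2}, "d": {"x": 9}})
current = snapshot({"a": {"x": 1}, "b": {"x": 3}, "c": {"x": 4}})
print(diff(previous, current))  # (['c'], ['b'], ['d'])
```

Storing only hashes keeps the snapshot small, at the cost of still reading every row on each sync to recompute them.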
Deleted rows
We do not delete rows from your destination. When a row is deleted from the source table, we set the _fivetran_deleted column value of the corresponding row in the destination to TRUE.
Subcollections
We sync nested collections or subcollections up to Level 1 (level 1). We do not sync any nested data from Level 2 (level 2) onwards.
collection:(level 0)
  document:
    Id:1
    name:foo
    nested_collection:(level 1)
      nested_document:
        Id:2
        name:nested_foo
        nested_collection_2:(level 2)
          nested_document_2:
            Id:nested_2
            name:nested_level_2_foo
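The nesting level of a document can be derived from its path, because Firestore paths alternate between collection names and document IDs. The sketch below assumes slash-separated paths. `collection_level` and `is_synced` are hypothetical helper names.

```python
def collection_level(path):
    """Return the nesting level of the collection that a path points into.

    Paths alternate collection/document/collection/document, so both
    'collection/1' and 'collection' are at level 0.
    """
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("empty path")
    return (len(segments) - 1) // 2


def is_synced(path):
    """Only level 0 collections and level 1 subcollections are synced."""
    return collection_level(path) <= 1


paths = [
    "collection/1",
    "collection/1/nested_collection/2",
    "collection/1/nested_collection/2/nested_collection_2/nested_2",
]
for p in paths:
    print(p, collection_level(p), is_synced(p))
```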
To sync subcollections, we follow a parent-child table approach.
In the destination, we maintain a separate table for each uniquely named subcollection, ensuring a one-to-one relationship between the source and destination. If two or more subcollections have the same name, they are stored in a single table even if they belong to different parent collections.
Example:
The following source data
Collection | Collection Id | Document | Subcollection | Subcollection Id | Subcollection Document |
---|---|---|---|---|---|
Rooms | Room A | Name: “chat room” | Messages | M1 | From: “alex” Msg: “Hello world” |
Rooms | Room B | Name: "Study room" | Messages | M2 | From: “bob” Msg: “How are you?” |
Rooms | Room C | Name: "Living room" | Furniture | F1 | brand: "eco_fun" size: "king" |
is stored as follows in the destination:
Messages
_path | data |
---|---|
Rooms/Room A/Messages/M1 | {From: “alex” Msg: “Hello world”} |
Rooms/Room B/Messages/M2 | {From: “bob” Msg: “How are you?”} |
Furniture
_path | data |
---|---|
Rooms/Room C/Furniture/F1 | {brand: "eco_fun" size: "king" } |
Here, the _path column links each row to its parent collection.
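The grouping in this example can be sketched in Python. The sketch assumes each subcollection document arrives as a (path, data) pair and that the table name is the subcollection segment of the path. `group_by_subcollection` and `parent_document` are hypothetical names.

```python
from collections import defaultdict


def group_by_subcollection(documents):
    """Group subcollection documents into one table per subcollection name."""
    tables = defaultdict(list)
    for path, data in documents:
        # The second-to-last path segment is the subcollection name.
        table_name = path.split("/")[-2]
        tables[table_name].append({"_path": path, "data": data})
    return dict(tables)


def parent_document(path):
    """Recover the parent document path from a level 1 subcollection _path."""
    return "/".join(path.split("/")[:2])


documents = [
    ("Rooms/Room A/Messages/M1", {"From": "alex", "Msg": "Hello world"}),
    ("Rooms/Room B/Messages/M2", {"From": "bob", "Msg": "How are you?"}),
    ("Rooms/Room C/Furniture/F1", {"brand": "eco_fun", "size": "king"}),
]
tables = group_by_subcollection(documents)
print(sorted(tables))                                 # ['Furniture', 'Messages']
print(len(tables["Messages"]))                        # 2
print(parent_document("Rooms/Room A/Messages/M1"))    # Rooms/Room A
```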
Subcollections sync strategy
Fivetran syncs subcollections using collection group queries and captures updates and deletes through Teleport. To discover subcollections newly added in the source, we need to make additional API calls, which increases the sync time. To optimize this operation, we check for newly added subcollections once every seven days, after which they are synced to the destination.
As a result, your most recently added subcollections can take up to seven days to appear in the destination. To sync them sooner, you must start a re-sync for all their parent collections.
Once the sync is complete, all the subcollection tables Fivetran created in the destination are visible in the dashboard Schema tab. Subcollection names begin with a forward slash ("/").
Subcollection discovery
We only discover and sync subcollections that are referenced by documents that exist in a collection we are already syncing. If your subcollection is not referenced by an existing document, we don't sync it. This can happen in at least two scenarios:
- You delete a document, but not the subcollection it references.
- You create a subcollection path referencing a document that does not exist in a collection.
You must create at least one document in a collection we are syncing that references your subcollection.
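The discovery rule can be illustrated with a small check that flags subcollection documents whose parent document does not exist. This is a sketch over in-memory paths, not a Firestore query. `discoverable` and `parent_document` are hypothetical names, and the sketch assumes level 1 subcollection paths with four segments.

```python
def parent_document(path):
    """Return the parent document path of a level 1 subcollection document."""
    return "/".join(path.split("/")[:2])


def discoverable(subcollection_paths, existing_documents):
    """Keep only subcollection documents whose parent document exists."""
    return [p for p in subcollection_paths if parent_document(p) in existing_documents]


existing_documents = {"Rooms/Room A"}
subcollection_paths = [
    "Rooms/Room A/Messages/M1",  # parent exists: discovered
    "Rooms/Room B/Messages/M2",  # parent deleted or never created: skipped
]
found = discoverable(subcollection_paths, existing_documents)
print(found)  # ['Rooms/Room A/Messages/M1']
```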