Firebase Beta

Firebase is an application development platform. Firebase Cloud Firestore is a NoSQL document database characterized by a lack of a fixed schema. Data is stored in key-value pairs in documents that form a collection.

Supported services

Fivetran supports the Firebase Cloud Firestore database.

We support only Cloud Firestore databases in Native mode.

Supported configurations

Fivetran supports the following Firebase configurations:

Supportability Category	Supported Values
Connection limit per database	No limit

Features

Feature Name	Supported	Notes
Capture deletes		All tables and fields
History mode
Custom data		All tables and fields
Data blocking		Column level, table level, and schema level
Column hashing
Re-sync		Table level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking
Authorization via API

Setup guide

For specific instructions on how to set up your Firebase connection, see the Cloud Firestore setup guide.

Sync overview

Once Fivetran is connected to your Firestore database, we pull a complete dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In each subsequent sync, we pull all the data from the source again and compare it with the previous sync to identify the changed data.

Pack mode options

Pack mode determines the form in which Fivetran delivers your data. There are two pack modes: packed and unpacked.

Subcollections are always delivered in packed mode.

In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "bar (INTEGER)" means the column name is bar and it stores INTEGER data.

Unpacked mode

Fivetran unpacks one layer of nested fields and infers types.

In unpacked mode, the following source table

{
 "_id": "foo", <== document_id
 "bar": 2,
 "nested": {
   "baz": 3
 }
}

is delivered to your destination as

_id (STRING)	bar (INTEGER)	nested (JSON)
"foo"	2	`{"baz":3}`

Packed mode

In packed mode, the following source table

{
 "_id": "foo", <== document_id
 "bar": 2,
 "nested": {
   "baz": 3
 }
}

is delivered to your destination as

_id (STRING)	data (JSON)
"foo"	`{"_id":"foo", "bar":2, "nested":{"baz":3}}`

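The two pack modes can be illustrated with a short Python sketch. It is not Fivetran's implementation: the function names `unpack_row` and `pack_row` are hypothetical, and the compact JSON formatting is an assumption chosen to resemble the example tables above.

```python
import json


def _to_json(value):
    # Compact separators resemble the `{"baz":3}` style used in the tables above.
    return json.dumps(value, separators=(",", ":"))


def unpack_row(doc):
    """Unpacked mode: top-level fields become columns; nested values stay as JSON."""
    row = {}
    for name, value in doc.items():
        row[name] = _to_json(value) if isinstance(value, (dict, list)) else value
    return row


def pack_row(doc):
    """Packed mode: keep the _id column and store the whole document in `data`."""
    return {"_id": doc["_id"], "data": _to_json(doc)}


doc = {"_id": "foo", "bar": 2, "nested": {"baz": 3}}
print(unpack_row(doc))  # {'_id': 'foo', 'bar': 2, 'nested': '{"baz":3}'}
print(pack_row(doc))
```

Only one layer is unpacked here: `nested` stays a single JSON column rather than being split into a `baz` column.
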
Switching pack modes

You can switch the pack mode for your connection at any time in your Fivetran dashboard.

We automatically perform a full connection re-sync during the next scheduled sync when you change pack modes.

To change the pack mode for your connection, do the following:

Go to the Setup tab in the connection dashboard.
Click Edit connection details.
In the connection setup form, change the Pack Mode.
Click Save & Test.

Replication speeds

We pull all the data from the source in each sync, so fetching and processing this data may take some time.

Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies between the format of the data we receive and how the data is stored at rest in the destination. The ability to sync changes quickly also depends on your configured sync frequency. For data sources with a high rate of data changes, we recommend setting a higher sync frequency or a frequency close to your average sync speed.

Schema information

Fivetran tries to replicate the exact schema and tables from your Firestore database in your destination.

Fivetran-generated columns

Fivetran adds the following columns to every table in your destination:

_fivetran_deleted (BOOLEAN) marks deleted rows in the source database.
_fivetran_synced (UTC TIMESTAMP) indicates when Fivetran successfully synced the row.

We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.

Type transformations and mapping

As we extract your data, we match Firestore data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system fails when we encounter columns with data types that we don't accept or transform.

The following table shows how we transform your Firestore data types into Fivetran-supported types:

Firestore Data Type	Fivetran Data Type	Fivetran Supported
Array	JSON	True
Boolean	BOOLEAN	True
Date and time	INSTANT	True
Floating-point number	DOUBLE	True
Geographical point	STRING	True
Integer	LONG	True
Map	JSON	True
Null	NULL	True
Reference	STRING	True
Text string	STRING	True
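
As a rough illustration of the table above, the following Python sketch maps plain Python values to the listed Fivetran type names. The function name `fivetran_type` is hypothetical, and the sketch assumes documents have already been read into built-in Python values. Geographical point and Reference values are omitted because they need SDK-specific classes.

```python
from datetime import datetime


def fivetran_type(value):
    """Return the Fivetran type name for a plain Python value, per the table above."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, datetime):
        return "INSTANT"
    if isinstance(value, (list, dict)):  # Array and Map both land as JSON
        return "JSON"
    raise TypeError(f"no mapping for {type(value).__name__}")


print(fivetran_type(2))             # LONG
print(fivetran_type({"baz": 3}))    # JSON
```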

In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.

Excluding source data

If you don’t want to sync all the data from your primary database, you can exclude schemas or tables from your syncs on your Fivetran dashboard. To do so, go to your connection details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.

Alternatively, you can change the permissions to restrict access to particular collections or sub-collections using Firebase Security Rules.

Initial sync

When Fivetran connects to a new Firestore database, we first copy all the data from every collection (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We perform the db.collection(collection).get() and db.collectionGroup(subcollection) operations to fetch the collection and subcollection data from the source, respectively. We do not pull all the data in a single request; instead, we paginate through the results to make the sync failure-tolerant.
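
The idea of paginating so that an interrupted read can resume can be sketched in Python as follows. This is a generic cursor-based pattern over an in-memory stand-in, not the Firestore SDK or Fivetran's code: `make_fake_source`, `fetch_page`, and `paginate` are hypothetical names, and ordering by document ID is an assumption.

```python
def make_fake_source(doc_ids):
    """Hypothetical stand-in for a paginated query, ordered by document ID."""
    ordered = sorted(doc_ids)

    def fetch_page(start_after, limit):
        remaining = [d for d in ordered if start_after is None or d > start_after]
        return remaining[:limit]

    return fetch_page


def paginate(fetch_page, page_size, start_after=None):
    """Yield document IDs page by page; the cursor is the last ID already read."""
    cursor = start_after
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return
        yield from page
        cursor = page[-1]


fetch_page = make_fake_source(["a", "b", "c", "d", "e"])
print(list(paginate(fetch_page, page_size=2)))                   # full read
print(list(paginate(fetch_page, page_size=2, start_after="c")))  # resume after "c"
```

Because the cursor is just the last document ID that was read, a sync that fails partway can restart from that ID instead of from the beginning.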

Updating data

Fivetran pulls all the data from your source in each sync, calculates the difference between the current and previous sync to identify updates and deletes, and syncs them accordingly.

Fivetran maintains collections and subcollections separately.

For collections, we map Firestore's built-in document_id (custom or auto-generated) to the _id column and use it as the primary key for each table. For subcollections, we use the _path column, which contains a unique path for every subcollection document, as the primary key (for example, collection/collection_id/sub-collection/sub-collection_id).

The primary key field is used to identify rows to merge the changes in your documents into the corresponding tables in the destination as follows:

Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
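
The merge rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Fivetran's implementation: `merge_sync` is a hypothetical name, the destination is modeled as a dictionary keyed by `_id`, and updating `_fivetran_synced` when a row is flagged as deleted is an assumption.

```python
def merge_sync(destination, source_docs, synced_at):
    """Merge a full pull of one collection into destination rows keyed by _id."""
    for doc_id, fields in source_docs.items():
        current = destination.get(doc_id)
        unchanged = (
            current is not None
            and not current["_fivetran_deleted"]
            and current["data"] == fields
        )
        if unchanged:
            continue
        # Inserted or updated in the source: write the row with the flag cleared.
        destination[doc_id] = {
            "data": dict(fields),
            "_fivetran_deleted": False,
            "_fivetran_synced": synced_at,
        }
    for doc_id, row in destination.items():
        if doc_id not in source_docs and not row["_fivetran_deleted"]:
            # Deleted in the source: keep the row and flag it instead of removing it.
            row["_fivetran_deleted"] = True
            row["_fivetran_synced"] = synced_at
    return destination


dest = {}
merge_sync(dest, {"foo": {"bar": 2}, "qux": {"bar": 5}}, "sync-1")
merge_sync(dest, {"foo": {"bar": 3}}, "sync-2")
print(dest["foo"]["data"], dest["qux"]["_fivetran_deleted"])  # {'bar': 3} True
```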

Fivetran Teleport Sync

Fivetran Teleport Sync is a proprietary incremental sync method that offers the completeness of snapshots and serves as a substitute for Snapshot Listener. With this sync mechanism, Fivetran can incrementally replicate your database with no additional setup other than connecting to your database.

Fivetran Teleport Sync's queries perform the following operations on your database:

Pull all the data of each selected table
Perform calculations on all values in each synced table's rows
Aggregate a compressed table snapshot in the database's memory
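
Teleport's internals are proprietary and are not described here. Purely as a generic illustration of what a compressed snapshot can look like, the Python sketch below reduces each row to a fixed-size hash and compares hashes between two pulls. The use of SHA-256 and the names `fingerprint`, `snapshot`, and `changed_keys` are assumptions, not Fivetran's method.

```python
import hashlib
import json


def fingerprint(row):
    """Hash all values of a row into a fixed-size digest."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(table):
    """Map each primary key to the fingerprint of its row."""
    return {key: fingerprint(row) for key, row in table.items()}


def changed_keys(previous, current):
    """Keys that are new or whose fingerprint differs from the previous snapshot."""
    return sorted(k for k, digest in current.items() if previous.get(k) != digest)


before = snapshot({"a": {"x": 1}, "b": {"x": 2}})
after = snapshot({"a": {"x": 1}, "b": {"x": 3}, "c": {"x": 4}})
print(changed_keys(before, after))  # ['b', 'c']
```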

Fivetran Teleport Sync captures the following in each sync:

New updates on each row
Deletes
Data type changes that do not affect existing values (for example, Number to String)

We do not support syncing more than 400 tables or tables with more than 15 million rows with Fivetran Teleport Sync for Firebase.

Deleted rows

We do not delete rows from your destination. When a row is deleted from the source table, we set the _fivetran_deleted column value of the corresponding row in the destination to TRUE.

Subcollections

Subcollections are always delivered in packed mode.

We sync nested collections (subcollections) up to level 1 only. We do not sync any nested data from level 2 onwards.

collection:(level 0)
    document:
        Id:1
        name:foo
        nested_collection:(level 1)
            nested_document:
                Id:2
                name:nested_foo
                nested_collection_2:(level 2)
                    nested_document_2:
                        Id:nested_2
                        name:nested_level_2_foo
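
The level numbering in the tree above can be derived from a document path, as in the following Python sketch. The helper names `nesting_level` and `is_synced` are hypothetical, and the sketch assumes paths alternate collection and document segments separated by forward slashes.

```python
def nesting_level(path):
    """Level of the collection holding the document at `path` (0 = top level)."""
    segments = path.strip("/").split("/")
    if len(segments) % 2 != 0:
        raise ValueError("a document path needs an even number of segments")
    return len(segments) // 2 - 1


def is_synced(path):
    """Documents in level 0 and level 1 collections are synced; deeper ones are not."""
    return nesting_level(path) <= 1


print(is_synced("collection/1"))                                          # True
print(is_synced("collection/1/nested_collection/2"))                      # True
print(is_synced("collection/1/nested_collection/2/nested_collection_2/nested_2"))  # False
```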

To sync subcollections, we follow a parent-child table approach.

In the destination, we maintain a separate table for each uniquely named subcollection, ensuring a one-to-one relationship between the source and destination. If two or more subcollections have the same name, they are stored in a single table even if they belong to different parent collections.

Example:

The following source data

Collection	Collection Id	Document	Subcollection	Subcollection Id	Subcollection Document
Rooms	Room A	Name: "chat room"	Messages	M1	From: "alex" Msg: "Hello world"
	Room B	Name: "Study room"	Messages	M2	From: "bob" Msg: "How are you?"
	Room C	Name: "Living room"	Furniture	F1	brand: "eco_fun" size: "king"

is stored as follows in the destination:

Messages

_path	data
Rooms/Room A/Messages/M1	{From: "alex" Msg: "Hello world"}
Rooms/Room B/Messages/M2	{From: "bob" Msg: "How are you?"}

Furniture

_path	data
Rooms/Room C/Furniture/F1	{brand: "eco_fun" size: "king" }

Here, the _path column links each subcollection row back to its parent collection and document.
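
The example above can be reproduced with a short Python sketch that groups subcollection documents into one table per subcollection name and recovers the parent from `_path`. The function names `route_subcollection_rows` and `parent_of` are hypothetical, and the sketch assumes level 1 paths with exactly four segments.

```python
from collections import defaultdict


def route_subcollection_rows(documents):
    """Group (path, fields) pairs into one table per subcollection name."""
    tables = defaultdict(list)
    for path, fields in documents:
        _collection, _doc_id, subcollection, _sub_id = path.split("/")
        tables[subcollection].append({"_path": path, "data": fields})
    return dict(tables)


def parent_of(path):
    """Return (parent collection, parent document ID) for a subcollection row."""
    collection, doc_id, _subcollection, _sub_id = path.split("/")
    return collection, doc_id


tables = route_subcollection_rows([
    ("Rooms/Room A/Messages/M1", {"From": "alex", "Msg": "Hello world"}),
    ("Rooms/Room B/Messages/M2", {"From": "bob", "Msg": "How are you?"}),
    ("Rooms/Room C/Furniture/F1", {"brand": "eco_fun", "size": "king"}),
])
print(sorted(tables))                         # ['Furniture', 'Messages']
print(parent_of("Rooms/Room A/Messages/M1"))  # ('Rooms', 'Room A')
```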

Subcollections sync strategy

Fivetran syncs subcollections using collection group queries and captures updates and deletes through Teleport. Picking up subcollections that were newly added in the source requires additional API calls, which increases the sync time. To limit this overhead, we check for newly added subcollections once every seven days, after which they are synced to the destination.

As a result, your most recently added subcollections can take up to seven days to appear in the destination. To sync them sooner, you must start a re-sync for all their parent collections.
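
The seven-day cadence described above amounts to a simple time check, sketched below in Python. The name `discovery_due` and the idea of storing a last-discovery timestamp are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

DISCOVERY_INTERVAL = timedelta(days=7)


def discovery_due(last_discovery, now):
    """True when a pass for newly added subcollections should run in this sync."""
    return last_discovery is None or now - last_discovery >= DISCOVERY_INTERVAL


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(discovery_due(last, datetime(2024, 1, 5, tzinfo=timezone.utc)))  # False
print(discovery_due(last, datetime(2024, 1, 8, tzinfo=timezone.utc)))  # True
```

A subcollection created just after one pass waits almost the full interval for the next one, which is where the worst-case delay of seven days comes from.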

Once the sync is complete, all the subcollection tables Fivetran created in the destination are visible in the dashboard Schema tab. Subcollection names begin with a forward slash ("/").

Subcollection discovery

We only discover and sync subcollections that are referenced by documents that exist in a collection we are already syncing. If your subcollection is not referenced by an existing document, we don't sync it. This can happen in at least two scenarios:

You delete a document, but not the subcollection it references.
You create a subcollection path referencing a document that does not exist in a collection.

You must create at least one document in a collection we are syncing that references your subcollection.
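
The discovery rule can be sketched in Python as a filter on parent documents. This is an illustration under assumptions, not Fivetran's implementation: `discoverable` is a hypothetical name, subcollection paths are assumed to have the form collection/document_id/subcollection, and existing documents are modeled as a set of (collection, document ID) pairs.

```python
def discoverable(subcollection_paths, existing_documents):
    """Keep subcollection paths whose parent document exists in a synced collection."""
    found = []
    for path in subcollection_paths:
        collection, doc_id, _subcollection = path.split("/")
        if (collection, doc_id) in existing_documents:
            found.append(path)
    return found


existing = {("Rooms", "Room A"), ("Rooms", "Room C")}
paths = ["Rooms/Room A/Messages", "Rooms/Room B/Messages", "Rooms/Room C/Furniture"]
print(discoverable(paths, existing))
# ['Rooms/Room A/Messages', 'Rooms/Room C/Furniture']
```

In this example, the Messages subcollection under Room B is skipped because the Room B document does not exist, which corresponds to both scenarios listed above.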