Pagination
An efficient data connector fetches data in batches instead of retrieving everything in one request. The most common way to do this is through pagination, where the source returns data in smaller chunks called pages and your connector processes these pages one at a time. This reduces the risk of timeouts and helps you manage API rate limits and system memory limits.
Each source exposes its data differently. The pagination method available to your connector depends on what the source supports. For APIs, check the source documentation to understand how it paginates. For databases, examine the table schema and available indexed columns.
In your Connector SDK project, design your data fetching logic to use pagination — implement a loop that requests, processes, and sends pages until all data is synced.
How to choose a pagination strategy
The strategy you choose depends on what the source exposes and how much control it gives you over page size. Page size is a critical tuning parameter. Smaller pages reduce the risk of timeouts and data loss if a sync is interrupted, while larger pages can sync data faster.
Some strategies let you define the page size directly. For example, when paginating by an indexed monotonic column, you control the range of values each page covers. Others, such as APIs that return a next-page token in the response, set the page size themselves with limited room for adjustment.
The following table helps you identify the right strategy based on what your source supports.
| Your source or scenario | Source type | Strategy to use |
|---|---|---|
| Includes offset, total, and limit fields. | REST API | Offset-based |
| Includes page, total_pages, and per_page fields. | REST API | Page-number |
| Includes a next_page_url field. | REST API | Next-URL |
| Includes an opaque field such as scroll_param, cursor, or token. | REST API | Scroll token |
| Includes an indexed monotonic column such as id or updated_at, and supports filtering by that column. | REST API | Keyset |
| Supports filtering by date or time range parameters such as start_date and end_date. | REST API | Time-window |
| Includes an indexed monotonic column such as id or updated_at. | Database | Keyset (database) |
| Too large to fetch in a single query. | Database | Server-side cursor |
You can also combine these strategies. For example, you can use time-window pagination for a large historical sync and offset pagination to chunk results within each window. See the Marketstack connector example in the Connector SDK repository.
Pagination and state management
You can use pagination with state management to make syncs incremental and resumable, which makes your connector resilient to transient errors such as network failures. After processing each page, you can save the cursor position in state. If a sync is interrupted, your connector can read the saved cursor value on the next sync and continue from that point instead of starting over, reducing the amount of data it needs to re-sync. The appropriate cursor format depends on the pagination strategy and may be an offset, page number, next-page token, or latest record timestamp.
We strongly recommend saving the connector state using checkpointing when your data is paginated. The natural breaks that occur after every page or every few pages are ideal checkpointing points.
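The resumable loop described above can be sketched as follows. This is a minimal illustration, not the SDK's own API: `fetch_page`, `emit`, and `checkpoint` are hypothetical callables standing in for your source request, your record write, and your checkpoint operation.

```python
def sync_pages(state, fetch_page, emit, checkpoint):
    """Process pages one at a time and save the cursor after each page.

    fetch_page(cursor) -> (records, next_cursor)   # hypothetical source call
    emit(record)                                   # hypothetical record write
    checkpoint(state)                              # hypothetical state save
    """
    cursor = state.get("cursor")  # None on the first sync (empty state)
    while True:
        records, next_cursor = fetch_page(cursor)
        for record in records:
            emit(record)
        if next_cursor is None:
            # Last page reached. The saved cursor still points at this page,
            # so the next sync re-reads it and picks up anything appended.
            break
        cursor = next_cursor
        state["cursor"] = cursor
        checkpoint(dict(state))  # natural break: one page fully processed
    return state
```

If a sync fails between two checkpoints, the next run starts from the last saved cursor, so at most one page is fetched again.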
Paginating multiple tables
When your connector fetches data from multiple tables or endpoints, paginate each dataset independently, with its own loop and its own cursor tracking progress through pages. Mixing cursors across datasets causes pagination interference and data integrity issues.
Store each dataset's cursor as a separate key-value pair in state. A good time to save state is after each dataset's pagination loop completes.
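As a rough sketch of independent cursors, assume each dataset has a hypothetical `fetch(offset, limit)` callable and that progress is tracked with an offset. The state key naming (`<table>_offset`) is an illustrative choice, not a requirement.

```python
def sync_tables(state, fetchers, checkpoint, page_size=100):
    """Paginate each table with its own loop and its own cursor key in state.

    fetchers: {table_name: fetch(offset, limit) -> list}  # hypothetical
    checkpoint(state)                                      # hypothetical
    """
    synced = []
    for table, fetch in fetchers.items():
        key = f"{table}_offset"          # one state key per dataset
        offset = state.get(key, 0)
        while True:
            rows = fetch(offset, page_size)
            if not rows:
                break
            synced.extend((table, row) for row in rows)
            offset += len(rows)
        state[key] = offset
        checkpoint(dict(state))          # save once this table's loop completes
    return synced
```

Because each table reads and writes only its own key, resuming one table never moves another table's position.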
For a working implementation, see the multiple tables with cursors example in the Connector SDK repository.
Important considerations
- Initial sync vs. incremental sync: Your pagination logic needs to handle two distinct cases. On the first sync, state is an empty dictionary and your connector must paginate through the entire dataset, potentially thousands of pages. On subsequent syncs, the state dictionary should contain a cursor with a value that indicates where the sync has reached in the dataset. Your connector should use that value to fetch only records that changed since that value, which typically means far fewer pages.
- Deleted records: When using incremental pagination strategies, it's important to understand how the source treats deleted records. Records deleted from the source may not be captured unless the source exposes a deleted_at field or a deletion event feed that you can act on using op.delete(). If your source doesn't provide information about deletions, or your connector doesn't handle that information, records deleted in the source remain in your destination.
- Rate limiting: Paginating through large datasets means many sequential API requests, making HTTP 429 rate limit responses common. Your code should handle them gracefully by respecting Retry-After headers and adding delays and back-off algorithms where needed.
- Page size: A page size that is too small means more requests and slower syncs, while one that is too large risks timeouts and memory issues. Start with the maximum page size the source supports and adjust based on observed behavior.
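The rate-limiting advice above can be sketched as a small retry helper. It assumes a hypothetical `send()` callable that returns a `(status, headers, body)` tuple, and it only understands the seconds form of Retry-After. When the header is missing or holds an HTTP date, it falls back to exponential back-off.

```python
import time


def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry on HTTP 429, preferring the server's Retry-After value.

    send() -> (status, headers, body)   # hypothetical request function
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return body
        if attempt == max_retries:
            break
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)            # server-specified wait in seconds
        else:
            delay = base_delay * (2 ** attempt)   # exponential back-off fallback
        sleep(delay)
    raise RuntimeError("rate limit retries exhausted")
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting.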
Common pagination patterns for API sources
APIs typically return data in pages, for example 100 records per request or with a next_page token, to help manage large datasets safely and efficiently. If the API response is paginated, you must handle pagination in your connector. Otherwise, your connector syncs only the first page and misses the rest. Check the API documentation to understand which mechanism it uses.
The following is a high-level overview of the common patterns along with links to code implementation examples.
Offset-based pagination
You send two integers on every call, typically named offset and limit. The server returns exactly that slice of data. If the API supports a sort parameter, request results in a consistent order to avoid gaps or duplicates. Increment offset by limit after each page, and persist the current offset in state.
If rows are inserted or deleted while paging, later offsets can shift, causing gaps or duplicates. Mitigate this with strict sorting. If your source supports it, filter by the latest cursor value instead (see keyset pagination).
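A minimal sketch of the offset loop, assuming a hypothetical `fetch(offset, limit)` callable that returns a list of items. It advances by the number of items actually received, which equals limit for every full page and avoids overshooting on a short final page.

```python
def sync_offset(state, fetch, emit, checkpoint, limit=100):
    """Offset-based pagination with the offset persisted after each page.

    fetch(offset, limit) -> list   # hypothetical API call
    emit(item), checkpoint(state)  # hypothetical write and state save
    """
    offset = state.get("offset", 0)
    while True:
        items = fetch(offset, limit)
        if not items:
            break
        for item in items:
            emit(item)
        offset += len(items)        # same as += limit on a full page
        state["offset"] = offset
        checkpoint(dict(state))
        if len(items) < limit:      # short page: nothing more to fetch
            break
    return state
```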
See the offset-based pagination example in the Connector SDK repository.
Page-number pagination
You send a page number starting at 1 and a per_page size. The server returns that page and typically indicates whether more pages exist through a total_pages count or a has_more flag. Increment page after each response, stop when the response signals no more pages or the items list is empty, and persist the current page in state.
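The loop above might look like the following sketch. The `fetch(page, per_page)` callable and the response shape (`items`, plus an optional `total_pages` or `has_more`) are assumptions for illustration. Real field names vary by API.

```python
def sync_page_numbers(state, fetch, emit, checkpoint, per_page=100):
    """Page-number pagination that stops on has_more, total_pages, or an empty page.

    fetch(page, per_page) -> {"items": [...], "total_pages": n}  # hypothetical
    """
    page = state.get("page", 1)
    while True:
        response = fetch(page, per_page)
        items = response.get("items", [])
        if not items:
            break
        for item in items:
            emit(item)
        has_more = response.get("has_more")
        total_pages = response.get("total_pages")
        if has_more is False or (total_pages is not None and page >= total_pages):
            break
        page += 1
        state["page"] = page
        checkpoint(dict(state))
    return state
```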
See the page-number pagination example in the Connector SDK repository.
Next-URL pagination
The server returns a fully-formed URL in each response pointing to the next page. Start with your initial API endpoint URL, follow the server-supplied next URL from each response to get the following page, and stop when next is null or absent. Persist the next URL in state after each page. If the sync is interrupted, resume from the stored URL instead of restarting from the initial endpoint. Detect a repeating next value to avoid infinite loops.
Some APIs return empty pages with a non-null next. Always check whether the items list is empty in addition to checking next.
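The following sketch combines the three safeguards described above: resuming from the stored URL, stopping on an empty page, and detecting a repeated next value. The `fetch(url)` callable and the `items` and `next` field names are hypothetical.

```python
def sync_next_url(state, start_url, fetch, emit, checkpoint):
    """Follow server-supplied next URLs until none is returned.

    fetch(url) -> {"items": [...], "next": url_or_None}   # hypothetical
    """
    url = state.get("next_url") or start_url   # resume from stored URL if present
    seen = set()
    while url:
        if url in seen:
            raise RuntimeError(f"pagination loop detected at {url}")
        seen.add(url)
        response = fetch(url)
        items = response.get("items", [])
        if not items:
            break                               # empty page, even if next is set
        for item in items:
            emit(item)
        url = response.get("next")
        if url:
            state["next_url"] = url
            checkpoint(dict(state))
    return state
```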
See the next-URL pagination example in the Connector SDK repository.
Scroll token pagination
The server returns an opaque token (cursor, scroll_id, nextCursor, and so on) that represents your position. Pass it back on the next request as specified by the API, typically as a query parameter or request body field; stop when the server returns no token, and persist the token in state after each page.
Scroll tokens are often session-scoped and expire after inactivity. If the source uses expiring tokens, identify a secondary cursor you can fall back on and store it in state along with the scroll token. Check the source documentation for an appropriate secondary cursor field.
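As an illustration of storing a secondary cursor next to the token, the sketch below assumes a hypothetical `fetch(token)` callable, a `next_token` response field, and records that carry an `updated_at` value in ascending order. How you restart from the secondary cursor after a token expires depends on the source and is not shown.

```python
def sync_scroll(state, fetch, emit, checkpoint):
    """Scroll-token pagination that also tracks a fallback cursor.

    fetch(token) -> {"items": [...], "next_token": token_or_None}  # hypothetical
    """
    token = state.get("scroll_token")   # None starts a new scroll
    while True:
        response = fetch(token)
        for item in response["items"]:
            emit(item)
            # Secondary cursor (assumed field) to fall back on if the token expires.
            state["last_updated_at"] = item["updated_at"]
        token = response.get("next_token")
        if not token:
            state.pop("scroll_token", None)   # scroll finished, drop the token
            checkpoint(dict(state))
            break
        state["scroll_token"] = token
        checkpoint(dict(state))
    return state
```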
See the scroll token pagination example in the Connector SDK repository.
Keyset pagination
Keyset pagination uses an indexed column value to set the start of each page and requires the API to sort results by that column. The column must be a monotonic field like id or updated_at so you can reliably use the last record's value to fetch the next page. Check the source documentation to confirm the API supports sorting on that field before using this pattern.
Each page has a start and end based on the monotonic field. Use parameters such as after_id or updated_after to set the page start, then use the last record in the page to set the start of the next page. Store the last record's field value in state after each page.
When multiple records share the same timestamp, identify and store a second value in state to mark exactly where you left off so you can efficiently pick back up.
For example:
- If the API sorts by id only: GET /orders?after_id=12345&limit=500
- If the API sorts by timestamp only: GET /orders?updated_after=2025-01-01T00:00:00Z&limit=1000
- If the API supports both for tie-breaking: GET /orders?updated_after=2025-01-01T00:00:00Z&after_id=12345&limit=500
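The tie-breaking case can be sketched as follows, assuming a hypothetical `fetch(params)` callable that returns records sorted by updated_at and then id. The parameter names mirror the example requests above.

```python
def sync_keyset(state, fetch, emit, checkpoint, limit=500):
    """Keyset pagination using (updated_at, id) as the page boundary.

    fetch(params) -> list of records sorted by (updated_at, id)  # hypothetical
    """
    while True:
        params = {"limit": limit}
        if "updated_after" in state:
            params["updated_after"] = state["updated_after"]
            params["after_id"] = state["after_id"]   # tie-breaker for equal timestamps
        records = fetch(params)
        if not records:
            break
        for record in records:
            emit(record)
        last = records[-1]
        # The last record of this page marks the start of the next page.
        state["updated_after"] = last["updated_at"]
        state["after_id"] = last["id"]
        checkpoint(dict(state))
        if len(records) < limit:
            break
    return state
```

Storing both values means a page that ends in the middle of a run of identical timestamps resumes at the exact record rather than skipping or repeating the rest of that run.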
See the keyset pagination example in the Connector SDK repository.
Time-window pagination
Break the full data range into pages based on a time window, for example, one day or one week, defined by a start and end timestamp or a start timestamp and duration. Store the start timestamp as the cursor in state and advance it after each window completes. Use this approach when the API supports date range filters or when performing a large initial historical sync spanning months or years.
If a time window contains more records than a single request can return, combine time-window with offset or keyset pagination to page through the records within each window.
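A minimal sketch of the window loop, assuming a hypothetical `fetch(start, end)` callable that returns every record with a timestamp in the half-open range from start up to but not including end. Paging within a window is left out for brevity.

```python
from datetime import datetime, timedelta, timezone


def sync_windows(state, fetch, emit, checkpoint, default_start, end,
                 window=timedelta(days=1)):
    """Walk from the saved window start to `end`, one window at a time.

    fetch(start, end) -> list of records in [start, end)   # hypothetical
    """
    if "window_start" in state:
        start = datetime.fromisoformat(state["window_start"])
    else:
        start = default_start                  # first sync: begin of history
    while start < end:
        stop = min(start + window, end)        # clamp the final window
        for record in fetch(start, stop):
            emit(record)
        start = stop
        state["window_start"] = start.isoformat()
        checkpoint(dict(state))                # advance only after the window completes
    return state
```

A half-open range keeps a record that falls exactly on a boundary from being fetched by two adjacent windows.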
See the time-window cursor example in the Connector SDK repository.
Common pagination patterns for database sources
Database-side pagination breaks a large SELECT query into smaller pages so the connector can process rows in batches, save progress, and resume efficiently. This lets the database return the result set incrementally instead of loading everything into memory at once.
The following is a high-level overview of three common patterns, which cover almost every relational database you will encounter, along with links to code implementation examples.
LIMIT/OFFSET pagination
This pattern adds LIMIT <N> OFFSET <k> to every query. Always use a deterministic ORDER BY (for example, ORDER BY updated_at, id), increment OFFSET by LIMIT after each page, and persist the current offset in state.
If rows are inserted or deleted while paging, later offsets can shift, causing gaps or duplicates. Queries also become slower as OFFSET increases because the database must skip all preceding rows. For large or frequently updated tables, use keyset pagination instead.
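For illustration, here is the pattern against SQLite using Python's built-in sqlite3 module. The orders table and its id and updated_at columns are hypothetical, and placeholder syntax differs between database drivers.

```python
import sqlite3

PAGE_QUERY = (
    "SELECT id, updated_at FROM orders "
    "ORDER BY updated_at, id "      # deterministic order is required
    "LIMIT ? OFFSET ?"
)


def sync_limit_offset(conn, state, emit, checkpoint, limit=1000):
    """LIMIT/OFFSET pagination with the offset persisted after each page.

    emit(row), checkpoint(state)   # hypothetical write and state save
    """
    offset = state.get("offset", 0)
    while True:
        rows = conn.execute(PAGE_QUERY, (limit, offset)).fetchall()
        if not rows:
            break
        for row in rows:
            emit(row)
        offset += len(rows)
        state["offset"] = offset
        checkpoint(dict(state))
    return state
```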
See the limit/offset pagination example in the Connector SDK repository.
Keyset pagination for database
Keyset pagination filters on a monotonic key such as WHERE (updated_at, id) > (last_ts, last_id) and orders by that key. After each page, advance the boundary from the last record's key, always order by the key and a tie-breaker, and store both a timestamp and tie-breaker id in state when rows can share the same timestamp.
This pattern requires an indexed monotonic column. If many rows can share the same timestamp, include a tie-breaker id in both the WHERE clause and the ORDER BY clause.
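The sketch below shows one way to write this against SQLite with the built-in sqlite3 module. It spells the row comparison out as an OR condition, which is equivalent to the tuple form and works on databases without row-value syntax. The orders table, ISO 8601 text timestamps, and positive integer ids are assumptions.

```python
import sqlite3

KEYSET_QUERY = """
SELECT id, updated_at FROM orders
WHERE updated_at > :ts OR (updated_at = :ts AND id > :id)
ORDER BY updated_at, id
LIMIT :limit
"""


def sync_keyset_db(conn, state, emit, checkpoint, limit=1000):
    """Keyset pagination on (updated_at, id) with both values stored in state."""
    while True:
        params = {
            "ts": state.get("last_ts", ""),   # "" sorts before any ISO timestamp
            "id": state.get("last_id", 0),
            "limit": limit,
        }
        rows = conn.execute(KEYSET_QUERY, params).fetchall()
        if not rows:
            break
        for row in rows:
            emit(row)
        last_id, last_ts = rows[-1]           # boundary for the next page
        state["last_ts"], state["last_id"] = last_ts, last_id
        checkpoint(dict(state))
    return state
```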
See the keyset pagination for database example in the Connector SDK repository.
Server-side cursor
With a server-side cursor, your connector opens a named cursor on the server and repeatedly fetches pages from it. The database delivers results in pages without rewriting the query, keeping client memory usage low.
When to use: Large sequential table scans in databases that support server-side cursors, such as PostgreSQL, where loading the full result set into client memory would be problematic.
Named server-side cursors in psycopg2 require autocommit = False on the connection, which is the default. You cannot resume the same server-side cursor after a crash or connection drop, so checkpoint a logical cursor such as last_pk or last_updated every few thousand rows. On the next sync, re-open a new server-side cursor starting from that position.
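The fetch-and-checkpoint loop can be sketched against any DB-API cursor. To stay runnable without a database server, the demo below uses sqlite3 as a stand-in, which is not a server-side cursor. With psycopg2, the cursor would instead come from `conn.cursor(name=...)`. The events table and the pk column are hypothetical.

```python
import sqlite3


def stream_in_batches(cursor, state, emit, checkpoint, batch_size=2000):
    """Drain an already-executed cursor in batches, saving a logical cursor.

    Assumes the query is ordered by the primary key and returns it first.
    """
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            emit(row)
        state["last_pk"] = rows[-1][0]   # logical position, survives reconnects
        checkpoint(dict(state))
    return state


# Stand-in demo: resume after pk 2, as if an earlier sync had been interrupted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (pk INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 6)])
state = {"last_pk": 2}
cur = conn.cursor()
cur.execute("SELECT pk, payload FROM events WHERE pk > ? ORDER BY pk",
            (state["last_pk"],))
received = []
stream_in_batches(cur, state, received.append, lambda s: None, batch_size=2)
```

The saved last_pk is what makes the scan resumable: a new cursor opened with the same WHERE filter continues from that row.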
See the server-side cursors example in the Connector SDK repository.
Real-world pagination examples
You can find real-world pagination implementations in the Fivetran Connector SDK repository. These examples demonstrate various patterns in action:
- Offset-based pagination: See the ArangoDB community connector. It paginates across multiple collections using skip/limit and checkpoints per batch.
- Page-number pagination: See the Checkly community connector. It uses limit=100&page={page} across multiple API endpoints with per-page checkpointing.
- Database-side chunking (fetch-many): See the Apache Hive examples, using_pyhive and using_sqlalchemy. These read large result sets in batches using fetchmany() alongside a timestamp cursor.
To find more examples, browse the community connectors list and search for keywords like limit, offset, next, cursor, and checkpoint.