Pagination
Given a page size and a pagination strategy, the DefaultPaginator
will point to pages of results for as long as its strategy returns a next_page_token
.
Iterating over pages of result is different from iterating over stream slices. Stream slices have semantic value, for instance, a Datetime stream slice defines data for a specific date range. Two stream slices will have data for different date ranges. Conversely, pages don't have semantic value. More pages simply means that more records are to be read, without specifying any meaningful difference between the records of the first and later pages.
Schema:
Paginator:
type: object
anyOf:
- "$ref": "#/definitions/DefaultPaginator"
- "$ref": "#/definitions/NoPagination"
NoPagination:
type: object
additionalProperties: true
Default paginator
The default paginator is defined by
page_size_option
: How to specify the page size in the outgoing HTTP requestpagination_strategy
: How to compute the next page to fetchpage_token_option
: How to specify the next page to fetch in the outgoing HTTP request
Schema:
DefaultPaginator:
type: object
additionalProperties: true
required:
- page_token_option
- pagination_strategy
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
page_size:
type: integer
page_size_option:
"$ref": "#/definitions/RequestOption"
page_token_option:
anyOf:
- "$ref": "#/definitions/RequestOption"
- "$ref": "#/definitions/RequestPath"
pagination_strategy:
"$ref": "#/definitions/PaginationStrategy"
3 pagination strategies are supported
- Page increment
- Offset increment
- Cursor-based
Pagination Strategies
Schema:
PaginationStrategy:
type: object
anyOf:
- "$ref": "#/definitions/CursorPagination"
- "$ref": "#/definitions/OffsetIncrement"
- "$ref": "#/definitions/PageIncrement"
Page increment
When using the PageIncrement
strategy, the page number will be set as part of the page_token_option
.
Schema:
PageIncrement:
type: object
additionalProperties: true
required:
- page_size
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
page_size:
type: integer
The following paginator example will fetch 5 records per page, and specify the page number as a request_parameter:
Example:
paginator:
type: "DefaultPaginator"
page_size_option:
type: "RequestOption"
inject_into: "request_parameter"
field_name: "page_size"
pagination_strategy:
type: "PageIncrement"
page_size: 5
page_token_option:
type: "RequestOption"
inject_into: "request_parameter"
field_name: "page"
If the page contains less than 5 records, then the paginator knows there are no more pages to fetch. If the API returns more records than requested, all records will be processed.
Assuming the endpoint to fetch data from is https://cloud.airbyte.com/api/get_data
,
the first request will be sent as https://cloud.airbyte.com/api/get_data?page_size=5&page=0
and the second request as https://cloud.airbyte.com/api/get_data?page_size=5&page=1
,
Offset increment
When using the OffsetIncrement
strategy, the number of records read will be set as part of the page_token_option
.
Schema:
OffsetIncrement:
type: object
additionalProperties: true
required:
- page_size
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
page_size:
type: integer
The following paginator example will fetch 5 records per page, and specify the offset as a request_parameter:
Example:
paginator:
type: "DefaultPaginator"
page_size_option:
type: "RequestOption"
inject_into: "request_parameter"
field_name: "page_size"
pagination_strategy:
type: "OffsetIncrement"
page_size: 5
page_token_option:
type: "RequestOption"
field_name: "offset"
inject_into: "request_parameter"
Assuming the endpoint to fetch data from is https://cloud.airbyte.com/api/get_data
,
the first request will be sent as https://cloud.airbyte.com/api/get_data?page_size=5&offset=0
and the second request as https://cloud.airbyte.com/api/get_data?page_size=5&offset=5
,
Cursor
The CursorPagination
outputs a token by evaluating its cursor_value
string with the following parameters:
response
: The decoded responseheaders
: HTTP headers on the responselast_records
: List of records selected from the last response
This cursor value can be used to request the next page of record.
Schema:
CursorPagination:
type: object
additionalProperties: true
required:
- cursor_value
properties:
"$parameters":
"$ref": "#/definitions/$parameters"
cursor_value:
type: string
stop_condition:
type: string
page_size:
type: integer
Cursor paginator in request parameters
In this example, the next page of record is defined by setting the from
request parameter to the id of the last record read:
paginator:
type: "DefaultPaginator"
<...>
pagination_strategy:
type: "CursorPagination"
cursor_value: "{{ last_records[-1]['id'] }}"
page_token_option:
type: "RequestPath"
field_name: "from"
inject_into: "request_parameter"
Assuming the endpoint to fetch data from is https://cloud.airbyte.com/api/get_data
,
the first request will be sent as https://cloud.airbyte.com/api/get_data
.
Assuming the id of the last record fetched is 1000,
the next request will be sent as https://cloud.airbyte.com/api/get_data?from=1000
.
Cursor paginator in path
Some APIs directly point to the URL of the next page to fetch. In this example, the URL of the next page is extracted from the response headers:
paginator:
type: "DefaultPaginator"
<...>
pagination_strategy:
type: "CursorPagination"
cursor_value: "{{ headers['link']['next']['url'] }}"
page_token_option:
type: "RequestPath"
Assuming the endpoint to fetch data from is https://cloud.airbyte.com/api/get_data
,
the first request will be sent as https://cloud.airbyte.com/api/get_data
Assuming the response's next url is https://cloud.airbyte.com/api/get_data?page=1&page_size=100
,
the next request will be sent as https://cloud.airbyte.com/api/get_data?page=1&page_size=100