Airbyte Declarative Connectors Specification (manifest.yaml)
Airbyte Specification for custom connectors
| Type | object |
|---|---|
| File match | `source-*-manifest.yaml`, `destination-*-manifest.yaml`, `**/source-*/manifest.yaml`, `**/destination-*/manifest.yaml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/airbyte-declarative-connectors-specification-manifest-yaml/latest.json |
| Source | https://raw.githubusercontent.com/airbytehq/airbyte-python-cdk/49c5a482de7bdfbaa3a68373a940b90c0690a56f/airbyte_cdk/sources/declarative/generated/declarative_component_schema.json |
Validate with Lintel
npx @lintel/lintel check
An API source that extracts data according to its declarative components.
Properties
The version of the Airbyte CDK used to build and test the source.
The stream schemas representing the shape of the data emitted by the stream.
A source specification made up of connector metadata and how it can be configured.
A connection specification describing how the connector can be configured.
URL of the connector's documentation page.
Additional and optional specification object to describe what an 'advanced' Auth flow would need to function.
- A connector should be able to fully function with the configuration as described by the ConnectorSpecification in a 'basic' mode.
- The 'advanced' mode provides easier UX for the user with UI improvements and automations. However, this requires further setup on the server side by instance or workspace admins beforehand. In return, the user has to provide fewer technical inputs, and the auth process is faster and easier to complete.
The type of auth to use
JSON path to a field in the connectorSpecification that should exist for the advanced auth to be applicable.
Value of the predicate_key fields for the advanced auth to be applicable.
Specification describing how an 'advanced' Auth flow would need to function.
OAuth specific blob. This is a JSON Schema used to validate JSON configurations used as input to OAuth. Must be a valid non-nested JSON that refers to properties from ConnectorSpecification.connectionSpecification using the special annotation 'path_in_connector_config'. These are input values the user enters through the UI to authenticate to the connector, which might also be shared as inputs for syncing data via the connector.
Examples:
- If no connector values are shared during the OAuth flow: oauth_user_input_from_connector_config_specification=[]
- If connector values such as 'app_id' at the top level are used to generate the API URL for the OAuth flow: oauth_user_input_from_connector_config_specification={ app_id: { type: string, path_in_connector_config: ['app_id'] } }
- If connector values such as 'info.app_id', nested inside another object, are used to generate the API URL for the OAuth flow: oauth_user_input_from_connector_config_specification={ app_id: { type: string, path_in_connector_config: ['info', 'app_id'] } }
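This Python sketch shows how the path_in_connector_config annotation points into a connector config. It collects the values an OAuth flow would read from a user's config. The helper names get_by_path and collect_oauth_user_inputs are hypothetical and are not part of the Airbyte CDK. The sketch only assumes that each field in the spec carries a path_in_connector_config list of keys.

```python
from typing import Any, Dict, List


def get_by_path(config: Dict[str, Any], path: List[str]) -> Any:
    """Follow the keys in path through nested objects; return None if any key is missing."""
    current: Any = config
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def collect_oauth_user_inputs(spec: Dict[str, Dict[str, Any]], config: Dict[str, Any]) -> Dict[str, Any]:
    """Map each field declared in the spec to the config value at its path_in_connector_config."""
    return {name: get_by_path(config, field["path_in_connector_config"]) for name, field in spec.items()}


# Mirrors the nested 'info.app_id' example above.
spec = {"app_id": {"type": "string", "path_in_connector_config": ["info", "app_id"]}}
config = {"info": {"app_id": "my-app"}, "start_date": "2024-01-01"}
print(collect_oauth_user_inputs(spec, config))  # {'app_id': 'my-app'}
```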
The DeclarativeOAuth specific blob. Pertains to the fields defined by the connector relating to the OAuth flow.
Interpolation capabilities:
- The variable placeholders are declared as {{my_var}}.
- Nested resolution variables like {{ {{my_nested_var}} }} are allowed as well.
- The allowed interpolation context is:
  - base64Encoder - encode to base64, {{ {{my_var_a}}:{{my_var_b}} | base64Encoder }}
  - base64Decoder - decode from a base64-encoded string, {{ {{my_string_variable_or_string_value}} | base64Decoder }}
  - urlEncoder - encode the input string to URL-like format, {{ https://test.host.com/endpoint | urlEncoder}}
  - urlDecoder - decode the input URL-encoded string into text format, {{ urlDecoder:https%3A%2F%2Fairbyte.io | urlDecoder}}
  - codeChallengeS256 - get the codeChallenge encoded value to provide additional data-provider specific authorisation values, {{ {{state_value}} | codeChallengeS256 }}
Examples:
- The TikTok Marketing DeclarativeOAuth spec:
{
  "oauth_connector_input_specification": {
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "consent_url": "https://ads.tiktok.com/marketing_api/auth?{{client_id_key}}={{client_id_value}}&{{redirect_uri_key}}={{ {{redirect_uri_value}} | urlEncoder}}&{{state_key}}={{state_value}}",
      "access_token_url": "https://business-api.tiktok.com/open_api/v1.3/oauth2/access_token/",
      "access_token_params": {
        "{{ auth_code_key }}": "{{ auth_code_value }}",
        "{{ client_id_key }}": "{{ client_id_value }}",
        "{{ client_secret_key }}": "{{ client_secret_value }}"
      },
      "access_token_headers": {
        "Content-Type": "application/json",
        "Accept": "application/json"
      },
      "extract_output": ["data.access_token"],
      "client_id_key": "app_id",
      "client_secret_key": "secret",
      "auth_code_key": "auth_code"
    }
  }
}
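Below is a Python approximation of each interpolation filter listed above, showing what it does to a value. These functions are illustrative only, because the Airbyte platform evaluates the real filters. This schema does not say how codeChallengeS256 works. Treating it as the RFC 7636 S256 transform (base64url-encoded SHA-256 without padding) is an assumption.

```python
import base64
import hashlib
from urllib.parse import quote, unquote


def base64_encoder(value: str) -> str:
    """Approximates base64Encoder: UTF-8 text to standard base64."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def base64_decoder(value: str) -> str:
    """Approximates base64Decoder: standard base64 back to UTF-8 text."""
    return base64.b64decode(value).decode("utf-8")


def url_encoder(value: str) -> str:
    """Approximates urlEncoder; safe='' also percent-encodes ':' and '/'."""
    return quote(value, safe="")


def url_decoder(value: str) -> str:
    """Approximates urlDecoder: percent-decoding back to plain text."""
    return unquote(value)


def code_challenge_s256(verifier: str) -> str:
    """Assumed RFC 7636 S256: base64url(SHA-256(verifier)) with '=' padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


print(base64_encoder("client:secret"))                # Y2xpZW50OnNlY3JldA==
print(url_encoder("https://test.host.com/endpoint"))  # https%3A%2F%2Ftest.host.com%2Fendpoint
print(url_decoder("https%3A%2F%2Fairbyte.io"))        # https://airbyte.io
```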
OAuth specific blob. This is a JSON Schema used to validate JSON configurations produced by the OAuth flows as they are returned by the remote OAuth APIs.
Must be a valid JSON describing the fields to merge back to ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Examples:
complete_oauth_output_specification={
refresh_token: {
type: string,
path_in_connector_config: ['credentials', 'refresh_token']
}
}
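The merge described by path_in_connector_config can be sketched in Python as follows. merge_oauth_output and set_by_path are hypothetical helpers. Falling back to a top-level key when a field has no path annotation is an assumption made for the sketch.

```python
import copy
from typing import Any, Dict, List


def set_by_path(config: Dict[str, Any], path: List[str], value: Any) -> None:
    """Set value at path, creating intermediate objects as needed."""
    target = config
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def merge_oauth_output(config: Dict[str, Any], output_spec: Dict[str, Dict[str, Any]],
                       oauth_output: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with every declared OAuth output merged at its path."""
    merged = copy.deepcopy(config)
    for name, field_spec in output_spec.items():
        if name in oauth_output:
            # Assumption: without an annotation, the field is merged at the top level.
            path = field_spec.get("path_in_connector_config", [name])
            set_by_path(merged, path, oauth_output[name])
    return merged


output_spec = {"refresh_token": {"type": "string", "path_in_connector_config": ["credentials", "refresh_token"]}}
config = {"start_date": "2024-01-01"}
print(merge_oauth_output(config, output_spec, {"refresh_token": "abc123", "ignored": "x"}))
# {'start_date': '2024-01-01', 'credentials': {'refresh_token': 'abc123'}}
```

Fields returned by the OAuth API but not declared in the output specification are ignored. The input config is left unchanged.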
OAuth specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations. Must be a valid non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the server when completing an OAuth flow (typically exchanging an auth code for a refresh token).
Examples:
complete_oauth_server_input_specification={ client_id: { type: string }, client_secret: { type: string } }
OAuth specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations that also need to be merged back into the connector configuration at runtime.
This is a subset of complete_oauth_server_input_specification that retains only the fields necessary for the connector to function with OAuth. (Some fields could be used during OAuth flows but not be needed afterwards; they would therefore be listed in complete_oauth_server_input_specification but not in complete_oauth_server_output_specification.)
Must be a valid non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the connector when using OAuth flow APIs.
These fields are to be merged back to ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Examples:
complete_oauth_server_output_specification={
client_id: {
type: string,
path_in_connector_config: ['credentials', 'client_id']
},
client_secret: {
type: string,
path_in_connector_config: ['credentials', 'client_secret']
}
}
The discrete migrations that will be applied on the incoming config. Migrations are applied in the order they are defined. Default: []
The list of transformations that will be applied on the incoming config at the start of each sync. The transformations will be applied in the order they are defined. Default: []
The list of validations that will be performed on the incoming config at the start of each sync. Default: []
Defines the amount of parallelization for the streams that are being synced. The factor of parallelization is how many partitions or streams are synced at the same time. For example, with a concurrency_level of 10, ten streams or partitions of data will be processed at the same time. Note that a value of 1 could create a deadlock if a stream has a very high number of partitions.
The amount of concurrency that will be applied during a sync. This value can be hardcoded or user-defined in the config if different users have varying volume thresholds in the target API.
The maximum level of concurrency that will be used during a sync. This becomes a required field when the default_concurrency derives from the config, because it serves as a safeguard against a user-defined threshold that is too high.
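Here is a minimal Python sketch of how a default concurrency and the max_concurrency safeguard could combine. resolve_concurrency is hypothetical. It models a config-derived default_concurrency as a plain top-level config key, whereas the CDK evaluates an interpolated string.

```python
from typing import Any, Mapping, Optional, Union


def resolve_concurrency(
    default_concurrency: Union[int, str],
    config: Mapping[str, Any],
    max_concurrency: Optional[int] = None,
) -> int:
    """Return the worker count: hardcoded or read from config, capped by max_concurrency."""
    if isinstance(default_concurrency, int):
        level = default_concurrency
    else:
        # Assumption: a string names a top-level config key supplied by the user.
        if max_concurrency is None:
            raise ValueError("max_concurrency is required when default_concurrency comes from the config")
        level = int(config[default_concurrency])
    if max_concurrency is not None:
        level = min(level, max_concurrency)  # safeguard against overly high user input
    return max(level, 1)  # never fewer than one worker


print(resolve_concurrency(10, {}))  # 10
print(resolve_concurrency("num_workers", {"num_workers": 50}, max_concurrency=20))  # 20
```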
Defines how many requests can be made to the API in a given time frame. HTTPAPIBudget extracts the remaining call count and the reset time from HTTP response headers using the header names provided by ratelimit_remaining_header and ratelimit_reset_header. Only requests using HttpRequester are rate-limited; custom components that bypass HttpRequester are not covered by this budget.
List of call rate policies that define how many calls are allowed.
The HTTP response header name that indicates when the rate limit resets.
The HTTP response header name that indicates the number of remaining allowed calls.
List of HTTP status codes that indicate a rate limit has been hit.
Default: [429]
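This Python sketch shows the kind of header reading HTTPAPIBudget performs. read_rate_limit and RateLimitStatus are hypothetical. The header names X-RateLimit-Remaining and X-RateLimit-Reset are example values that stand in for whatever ratelimit_remaining_header and ratelimit_reset_header are set to. The sketch does not implement the call rate policies themselves.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class RateLimitStatus:
    remaining: Optional[int]  # calls left in the current window, if the header was present
    reset: Optional[str]  # raw reset header value; its unit depends on the API
    limited: bool  # True when the status code signals a rate-limit hit


def read_rate_limit(
    status_code: int,
    headers: Mapping[str, str],
    remaining_header: str,
    reset_header: str,
    rate_limit_status_codes: Sequence[int] = (429,),
) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw_remaining = lowered.get(remaining_header.lower())
    return RateLimitStatus(
        remaining=int(raw_remaining) if raw_remaining is not None else None,
        reset=lowered.get(reset_header.lower()),
        limited=status_code in rate_limit_status_codes,
    )


status = read_rate_limit(
    429,
    {"x-ratelimit-remaining": "0", "x-ratelimit-reset": "1700000000"},
    remaining_header="X-RateLimit-Remaining",
    reset_header="X-RateLimit-Reset",
)
print(status)
# RateLimitStatus(remaining=0, reset='1700000000', limited=True)
```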
Maximum number of concurrent asynchronous jobs to run. This property is only relevant for sources/streams that support asynchronous job execution through the AsyncRetriever (e.g. a report-based stream that initiates a job, polls the job status, and then fetches the job results). This is often set by the API's maximum number of concurrent jobs on the account level. Refer to the API's documentation for this information.
For internal Airbyte use only - DO NOT modify manually. Used by consumers of declarative manifests for storing related metadata.
A description of the connector. It will be presented on the Source documentation page.
Any of
Definitions
Defines the field to add on a record.
List of strings defining the path where to add the value on the record.
Value of the new field. Use {{ record['existing_field'] }} syntax to refer to other fields in the record.
A schema type.
Transformation which adds field to an output record. The path of the added field can be nested.
List of transformations (path and corresponding value) that will be added to the record.
Fields will be added only if the condition expression evaluates to True.
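Conceptually, AddFields walks each path, creating nested objects as needed, and writes a computed value. It skips the record when the condition is false. In this hypothetical Python sketch, plain callables stand in for the interpolated value and condition strings (such as {{ record['existing_field'] }}) that the CDK evaluates.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
FieldDefinition = Tuple[List[str], Callable[[Record], Any]]


def add_fields(record: Record, fields: List[FieldDefinition],
               condition: Callable[[Record], bool] = lambda record: True) -> Record:
    """Return a copy of record with each (path, value) added, unless condition is false."""
    if not condition(record):
        return record
    result = copy.deepcopy(record)
    for path, compute_value in fields:
        target = result
        for key in path[:-1]:
            target = target.setdefault(key, {})  # create nested objects on the way
        # Values are computed from the original record, like {{ record['id'] }}.
        target[path[-1]] = compute_value(record)
    return result


record = {"id": 7, "name": "widget"}
print(add_fields(record, [(["metadata", "source_id"], lambda r: r["id"])]))
# {'id': 7, 'name': 'widget', 'metadata': {'source_id': 7}}
```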
Authenticator for requests authenticated with an API token injected as an HTTP request header.
The API key to inject in the request. Fill it in the user inputs.
The name of the HTTP header that will be set to the API key. This setting is deprecated; use inject_into instead. header and inject_into cannot be defined at the same time.
Specifies the key field or path and where in the request a component's value should be injected.
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
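This hypothetical Python sketch places a value into an outgoing request, to make inject_into, field_name and field_path concrete. It covers only the header, request_parameter and body_json locations. OutgoingRequest is an illustrative stand-in, not a CDK type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class OutgoingRequest:
    """Illustrative stand-in for an HTTP request being assembled."""
    headers: Dict[str, str] = field(default_factory=dict)
    params: Dict[str, str] = field(default_factory=dict)
    body_json: Dict[str, Any] = field(default_factory=dict)


def inject(request: OutgoingRequest, value: Any, inject_into: str,
           field_name: Optional[str] = None, field_path: Optional[List[str]] = None) -> None:
    """Place value at the location described by inject_into and field_name / field_path."""
    if inject_into == "header":
        request.headers[field_name] = str(value)
    elif inject_into == "request_parameter":
        request.params[field_name] = str(value)
    elif inject_into == "body_json":
        # field_path allows nested placement, e.g. GraphQL variables; otherwise use field_name.
        path = field_path or [field_name]
        target = request.body_json
        for key in path[:-1]:
            target = target.setdefault(key, {})
        target[path[-1]] = value
    else:
        raise ValueError(f"location not covered by this sketch: {inject_into}")


request = OutgoingRequest()
inject(request, "secret-key", "header", field_name="X-API-Key")
inject(request, "abc", "body_json", field_path=["variables", "cursor"])
print(request.headers, request.body_json)
# {'X-API-Key': 'secret-key'} {'variables': {'cursor': 'abc'}}
```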
Additional and optional specification object to describe what an 'advanced' Auth flow would need to function.
- A connector should be able to fully function with the configuration as described by the ConnectorSpecification in a 'basic' mode.
- The 'advanced' mode provides easier UX for the user with UI improvements and automations. However, this requires further setup on the server side by instance or workspace admins beforehand. In return, the user has to provide fewer technical inputs, and the auth process is faster and easier to complete.
The type of auth to use
JSON path to a field in the connectorSpecification that should exist for the advanced auth to be applicable.
Value of the predicate_key fields for the advanced auth to be applicable.
Specification describing how an 'advanced' Auth flow would need to function.
OAuth specific blob. This is a JSON Schema used to validate JSON configurations used as input to OAuth. Must be a valid non-nested JSON that refers to properties from ConnectorSpecification.connectionSpecification using the special annotation 'path_in_connector_config'. These are input values the user enters through the UI to authenticate to the connector, which might also be shared as inputs for syncing data via the connector.
Examples:
- If no connector values are shared during the OAuth flow: oauth_user_input_from_connector_config_specification=[]
- If connector values such as 'app_id' at the top level are used to generate the API URL for the OAuth flow: oauth_user_input_from_connector_config_specification={ app_id: { type: string, path_in_connector_config: ['app_id'] } }
- If connector values such as 'info.app_id', nested inside another object, are used to generate the API URL for the OAuth flow: oauth_user_input_from_connector_config_specification={ app_id: { type: string, path_in_connector_config: ['info', 'app_id'] } }
The DeclarativeOAuth specific blob. Pertains to the fields defined by the connector relating to the OAuth flow.
Interpolation capabilities:
- The variable placeholders are declared as {{my_var}}.
- Nested resolution variables like {{ {{my_nested_var}} }} are allowed as well.
- The allowed interpolation context is:
  - base64Encoder - encode to base64, {{ {{my_var_a}}:{{my_var_b}} | base64Encoder }}
  - base64Decoder - decode from a base64-encoded string, {{ {{my_string_variable_or_string_value}} | base64Decoder }}
  - urlEncoder - encode the input string to URL-like format, {{ https://test.host.com/endpoint | urlEncoder}}
  - urlDecoder - decode the input URL-encoded string into text format, {{ urlDecoder:https%3A%2F%2Fairbyte.io | urlDecoder}}
  - codeChallengeS256 - get the codeChallenge encoded value to provide additional data-provider specific authorisation values, {{ {{state_value}} | codeChallengeS256 }}
Examples:
- The TikTok Marketing DeclarativeOAuth spec:
{
  "oauth_connector_input_specification": {
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "consent_url": "https://ads.tiktok.com/marketing_api/auth?{{client_id_key}}={{client_id_value}}&{{redirect_uri_key}}={{ {{redirect_uri_value}} | urlEncoder}}&{{state_key}}={{state_value}}",
      "access_token_url": "https://business-api.tiktok.com/open_api/v1.3/oauth2/access_token/",
      "access_token_params": {
        "{{ auth_code_key }}": "{{ auth_code_value }}",
        "{{ client_id_key }}": "{{ client_id_value }}",
        "{{ client_secret_key }}": "{{ client_secret_value }}"
      },
      "access_token_headers": {
        "Content-Type": "application/json",
        "Accept": "application/json"
      },
      "extract_output": ["data.access_token"],
      "client_id_key": "app_id",
      "client_secret_key": "secret",
      "auth_code_key": "auth_code"
    }
  }
}
The DeclarativeOAuth-specific URL string template to initiate the authentication. The placeholders are replaced during processing to provide the necessary values.
The DeclarativeOAuth-specific URL string template to obtain the access_token, refresh_token, etc. The placeholders are replaced during processing to provide the necessary values.
The DeclarativeOAuth-specific string of the scopes to be granted to the authenticated user.
The DeclarativeOAuth Specific optional headers to inject while exchanging the auth_code to access_token during completeOAuthFlow step.
The DeclarativeOAuth Specific optional query parameters to inject while exchanging the auth_code to access_token during completeOAuthFlow step.
When this property is provided, the query params will be encoded as Json and included in the outgoing API request.
The DeclarativeOAuth Specific list of strings to indicate which keys should be extracted and returned back to the input config.
The DeclarativeOAuth Specific object to provide the criteria of how the state query param should be constructed, including length and complexity.
The DeclarativeOAuth Specific optional override to provide the custom client_id key name, if required by data-provider.
The DeclarativeOAuth Specific optional override to provide the custom client_secret key name, if required by data-provider.
The DeclarativeOAuth Specific optional override to provide the custom scope key name, if required by data-provider.
The DeclarativeOAuth Specific optional override to provide the custom state key name, if required by data-provider.
The DeclarativeOAuth Specific optional override to provide the custom code key name to something like auth_code or custom_auth_code, if required by data-provider.
The DeclarativeOAuth Specific optional override to provide the custom redirect_uri key name to something like callback_uri, if required by data-provider.
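This Python sketch renders the TikTok consent_url template from the example above, to illustrate how the key overrides and nested placeholders combine. The render function is hypothetical and supports only the urlEncoder filter. It resolves the innermost {{ ... }} expressions first and repeats until nothing changes. That is one plausible reading of the nested-resolution rule, not the platform's actual algorithm. The client ID value and redirect URI are made-up placeholders, and the state length is arbitrary rather than taken from the state criteria.

```python
import re
import secrets
from typing import Callable, Dict
from urllib.parse import quote

# Only the urlEncoder filter is modelled in this sketch.
FILTERS: Dict[str, Callable[[str], str]] = {
    "urlEncoder": lambda value: quote(value, safe=""),
}

# Matches an innermost {{ ... }} expression with an optional "| filter" suffix.
INNERMOST = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|\s*(\w+)\s*)?\}\}")


def render(template: str, variables: Dict[str, str]) -> str:
    """Hypothetical renderer: resolve innermost placeholders, repeating until stable."""

    def substitute(match: "re.Match[str]") -> str:
        token, filter_name = match.group(1), match.group(2)
        # Known names become their values; anything else is kept as literal text.
        value = variables.get(token, token)
        return FILTERS[filter_name](value) if filter_name else value

    previous = None
    while previous != template:
        previous, template = template, INNERMOST.sub(substitute, template)
    return template


consent_url = (
    "https://ads.tiktok.com/marketing_api/auth?{{client_id_key}}={{client_id_value}}"
    "&{{redirect_uri_key}}={{ {{redirect_uri_value}} | urlEncoder}}&{{state_key}}={{state_value}}"
)
variables = {
    "client_id_key": "app_id",  # override taken from the spec's client_id_key
    "client_id_value": "my-app-id",  # placeholder value
    "redirect_uri_key": "redirect_uri",
    "redirect_uri_value": "https://example.com/callback",  # placeholder value
    "state_key": "state",
    "state_value": secrets.token_urlsafe(16),  # random state; length chosen arbitrarily
}
print(render(consent_url, variables))
```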
OAuth specific blob. This is a JSON Schema used to validate JSON configurations produced by the OAuth flows as they are returned by the remote OAuth APIs.
Must be a valid JSON describing the fields to merge back to ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Examples:
complete_oauth_output_specification={
refresh_token: {
type: string,
path_in_connector_config: ['credentials', 'refresh_token']
}
}
OAuth specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations. Must be a valid non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the server when completing an OAuth flow (typically exchanging an auth code for a refresh token).
Examples:
complete_oauth_server_input_specification={ client_id: { type: string }, client_secret: { type: string } }
OAuth specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations that also need to be merged back into the connector configuration at runtime.
This is a subset of complete_oauth_server_input_specification that retains only the fields necessary for the connector to function with OAuth. (Some fields could be used during OAuth flows but not be needed afterwards; they would therefore be listed in complete_oauth_server_input_specification but not in complete_oauth_server_output_specification.)
Must be a valid non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the connector when using OAuth flow APIs.
These fields are to be merged back to ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Examples:
complete_oauth_server_output_specification={
client_id: {
type: string,
path_in_connector_config: ['credentials', 'client_id']
},
client_secret: {
type: string,
path_in_connector_config: ['credentials', 'client_secret']
}
}
Authenticator for requests authenticated with the Basic HTTP authentication scheme, which encodes a username and an optional password in the Authorization request header.
The username that will be combined with the password, base64 encoded and used to make requests. Fill it in the user inputs.
The password that will be combined with the username, base64 encoded and used to make requests. Fill it in the user inputs.
Authenticator for requests authenticated with a bearer token injected as a request header of the form Authorization: Bearer <token>.
Token to inject as request header for authenticating with the API.
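The header shapes produced by the two authenticators above can be sketched in Python as below. The helper names are hypothetical. The Basic encoding follows RFC 7617, and the Aladdin / open sesame pair is the well-known example from that RFC.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str = "") -> Dict[str, str]:
    """Basic scheme: base64("username:password"); an empty password still keeps the colon."""
    credentials = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def bearer_auth_header(token: str) -> Dict[str, str]:
    """Bearer scheme: the token is sent as-is after the 'Bearer ' prefix."""
    return {"Authorization": f"Bearer {token}"}


print(basic_auth_header("Aladdin", "open sesame"))
# {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
print(bearer_auth_header("my-token"))
# {'Authorization': 'Bearer my-token'}
```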
Authenticator that selects concrete authenticator based on config property.
Path of the field in config with selected authenticator name
Authenticators to select from.
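Selecting an authenticator from a config value reduces to a path lookup followed by a dictionary lookup. In this hypothetical Python sketch, selection_path and authenticators correspond to the two properties described above. Plain strings stand in for authenticator objects.

```python
from typing import Any, List, Mapping


def select_authenticator(config: Mapping[str, Any], selection_path: List[str],
                         authenticators: Mapping[str, Any]) -> Any:
    """Read the authenticator name at selection_path in config and return the matching entry."""
    name: Any = config
    for key in selection_path:
        name = name[key]
    if name not in authenticators:
        raise ValueError(f"unknown authenticator {name!r}; expected one of {sorted(authenticators)}")
    return authenticators[name]


config = {"credentials": {"auth_type": "api_key", "api_key": "secret-value"}}
authenticators = {"api_key": "ApiKeyAuthenticator", "oauth": "OAuthAuthenticator"}
print(select_authenticator(config, ["credentials", "auth_type"], authenticators))  # ApiKeyAuthenticator
```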
Defines the streams to try reading when running a check operation.
Names of the streams to try reading from when running a check operation.
The dynamic stream name.
The number of streams to attempt reading from during a check operation. If stream_count exceeds the total number of available streams, the minimum of the two values will be used.
(This component is experimental. Use at your own risk.) Defines the dynamic streams to try reading when running a check operation.
Number of streams to try reading from when running a check operation.
Enables stream check availability. This field is automatically set by the CDK.
Error handler that sequentially iterates over a list of error handlers.
List of error handlers to iterate on to determine how to handle a failed response.
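Here is a simplified Python model of the sequential iteration. Each handler either returns an action for a status code or None when it does not apply, and the first applicable handler is assumed to decide. The action strings and handler functions are hypothetical. The CDK's handlers return richer error-resolution objects, so treat this only as a sketch of the ordering.

```python
from typing import Callable, List, Optional

# A handler returns an action name for a status code, or None when it does not apply.
Handler = Callable[[int], Optional[str]]


def composite(handlers: List[Handler]) -> Handler:
    """Ask each handler in order; the first one that returns an action decides."""

    def handle(status_code: int) -> Optional[str]:
        for handler in handlers:
            action = handler(status_code)
            if action is not None:
                return action
        return None  # no handler matched

    return handle


ignore_not_found: Handler = lambda code: "IGNORE" if code == 404 else None
retry_rate_limit: Handler = lambda code: "RETRY" if code == 429 else None
fail_server_error: Handler = lambda code: "FAIL" if code >= 500 else None

handler = composite([ignore_not_found, retry_rate_limit, fail_server_error])
print(handler(404), handler(429), handler(503), handler(200))  # IGNORE RETRY FAIL None
```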
Defines the amount of parallelization for the streams that are being synced. The factor of parallelization is how many partitions or streams are synced at the same time. For example, with a concurrency_level of 10, ten streams or partitions of data will be processed at the same time. Note that a value of 1 could create a deadlock if a stream has a very high number of partitions.
The amount of concurrency that will be applied during a sync. This value can be hardcoded or user-defined in the config if different users have varying volume thresholds in the target API.
The maximum level of concurrency that will be used during a sync. This becomes a required field when the default_concurrency derives from the config, because it serves as a safeguard against a user-defined threshold that is too high.
Streams that are only available while performing a connector operation when the condition is met.
Condition that will be evaluated to determine if a set of streams should be available.
Streams that will be used during an operation based on the condition.
Backoff strategy with a constant backoff interval.
Backoff time in seconds.
Pagination strategy that evaluates an interpolated string to define the next page to fetch.
Value of the cursor defining the next page to fetch.
The number of records to include in each page.
Template string that is evaluated to decide when to stop paginating.
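The loop behind cursor-based pagination can be sketched in Python as follows. Callables stand in for the interpolated cursor value and stop condition strings (expressions along the lines of {{ response.next }} and {{ not response.next }}). fetch_page is a hypothetical function that returns a parsed response, and the page size is omitted.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Response = Dict[str, Any]


def paginate(
    fetch_page: Callable[[Optional[str]], Response],
    cursor_value: Callable[[Response], Optional[str]],
    stop_condition: Callable[[Response], bool],
) -> Iterator[Dict[str, Any]]:
    """Fetch pages until stop_condition is true, passing each page's cursor to the next request."""
    cursor: Optional[str] = None  # the first request carries no cursor
    while True:
        response = fetch_page(cursor)
        yield from response["data"]
        if stop_condition(response):
            return
        cursor = cursor_value(response)


# Two fake pages keyed by the cursor that requests them.
pages = {
    None: {"data": [{"id": 1}, {"id": 2}], "next": "page-2"},
    "page-2": {"data": [{"id": 3}], "next": None},
}
records = paginate(pages.__getitem__, lambda r: r["next"], lambda r: not r["next"])
print([record["id"] for record in records])  # [1, 2, 3]
```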
Authenticator component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom authentication strategy. Must be a subclass of DeclarativeAuthenticator. The format is source_<name>.<package>.<class_name>.
Backoff strategy component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom backoff strategy. The format is source_<name>.<package>.<class_name>.
Error handler component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom error handler. The format is source_<name>.<package>.<class_name>.
Incremental component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom incremental sync. The format is source_<name>.<package>.<class_name>.
The location of the value on a record that will be used as a bookmark during sync.
Pagination strategy component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom pagination strategy. The format is source_<name>.<package>.<class_name>.
Record extractor component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom record extraction strategy. The format is source_<name>.<package>.<class_name>.
Record filter component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom record filter strategy. The format is source_<name>.<package>.<class_name>.
Requester component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom requester strategy. The format is source_<name>.<package>.<class_name>.
Retriever component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom retriever strategy. The format is source_<name>.<package>.<class_name>.
Partition router component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom partition router. The format is source_<name>.<package>.<class_name>.
Schema Loader component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom schema loader. The format is source_<name>.<package>.<class_name>.
Schema normalization component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom normalization. The format is source_<name>.<package>.<class_name>.
Apply a custom transformation on the input state.
Fully-qualified name of the class that will be implementing the custom state migration. The format is source_<name>.<package>.<class_name>.
Transformation component whose behavior is derived from a custom code implementation of the connector.
Fully-qualified name of the class that will be implementing the custom transformation. The format is source_<name>.<package>.<class_name>.
Transforms the input state for per-partitioned streams from the legacy format to the low-code format. The cursor field and partition ID fields are automatically extracted from the stream's DatetimeBasedCursor and SubstreamPartitionRouter.
Example input state: { "13506132": { "last_changed": "2022-12-27T08:34:39+00:00" } }
Example output state: { "partition": {"id": "13506132"}, "cursor": {"last_changed": "2022-12-27T08:34:39+00:00"} }
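In the example above, each legacy key becomes a partition entry. This Python sketch of that reshaping is hypothetical. It hardcodes the partition key name id, whereas the real migration derives the partition and cursor fields from the stream's components. It also returns a plain list, without any outer wrapper the CDK may add.

```python
from typing import Any, Dict, List


def migrate_legacy_state(legacy_state: Dict[str, Dict[str, Any]], partition_key: str = "id") -> List[Dict[str, Any]]:
    """Turn {partition_id: cursor} into a list of {"partition": ..., "cursor": ...} entries."""
    return [
        {"partition": {partition_key: partition_id}, "cursor": cursor}
        for partition_id, cursor in legacy_state.items()
    ]


legacy = {"13506132": {"last_changed": "2022-12-27T08:34:39+00:00"}}
print(migrate_legacy_state(legacy))
# [{'partition': {'id': '13506132'}, 'cursor': {'last_changed': '2022-12-27T08:34:39+00:00'}}]
```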
Cursor that allows for incremental sync according to a continuously increasing integer.
The location of the value on a record that will be used as a bookmark during sync. To ensure no data loss, the API must return records in ascending order based on the cursor field. Nested fields are not supported, so the field must be at the top level of the record. You can use a combination of Add Field and Remove Field transformations to move the nested field to the top.
The value that determines the earliest record that should be synced.
Specifies the key field or path and where in the request a component's value should be injected.
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
Cursor to provide incremental capabilities over datetime.
The location of the value on a record that will be used as a bookmark during sync. To ensure no data loss, the API must return records in ascending order based on the cursor field. Nested fields are not supported, so the field must be at the top level of the record. You can use a combination of Add Field and Remove Field transformations to move the nested field to the top.
The datetime that determines the earliest record that should be synced.
The datetime format used to format the datetime values that are sent in outgoing requests to the API. Use placeholders starting with "%" to describe the format the API is using. The following placeholders are available:
- %s: Epoch unix timestamp - 1686218963
- %s_as_float: Epoch unix timestamp in seconds as float with microsecond precision - 1686218963.123456
- %ms: Epoch unix timestamp (milliseconds) - 1686218963123
- %a: Weekday (abbreviated) - Sun
- %A: Weekday (full) - Sunday
- %w: Weekday (decimal) - 0 (Sunday), 6 (Saturday)
- %d: Day of the month (zero-padded) - 01, 02, ..., 31
- %b: Month (abbreviated) - Jan
- %B: Month (full) - January
- %m: Month (zero-padded) - 01, 02, ..., 12
- %y: Year (without century, zero-padded) - 00, 01, ..., 99
- %Y: Year (with century) - 0001, 0002, ..., 9999
- %H: Hour (24-hour, zero-padded) - 00, 01, ..., 23
- %I: Hour (12-hour, zero-padded) - 01, 02, ..., 12
- %p: AM/PM indicator
- %M: Minute (zero-padded) - 00, 01, ..., 59
- %S: Second (zero-padded) - 00, 01, ..., 59
- %f: Microsecond (zero-padded to 6 digits) - 000000
- %_ms: Millisecond (zero-padded to 3 digits) - 000
- %z: UTC offset - (empty), +0000, -04:00
- %Z: Time zone name - (empty), UTC, GMT
- %j: Day of the year (zero-padded) - 001, 002, ..., 366
- %U: Week number of the year (starting Sunday) - 00, ..., 53
- %W: Week number of the year (starting Monday) - 00, ..., 53
- %c: Date and time - Tue Aug 16 21:30:00 1988
- %x: Date standard format - 08/16/1988
- %X: Time standard format - 21:30:00
- %%: Literal '%' character
Some placeholders depend on the locale of the underlying system - in most cases this locale is configured as en/US. For more information see the Python documentation.
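Most of these placeholders are standard Python strftime directives, but %s, %s_as_float, %ms and %_ms are extensions. The hypothetical format_datetime helper below sketches one way to support them. It assumes a timezone-aware datetime and treats the epoch formats as whole-string formats. It is not the CDK's own formatter.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_datetime(dt: datetime, fmt: str) -> str:
    """Format a timezone-aware datetime, handling the non-strftime placeholders first."""
    if fmt == "%s":
        return str((dt - EPOCH) // timedelta(seconds=1))
    if fmt == "%ms":
        return str((dt - EPOCH) // timedelta(milliseconds=1))
    if fmt == "%s_as_float":
        return f"{(dt - EPOCH) / timedelta(seconds=1):.6f}"
    # %_ms is not a strftime directive, so substitute the 3-digit millisecond value first.
    return dt.strftime(fmt.replace("%_ms", f"{dt.microsecond // 1000:03d}"))


moment = datetime(2023, 6, 8, 10, 9, 23, 123456, tzinfo=timezone.utc)
print(format_datetime(moment, "%s"))                       # 1686218963
print(format_datetime(moment, "%ms"))                      # 1686218963123
print(format_datetime(moment, "%Y-%m-%dT%H:%M:%S.%_msZ"))  # 2023-06-08T10:09:23.123Z
```

The epoch values use integer timedelta division, so no precision is lost to floating point.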
This option is used to adjust the upper and lower boundaries of each datetime window to the beginning and end of the provided target period (day, week, month).
The period of time that datetime windows will be clamped by
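Clamping snaps a window boundary to the start or end of its day, week or month. The Python sketch below is hypothetical. The target names DAY, WEEK and MONTH, and weeks starting on Monday, are assumptions for illustration.

```python
from datetime import datetime, timedelta


def clamp_start(dt: datetime, target: str) -> datetime:
    """Snap dt back to the first instant of its DAY, WEEK or MONTH."""
    day_start = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if target == "DAY":
        return day_start
    if target == "WEEK":
        # Assumption: weeks start on Monday (datetime.weekday() == 0).
        return day_start - timedelta(days=day_start.weekday())
    if target == "MONTH":
        return day_start.replace(day=1)
    raise ValueError(f"unsupported clamping target: {target}")


def clamp_end(dt: datetime, target: str) -> datetime:
    """Snap dt forward to the last microsecond of its DAY, WEEK or MONTH."""
    start = clamp_start(dt, target)
    if target == "DAY":
        next_start = start + timedelta(days=1)
    elif target == "WEEK":
        next_start = start + timedelta(weeks=1)
    else:
        # Day 28 plus 4 days always lands in the following month.
        next_start = (start.replace(day=28) + timedelta(days=4)).replace(day=1)
    return next_start - timedelta(microseconds=1)


print(clamp_start(datetime(2024, 5, 15, 13, 45), "WEEK"))  # 2024-05-13 00:00:00
print(clamp_end(datetime(2024, 2, 10), "MONTH"))  # 2024-02-29 23:59:59.999999
```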
The possible formats for the cursor field, in order of preference. The first format that matches the cursor field value will be used to parse it. If not provided, the Outgoing Datetime Format will be used. Use placeholders starting with "%" to describe the format the API is using. The following placeholders are available:
- %s: Epoch unix timestamp - 1686218963
- %s_as_float: Epoch unix timestamp in seconds as float with microsecond precision - 1686218963.123456
- %ms: Epoch unix timestamp (milliseconds) - 1686218963123
- %a: Weekday (abbreviated) - Sun
- %A: Weekday (full) - Sunday
- %w: Weekday (decimal) - 0 (Sunday), 6 (Saturday)
- %d: Day of the month (zero-padded) - 01, 02, ..., 31
- %b: Month (abbreviated) - Jan
- %B: Month (full) - January
- %m: Month (zero-padded) - 01, 02, ..., 12
- %y: Year (without century, zero-padded) - 00, 01, ..., 99
- %Y: Year (with century) - 0001, 0002, ..., 9999
- %H: Hour (24-hour, zero-padded) - 00, 01, ..., 23
- %I: Hour (12-hour, zero-padded) - 01, 02, ..., 12
- %p: AM/PM indicator
- %M: Minute (zero-padded) - 00, 01, ..., 59
- %S: Second (zero-padded) - 00, 01, ..., 59
- %f: Microsecond (zero-padded to 6 digits) - 000000, 000001, ..., 999999
- %_ms: Millisecond (zero-padded to 3 digits) - 000, 001, ..., 999
- %z: UTC offset - (empty), +0000, -04:00
- %Z: Time zone name - (empty), UTC, GMT
- %j: Day of the year (zero-padded) - 001, 002, ..., 366
- %U: Week number of the year (Sunday as first day) - 00, 01, ..., 53
- %W: Week number of the year (Monday as first day) - 00, 01, ..., 53
- %c: Date and time representation - Tue Aug 16 21:30:00 1988
- %x: Date representation - 08/16/1988
- %X: Time representation - 21:30:00
- %%: Literal '%' character
Some placeholders depend on the locale of the underlying system - in most cases this locale is configured as en/US. For more information see the Python documentation.
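Trying each format in order until one parses can be sketched in Python as below. parse_cursor_value is hypothetical. It handles %s and %ms as whole-string epoch formats and defers everything else to datetime.strptime, which does not know the CDK-specific placeholders. The fallback to the outgoing datetime format is not modelled.

```python
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_cursor_value(value: str, formats: List[str]) -> datetime:
    """Return the datetime produced by the first format that parses value."""
    for fmt in formats:
        try:
            if fmt == "%s":
                return EPOCH + timedelta(seconds=int(value))
            if fmt == "%ms":
                return EPOCH + timedelta(milliseconds=int(value))
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not match; try the next one in order of preference
    raise ValueError(f"{value!r} does not match any of {formats}")


formats = ["%Y-%m-%d", "%Y-%m-%dT%H:%M:%S%z", "%s"]
print(parse_cursor_value("2022-12-27T08:34:39+00:00", formats))  # 2022-12-27 08:34:39+00:00
print(parse_cursor_value("1686218963", formats))  # 2023-06-08 10:09:23+00:00
```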
Specifies the key field or path and where in the request a component's value should be injected.
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
The datetime that determines the last record that should be synced. If not provided, {{ now_utc() }} will be used.
Specifies the key field or path and where in the request a component's value should be injected.
4 nested properties
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
Smallest increment the datetime_format has (ISO 8601 duration) that is used to ensure the start of a slice does not overlap with the end of the previous one, e.g. for %Y-%m-%d the granularity should be P1D, for %Y-%m-%dT%H:%M:%SZ the granularity should be PT1S. If this field is provided, step must be provided as well.
- PT0.000001S: 1 microsecond
- PT0.001S: 1 millisecond
- PT1S: 1 second
- PT1M: 1 minute
- PT1H: 1 hour
- P1D: 1 day
A data feed API is an API that does not allow filtering and paginates the content from the most recent to the least recent. Given this, the CDK needs to know when to stop paginating and this field will generate a stop condition for pagination.
Set to True if the target API endpoint does not take cursor values to filter records and returns all records anyway. This will cause the connector to filter out records locally, and only emit new records from the last sync, hence incremental. This means that all records would be read from the API, but only new records will be emitted to the destination.
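A minimal sketch of that local filtering, assuming all cursor values share one strptime format and that a record counts as new only when its cursor is strictly later than the stored state; `filter_new_records` is a hypothetical name, not a CDK function:

```python
from datetime import datetime
from typing import Iterable, Iterator


def filter_new_records(
    records: Iterable[dict], cursor_field: str, state_value: str, fmt: str
) -> Iterator[dict]:
    """Yield only records newer than the saved state (hypothetical helper)."""
    threshold = datetime.strptime(state_value, fmt)
    for record in records:
        # Every record is still read from the API; older ones are simply not emitted.
        if datetime.strptime(record[cursor_field], fmt) > threshold:
            yield record
```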
Set to True if the target API does not accept queries where the start time equals the end time. This will cause those requests to be skipped.
Setting to True causes the connector to store the cursor as one value, instead of per-partition. This setting optimizes performance when the parent stream has thousands of partitions. Notably, the substream state is updated only at the end of the sync, which helps prevent data loss in case of a sync failure. See more info in the docs.
Time interval (ISO8601 duration) before the start_datetime to read data for, e.g. P1M for looking back one month.
- PT1H: 1 hour
- P1D: 1 day
- P1W: 1 week
- P1M: 1 month
- P1Y: 1 year
Name of the partition start time field.
Name of the partition end time field.
The size of the time window (ISO8601 duration). If this field is provided, cursor_granularity must be provided as well.
- PT1H: 1 hour
- P1D: 1 day
- P1W: 1 week
- P1M: 1 month
- P1Y: 1 year
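The interaction between step, cursor_granularity and the lookback window can be sketched as below. This is an illustration under simplifying assumptions: durations are Python `timedelta` objects (so calendar units such as P1M are not modelled), and `partition_daterange` is a hypothetical helper rather than the CDK's cursor implementation. Each slice ends one granularity unit before the next one starts, which keeps consecutive slices from overlapping.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def partition_daterange(
    start: datetime,
    end: datetime,
    step: timedelta,
    granularity: timedelta,
    lookback: timedelta = timedelta(0),
) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: split [start - lookback, end] into non-overlapping slices."""
    slices: List[Tuple[datetime, datetime]] = []
    current = start - lookback
    while current <= end:
        next_start = current + step
        # End one granularity unit before the next slice begins, but never past `end`.
        slices.append((current, min(next_start - granularity, end)))
        current = next_start
    return slices
```

With step P1D and cursor_granularity PT1S, a range from 2024-01-01 to 2024-01-03T23:59:59 yields three daily slices, each ending at 23:59:59.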
Authenticator for requests using JWT authentication flow.
Secret used to sign the JSON web token.
Algorithm used to sign the JSON web token.
When set to true, the secret key will be base64 encoded prior to being encoded as part of the JWT. Only set to "true" when required by the API.
The amount of time in seconds a JWT token can be valid after being issued.
The prefix to be used within the Authentication header.
JWT headers used when signing JSON web token.
3 nested properties
Private key ID for user account.
The media type of the complete JWT.
Content type of JWT header.
Additional headers to be included with the JWT headers object.
JWT Payload used when signing JSON web token.
3 nested properties
The user/principal that issued the JWT. Commonly a value unique to the user.
The subject of the JWT. Commonly defined by the API.
The recipient that the JWT is intended for. Commonly defined by the API.
Additional properties to be added to the JWT payload.
Authenticator for requests using OAuth 2.0 authorization flow.
The name of the property to use to refresh the access_token.
The OAuth client ID. Fill it in the user inputs.
The name of the property to use to refresh the access_token.
The OAuth client secret. Fill it in the user inputs.
The name of the property to use to refresh the access_token.
Credential artifact used to get a new access token.
The full URL to call to obtain a new access token.
The name of the property which contains the access token in the response from the token refresh endpoint.
The value of the access_token to bypass the token refreshing using refresh_token.
The name of the property which contains the expiry date in the response from the token refresh endpoint.
The name of the property to use to refresh the access_token.
Specifies the OAuth2 grant type. If set to refresh_token, the refresh_token needs to be provided as well. For client_credentials, only client id and secret are required. Other grant types are not officially supported.
Body of the request sent to get a new access token.
Headers of the request sent to get a new access token.
List of scopes that should be granted to the access token.
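The grant_type rules above translate into a small amount of request-building logic. This sketch assumes a form-encoded body with the standard RFC 6749 parameter names (grant_type, client_id, client_secret, refresh_token, scope) and ignores the configurable property names and refresh_request_body described above; `build_refresh_body` is hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_refresh_body(
    grant_type: str,
    client_id: str,
    client_secret: str,
    refresh_token: Optional[str] = None,
    scopes: Optional[List[str]] = None,
) -> str:
    """Hypothetical helper: form-encoded body for a token refresh request."""
    body = {"grant_type": grant_type, "client_id": client_id, "client_secret": client_secret}
    if grant_type == "refresh_token":
        if not refresh_token:
            raise ValueError("refresh_token is required for the refresh_token grant type")
        body["refresh_token"] = refresh_token
    elif grant_type != "client_credentials":
        raise ValueError(f"unsupported grant type: {grant_type}")
    if scopes:
        # RFC 6749 represents scopes as a single space-delimited string.
        body["scope"] = " ".join(scopes)
    return urlencode(body)
```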
The access token expiry date.
The format of the time to expiration datetime. Provide it if the time is returned as a date-time string instead of seconds.
When the refresh token updater is defined, new refresh tokens, access tokens and the access token expiry date are written back from the authentication response to the config object. This is important if the refresh token can only be used once.
7 nested properties
The name of the property which contains the updated refresh token in the response from the token refresh endpoint.
Config path to the access token. Make sure the field actually exists in the config.
[
"credentials",
"access_token"
]
Config path to the refresh token. Make sure the field actually exists in the config.
[
"credentials",
"refresh_token"
]
Config path to the expiry date. Make sure the field actually exists in the config.
[
"credentials",
"token_expiry_date"
]
Status codes used to identify a refresh token error in the response (Refresh Token Error Key and Refresh Token Error Values should also be specified). Responses with one of these status codes and containing one of the error values will be flagged as a config error.
[]
Key used to identify a refresh token error in the response (Refresh Token Error Status Codes and Refresh Token Error Values should also be specified).
List of values to check for during the token refresh process. Used to check whether the value found in the response under the Refresh Token Error Key matches one of these values (e.g. response={"error": "invalid_grant"}). Only responses with one of the error status codes and containing one of these error values will be flagged as a config error.
[]
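The three refresh-token error settings combine as a conjunction: both the status code and the error value must match. A minimal sketch, assuming the decoded error body is a flat mapping; `is_refresh_token_config_error` is a hypothetical name:

```python
from typing import Any, Mapping, Sequence


def is_refresh_token_config_error(
    status_code: int,
    body: Mapping[str, Any],
    error_status_codes: Sequence[int],
    error_key: str,
    error_values: Sequence[str],
) -> bool:
    """Hypothetical helper: flag a refresh failure as a config error."""
    # Both the status code and the value under error_key must match.
    return status_code in error_status_codes and body.get(error_key) in error_values
```

With the defaults (empty lists), nothing is ever flagged.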
Authenticator for requests using JWT authentication flow.
11 nested properties
Secret used to sign the JSON web token.
Algorithm used to sign the JSON web token.
When set to true, the secret key will be base64 encoded prior to being encoded as part of the JWT. Only set to "true" when required by the API.
The amount of time in seconds a JWT token can be valid after being issued.
The prefix to be used within the Authentication header.
JWT headers used when signing JSON web token.
3 nested properties
Private key ID for user account.
The media type of the complete JWT.
Content type of JWT header.
Additional headers to be included with the JWT headers object.
JWT Payload used when signing JSON web token.
3 nested properties
The user/principal that issued the JWT. Commonly a value unique to the user.
The subject of the JWT. Commonly defined by the API.
The recipient that the JWT is intended for. Commonly defined by the API.
Additional properties to be added to the JWT payload.
Enable using profile assertion as a flow for OAuth authorization.
A stream whose behavior is described by a set of declarative low code components.
Component used to coordinate how records are extracted across stream slices and request pages.
The stream name.
Component used to fetch data incrementally based on a time field in the data.
The stream field to be used to distinguish unique records. Can either be a single field, an array of fields representing a composite key, or an array of arrays representing a composite key where the fields are nested fields.
One or many schema loaders can be used to retrieve the schema for the current stream. When multiple schema loaders are defined, schema properties will be merged together. Schema loaders defined first take precedence in the event of a conflict.
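The precedence rule can be illustrated with a first-writer-wins merge. This sketch assumes each loader returns an object schema with a top-level properties map; nested merging and other keywords (required, additionalProperties) are not handled, and `merge_schemas` is hypothetical.

```python
from typing import Any, Dict, List


def merge_schemas(schemas: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: merge schemas; the first loader to define a property wins."""
    properties: Dict[str, Any] = {}
    for schema in schemas:
        for name, definition in schema.get("properties", {}).items():
            # setdefault keeps the earlier definition when names conflict.
            properties.setdefault(name, definition)
    return {"type": "object", "properties": properties}
```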
A list of transformations to be applied to each output record.
Array of state migrations to be applied on the input state
[]
(experimental) Describes how to fetch a file
6 nested properties
Requester component that describes how to prepare HTTP requests to send to the source API.
Responsible for fetching the URL where the file is located. This is applied to each record and not to the HTTP response.
Responsible for fetching the content of the file. If not defined, the whole response body is assumed to be the file content.
Defines the name under which the file is stored. The stream name is automatically added to the file path. A unique file ID can be used to avoid overwriting files. A random UUID will be used if the extractor is not provided.
Defines how many requests can be made to the API in a given time frame. HTTPAPIBudget extracts the remaining call count and the reset time from HTTP response headers using the header names provided by ratelimit_remaining_header and ratelimit_reset_header. Only requests using HttpRequester are rate-limited; custom components that bypass HttpRequester are not covered by this budget.
List of call rate policies that define how many calls are allowed.
The HTTP response header name that indicates when the rate limit resets.
The HTTP response header name that indicates the number of remaining allowed calls.
List of HTTP status codes that indicate a rate limit has been hit.
[
429
]
A policy that allows a fixed number of calls within a specific time window.
The time interval for the rate limit window.
The maximum number of calls allowed within the period.
List of matchers that define which requests this policy applies to.
A policy that allows a fixed number of calls within a moving time window.
List of rates that define the call limits for different time intervals.
List of matchers that define which requests this policy applies to.
A policy that allows unlimited calls for specific requests.
List of matchers that define which requests this policy applies to.
Defines a rate limit with a specific number of calls allowed within a time interval.
The maximum number of calls allowed within the interval.
The time interval for the rate limit.
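A moving-window policy built from several rates can be approximated as follows. This in-memory sketch takes timestamps in seconds as an explicit argument for testability; the class and method names are hypothetical, and the real budget also reads the rate-limit response headers, which is not modelled here.

```python
from collections import deque
from datetime import timedelta
from typing import Deque, List, Tuple


class MovingWindowLimiter:
    """Hypothetical helper: at most `limit` calls in any trailing `interval`, for every rate."""

    def __init__(self, rates: List[Tuple[int, timedelta]]) -> None:
        if not rates:
            raise ValueError("at least one rate is required")
        self._rates = [(limit, interval.total_seconds()) for limit, interval in rates]
        self._longest = max(seconds for _, seconds in self._rates)
        self._calls: Deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Record a call at time `now` (in seconds) if every rate still allows it."""
        # Forget calls that fall outside even the longest window.
        while self._calls and now - self._calls[0] >= self._longest:
            self._calls.popleft()
        for limit, seconds in self._rates:
            in_window = sum(1 for t in self._calls if now - t < seconds)
            if in_window >= limit:
                return False
        self._calls.append(now)
        return True
```

Rejected calls are not recorded, so a caller would typically wait and retry.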
Matches HTTP requests based on method, base URL, URL path pattern, query parameters, and headers. Use url_base to specify the scheme and host (without trailing slash) and url_path_pattern to apply a regex to the request path.
The HTTP method to match (e.g., GET, POST).
The base URL (scheme and host, e.g. "https://api.example.com") to match.
A regular expression pattern to match the URL path.
The query parameters to match.
The headers to match.
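A rough Python equivalent of that matching logic is sketched below. It assumes `re.search` semantics for url_path_pattern, exact comparison of query parameter values, and case-insensitive header names; the CDK's exact semantics may differ, and `request_matches` and its keyword names are illustrative only.

```python
import re
from typing import Mapping, Optional
from urllib.parse import parse_qsl, urlsplit


def request_matches(
    method: str,
    url: str,
    headers: Mapping[str, str],
    match_method: Optional[str] = None,
    url_base: Optional[str] = None,
    url_path_pattern: Optional[str] = None,
    params: Optional[Mapping[str, str]] = None,
    match_headers: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: True if the request satisfies every configured criterion."""
    parts = urlsplit(url)
    if match_method and method.upper() != match_method.upper():
        return False
    # url_base is scheme and host only, compared without a trailing slash.
    if url_base and f"{parts.scheme}://{parts.netloc}" != url_base.rstrip("/"):
        return False
    if url_path_pattern and not re.search(url_path_pattern, parts.path):
        return False
    query = dict(parse_qsl(parts.query))
    if params and any(query.get(k) != v for k, v in params.items()):
        return False
    # Header names are compared case-insensitively, values exactly.
    lowered = {k.lower(): v for k, v in headers.items()}
    if match_headers and any(lowered.get(k.lower()) != v for k, v in match_headers.items()):
        return False
    return True
```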
Component defining how to handle errors. Default behavior includes only retrying server errors (HTTP 5XX) and too many requests (HTTP 429) with an exponential backoff.
List of backoff strategies to use to determine how long to wait before retrying a retryable request.
The maximum number of times to retry a retryable request before giving up and failing.
List of response filters to iterate on when deciding how to handle an error. When using an array of multiple filters, the filters will be applied sequentially and the response will be selected if it matches any of the filters' predicates.
Default pagination implementation to request pages of results with a fixed size until the pagination strategy no longer returns a next_page_token.
Strategy defining how records are paginated.
Specifies the key field or path and where in the request a component's value should be injected.
4 nested properties
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
Record extractor that searches a decoded response over a path defined as an array of fields.
List of potentially nested fields describing the full path of the field to extract. Use "*" to extract all values from an array. See more info in the docs.
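Path-based extraction with a "*" wildcard can be sketched in plain Python as below. The behaviour (unpack a list at the end of a plain path; treat each wildcard match as one record) is an approximation of the extractor described above, and `extract_records` is hypothetical.

```python
from typing import Any, List


def extract_records(body: Any, field_path: List[str]) -> List[Any]:
    """Hypothetical helper: follow `field_path` through a decoded response."""
    nodes = [body]
    for key in field_path:
        next_nodes: List[Any] = []
        for node in nodes:
            if key == "*":
                # Wildcard: fan out over every list element (or every dict value).
                if isinstance(node, list):
                    next_nodes.extend(node)
                elif isinstance(node, dict):
                    next_nodes.extend(node.values())
            elif isinstance(node, dict) and key in node:
                next_nodes.append(node[key])
        nodes = next_nodes
    if "*" in field_path:
        # With a wildcard, every matched value is treated as one record.
        return nodes
    if not nodes:
        return []
    # Without a wildcard, a list is unpacked into records; anything else is one record.
    value = nodes[0]
    return value if isinstance(value, list) else [value]
```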
A record extractor designed for handling large responses that may exceed memory limits (to prevent OOM issues). It downloads a CSV file to disk, reads the data from disk, and deletes the file once it has been fully processed.
Backoff strategy with an exponential backoff interval. The interval is defined as factor * 2^attempt_count.
Multiplicative constant applied on each retry.
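The interval factor * 2^attempt_count, combined with a retry cap like the maximum number of retries described earlier, can be sketched as follows. Whether the first retry uses attempt_count 1 (as here) or 0 is an assumption, and both function names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def exponential_backoff(attempt_count: int, factor: float) -> float:
    """Seconds to wait before a retry: factor * 2^attempt_count."""
    return factor * 2 ** attempt_count


def call_with_retries(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int,
    factor: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry `fn` with exponential backoff, up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            # Give up on non-retryable errors or once the retry budget is spent.
            if attempt >= max_retries or not is_retryable(exc):
                raise
            attempt += 1
            sleep(exponential_backoff(attempt, factor))
```

With factor 5, the first two retries wait 10 and 20 seconds.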
Record merge strategy that combines records according to fields on the record.
The name of the field on the record whose value will be used to group properties that were retrieved through multiple API requests.
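Grouping by key amounts to combining every partial record that shares the same key value. A minimal sketch, assuming later fields overwrite earlier ones on conflict; `merge_by_key` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def merge_by_key(records: Iterable[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: combine partial records that share the same `key` value."""
    merged: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        # Later fields overwrite earlier ones; output order follows first appearance.
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())
```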
Authenticator for requests using the session token as an API key that's injected into the request.
Requester submitting HTTP requests and extracting records from the response.
16 nested properties
Deprecated, use the url instead. Base URL of the API source. Do not put sensitive information (e.g. API tokens) into this field - Use the Authenticator component for this.
The URL of the source API endpoint. Do not put sensitive information (e.g. API tokens) into this field - Use the Authenticator component for this.
Deprecated, use the url instead. Path to the specific API endpoint that this stream represents. Do not put sensitive information (e.g. API tokens) into this field - Use the Authenticator component for this.
The HTTP method used to fetch data from the source (can be GET or POST).
Authentication method to use for requests sent to the API.
Defines the behavior for fetching the list of properties from an API that will be loaded into the requests to extract records.
4 nested properties
Describes the path to the field that should be extracted
Requester component that describes how to fetch the properties to query from a remote API endpoint.
For APIs that require explicit specification of the properties to query for, this component specifies which property fields and how they are supplied to outbound requests.
5 nested properties
The set of properties that will be queried for in the outbound request. This can either be statically defined or dynamic based on an API endpoint
The list of properties that should be included in every set of properties when multiple chunks of properties are being requested.
For APIs with restrictions on the number of properties that can be requested in a single request, property chunking can be applied to make multiple requests with a subset of the properties.
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
Specifies how to populate the body of the request with a non-JSON payload. Plain text will be sent as is, whereas objects will be converted to a urlencoded form.
Specifies how to populate the body of the request with a JSON payload. Can contain nested objects.
Specifies how to populate the body of the request with a payload. Can contain nested objects.
Error handler component that defines how to handle errors.
Enables stream requests caching. This field is automatically set by the CDK.
The path in the response body returned from the login requester to the session token.
Authentication method to use for requests sent to the API, specifying how to inject the session token.
The duration in ISO 8601 duration notation after which the session token expires, starting from the time it was obtained. Omitting it will result in the session token being refreshed for every request.
- PT1H: 1 hour
- P1D: 1 day
- P1W: 1 week
- P1M: 1 month
- P1Y: 1 year
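The expiration behaviour can be sketched as a small cache around a login call. The clock is injectable for testing; `SessionTokenCache` and its parameters are hypothetical and not the CDK's authenticator.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class SessionTokenCache:
    """Hypothetical helper: reuse a session token until its expiration duration has elapsed."""

    def __init__(
        self,
        login: Callable[[], str],
        expiration: Optional[timedelta],
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self._login = login
        self._expiration = expiration
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at: Optional[datetime] = None

    def get_token(self) -> str:
        now = self._clock()
        # With no expiration configured, log in again for every request.
        needs_login = (
            self._expiration is None
            or self._token is None
            or self._expires_at is None
            or now >= self._expires_at
        )
        if needs_login:
            self._token = self._login()
            self._expires_at = now + self._expiration if self._expiration else None
        return self._token
```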
Component used to decode the response.
Authenticator for requests using the session token as an API key that's injected into the request.
Specifies the key field or path and where in the request a component's value should be injected.
4 nested properties
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
Authenticator for requests using the session token as a standard bearer token.
Requester submitting HTTP requests and extracting records from the response.
Deprecated, use the url instead. Base URL of the API source. Do not put sensitive information (e.g. API tokens) into this field - Use the Authenticator component for this.
The URL of the source API endpoint. Do not put sensitive information (e.g. API tokens) into this field - Use the Authenticator component for this.
Deprecated, use the url instead. Path to the specific API endpoint that this stream represents. Do not put sensitive information (e.g. API tokens) into this field - Use the Authenticator component for this.
The HTTP method used to fetch data from the source (can be GET or POST).
Authentication method to use for requests sent to the API.
Defines the behavior for fetching the list of properties from an API that will be loaded into the requests to extract records.
4 nested properties
Describes the path to the field that should be extracted
Requester component that describes how to fetch the properties to query from a remote API endpoint.
For APIs that require explicit specification of the properties to query for, this component specifies which property fields and how they are supplied to outbound requests.
5 nested properties
The set of properties that will be queried for in the outbound request. This can either be statically defined or dynamic based on an API endpoint
The list of properties that should be included in every set of properties when multiple chunks of properties are being requested.
For APIs with restrictions on the number of properties that can be requested in a single request, property chunking can be applied to make multiple requests with a subset of the properties.
5 nested properties
The type used to determine the maximum number of properties per chunk
The maximum number of properties that can be retrieved per request according to the limit type.
Record merge strategy that combines records according to fields on the record.
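Property chunking can be sketched as splitting the property list into request-sized groups. This assumes a property-count limit and that always-included properties count toward the per-chunk limit; both are assumptions, and `chunk_properties` is hypothetical. The records returned for each chunk would then be recombined with a record merge strategy.

```python
from typing import List, Optional


def chunk_properties(
    properties: List[str], max_per_chunk: int, always_include: Optional[List[str]] = None
) -> List[List[str]]:
    """Hypothetical helper: split properties into request-sized chunks."""
    always = list(always_include or [])
    room = max_per_chunk - len(always)
    if room <= 0:
        raise ValueError("max_per_chunk must exceed the number of always-included properties")
    # Always-included properties are added to every chunk, not chunked themselves.
    remaining = [p for p in properties if p not in always]
    return [always + remaining[i : i + room] for i in range(0, len(remaining), room)]
```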
Specifies the query parameters that should be set on an outgoing HTTP request given the inputs.
Return any non-auth headers. Authentication headers will overwrite any overlapping headers returned from this method.
Specifies how to populate the body of the request with a non-JSON payload. Plain text will be sent as is, whereas objects will be converted to a urlencoded form.
Specifies how to populate the body of the request with a JSON payload. Can contain nested objects.
Specifies how to populate the body of the request with a payload. Can contain nested objects.
Error handler component that defines how to handle errors.
Enables stream requests caching. This field is automatically set by the CDK.
A filter that is used to select on properties of the HTTP response received. When used with additional filters, a response will be selected if it matches any of the filter's criteria.
Action to execute if a response matches the filter.
Failure type of traced exception if a response matches the filter.
Error Message to display if the response matches the filter.
Match the response if its error message contains the substring.
Match the response if its HTTP code is included in this list.
Match the response if the predicate evaluates to true.
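Because a response is selected when any criterion matches, the filter reduces to an OR over its configured checks. In this sketch the error message is assumed to live under one top-level field (a hypothetical `message_field`), and `response_matches` is illustrative only.

```python
from typing import Any, Callable, Mapping, Optional, Sequence


def response_matches(
    status_code: int,
    body: Mapping[str, Any],
    http_codes: Sequence[int] = (),
    error_message_contains: Optional[str] = None,
    predicate: Optional[Callable[[Mapping[str, Any]], bool]] = None,
    message_field: str = "message",
) -> bool:
    """Hypothetical helper: True if any configured criterion matches the response."""
    if status_code in http_codes:
        return True
    # Assumes the error message lives under a single top-level field.
    if error_message_contains and error_message_contains in str(body.get(message_field, "")):
        return True
    return predicate is not None and predicate(body)
```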
(This component is experimental. Use at your own risk.) Represents a complex field type.
(This component is experimental. Use at your own risk.) Represents a mapping between a current type and its corresponding target type.
(This component is experimental. Use at your own risk.) Identifies schema details for dynamic schema extraction and processing.
List of potentially nested fields describing the full path of the field key to extract.
List of nested fields defining the schema field path to extract. Defaults to [].
[]
List of potentially nested fields describing the full path of the field type to extract.
(This component is experimental. Use at your own risk.) Loads a schema by extracting data from retrieved records.
Component used to coordinate how records are extracted across stream slices and request pages.
(This component is experimental. Use at your own risk.) Identifies schema details for dynamic schema extraction and processing.
6 nested properties
List of potentially nested fields describing the full path of the field key to extract.
List of nested fields defining the schema field path to extract. Defaults to [].
[]
List of potentially nested fields describing the full path of the field type to extract.
Responsible for filtering the fields to be added to the JSON schema.
A list of transformations to be applied to the schema.
Loads a schema that is defined directly in the manifest file.
Describes a stream's schema. Refer to the Data Types documentation for more details on which types are valid.
Loads the schema from a json file.
Path to the JSON file defining the schema. The path is relative to the connector module's root.
Select 'JSON' if the response is formatted as a JSON object.
Select 'JSON Lines' if the response consists of JSON objects separated by new lines ('\n') in JSONL format.
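The difference between the two decoders is that JSON Lines parses each line as its own JSON object. A minimal sketch, assuming blank lines should be skipped; `decode_jsonl` is hypothetical.

```python
import json
from typing import Any, Dict, Iterator


def decode_jsonl(payload: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: decode a JSON Lines body into one object per line."""
    for line in payload.splitlines():
        # Skip blank lines, e.g. a trailing newline at the end of the body.
        if line.strip():
            yield json.loads(line)
```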
A transformation that renames all keys to lower case.
A transformation that renames all keys to snake case.
A transformation that flattens a record to a single-level format.
Whether to flatten lists or leave them as is. Default is True.
Prefix to add for object keys. If not provided original keys remain unchanged.
Suffix to add for object keys. If not provided original keys remain unchanged.
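Flattening a record to a single level can be sketched recursively. The "." separator and the use of list indices as key segments are assumptions about the output naming, the key prefix/suffix options are not modelled, and `flatten` is hypothetical.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], flatten_lists: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: flatten nested objects (and optionally lists) into dotted keys."""
    flat: Dict[str, Any] = {}

    def walk(value: Any, prefix: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{prefix}.{key}" if prefix else key)
        elif flatten_lists and isinstance(value, list):
            # List elements are keyed by their index.
            for index, child in enumerate(value):
                walk(child, f"{prefix}.{index}")
        else:
            flat[prefix] = value

    walk(record, "")
    return flat
```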
A transformation that flattens field values to the top level of the record.
A path to field that needs to be flattened.
Whether to delete the origin value or keep it. Default is False.
Whether to replace the origin record or not. Default is False.
3 nested properties
Prefix to add for object keys. If not provided original keys remain unchanged.
Suffix to add for object keys. If not provided original keys remain unchanged.
A transformation that replaces symbols in keys.
Old value to replace.
New value to set.
Select 'Iterable' if the response consists of strings separated by new lines (\n). The string will then be wrapped into a JSON object with the record key.
Select 'XML' if the response consists of XML-formatted data.
Use this to implement custom decoder logic.
Fully-qualified name of the class that will be implementing the custom decoding. Must be a subclass of Decoder. The format is source_<name>.<package>.<class_name>.
Select 'ZIP file' for response data that is returned as a zipfile. Requires specifying an inner data type/decoder to parse the unzipped data.
Parser to parse the decompressed data from the zipfile(s).
A Partition router that specifies a list of attributes where each attribute describes a portion of the complete data set for a stream. During a sync, each value is iterated over and can be used as input to outbound API requests.
While iterating over list values, the name of the field used to reference a list value. The partition value can be accessed with string interpolation, e.g. "{{ stream_partition['my_key'] }}" where "my_key" is the value of the cursor_field.
The list of attributes being iterated over and used as input for the requests made to the source API.
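Each list value becomes one partition keyed by cursor_field, which a request option can then inject into outbound requests. A minimal sketch; both function names are hypothetical, and `field_name` stands in for the request option's key.

```python
from typing import Dict, List, Optional


def list_partitions(values: List[str], cursor_field: str) -> List[Dict[str, str]]:
    """Hypothetical helper: one partition per list value, keyed by cursor_field."""
    return [{cursor_field: value} for value in values]


def request_params_for(
    partition: Dict[str, str], cursor_field: str, field_name: Optional[str]
) -> Dict[str, str]:
    # Inject the partition value as a query parameter only when a request option is configured.
    return {field_name: partition[cursor_field]} if field_name else {}
```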
Specifies the key field or path and where in the request a component's value should be injected.
4 nested properties
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g. GraphQL queries)
Compares the provided date against optional minimum or maximum times. The max_datetime serves as the ceiling and will be returned when datetime exceeds it. The min_datetime serves as the floor.
Datetime value.
Format of the datetime value. Defaults to "%Y-%m-%dT%H:%M:%S.%f%z" if left empty. Use placeholders starting with "%" to describe the format the API is using. The following placeholders are available:
- %s: Epoch unix timestamp - 1686218963
- %s_as_float: Epoch unix timestamp in seconds as float with microsecond precision - 1686218963.123456
- %ms: Epoch unix timestamp in milliseconds - 1686218963123
- %a: Weekday (abbreviated) - Sun
- %A: Weekday (full) - Sunday
- %w: Weekday (decimal) - 0(Sunday),6(Saturday)
- %d: Day of the month (zero-padded) - 01,02, ...,31
- %b: Month (abbreviated) - Jan
- %B: Month (full) - January
- %m: Month (zero-padded) - 01,02, ...,12
- %y: Year (without century, zero-padded) - 00,01, ...,99
- %Y: Year (with century) - 0001,0002, ...,9999
- %H: Hour (24-hour, zero-padded) - 00,01, ...,23
- %I: Hour (12-hour, zero-padded) - 01,02, ...,12
- %p: AM/PM indicator
- %M: Minute (zero-padded) - 00,01, ...,59
- %S: Second (zero-padded) - 00,01, ...,59
- %f: Microsecond (zero-padded to 6 digits) - 000000,000001, ...,999999
- %_ms: Millisecond (zero-padded to 3 digits) - 000,001, ...,999
- %z: UTC offset - (empty),+0000,-04:00
- %Z: Time zone name - (empty),UTC,GMT
- %j: Day of the year (zero-padded) - 001,002, ...,366
- %U: Week number of the year (Sunday as first day) - 00,01, ...,53
- %W: Week number of the year (Monday as first day) - 00,01, ...,53
- %c: Date and time representation - Tue Aug 16 21:30:00 1988
- %x: Date representation - 08/16/1988
- %X: Time representation - 21:30:00
- %%: Literal '%' character
Some placeholders depend on the locale of the underlying system - in most cases this locale is configured as en/US. For more information see the Python documentation.
Ceiling applied on the datetime value. Must be formatted with the datetime_format field.
Floor applied on the datetime value. Must be formatted with the datetime_format field.
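The floor and ceiling behave like a clamp, with all three values parsed using the same datetime_format. A minimal sketch restricted to strptime-compatible formats; `clamp_datetime` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def clamp_datetime(
    value: str, fmt: str, min_datetime: Optional[str] = None, max_datetime: Optional[str] = None
) -> datetime:
    """Hypothetical helper: apply the optional floor and ceiling to a datetime value."""
    result = datetime.strptime(value, fmt)
    # min/max are parsed with the same datetime_format as the value itself.
    if min_datetime is not None:
        result = max(result, datetime.strptime(min_datetime, fmt))
    if max_datetime is not None:
        result = min(result, datetime.strptime(max_datetime, fmt))
    return result
```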
Authenticator for requests requiring no authentication.
Pagination implementation that never returns a next page.
Specification describing how an 'advanced' Auth flow would need to function.
OAuth specific blob. This is a Json Schema used to validate Json configurations used as input to OAuth. Must be a valid non-nested JSON that refers to properties from ConnectorSpecification.connectionSpecification using special annotation 'path_in_connector_config'. These are input values the user is entering through the UI to authenticate to the connector, that might also be shared as inputs for syncing data via the connector. Examples: if no connector values are shared during the OAuth flow, oauth_user_input_from_connector_config_specification=[]; if connector values such as 'app_id' at the top level are used to generate the API URL for the OAuth flow, oauth_user_input_from_connector_config_specification={ app_id: { type: string path_in_connector_config: ['app_id'] } }; if connector values such as 'info.app_id' nested inside another object are used to generate the API URL for the OAuth flow, oauth_user_input_from_connector_config_specification={ app_id: { type: string path_in_connector_config: ['info', 'app_id'] } }
The DeclarativeOAuth specific blob. Pertains to the fields defined by the connector relating to the OAuth flow.
Interpolation capabilities:
- The variable placeholders are declared as {{my_var}}.
- Nested resolution variables like {{ {{my_nested_var}} }} are allowed as well.
- The allowed interpolation context is:
  - base64Encoder - encode to base64, {{ {{my_var_a}}:{{my_var_b}} | base64Encoder }}
  - base64Decoder - decode from a base64-encoded string, {{ {{my_string_variable_or_string_value}} | base64Decoder }}
  - urlEncoder - encode the input string to URL-like format, {{ https://test.host.com/endpoint | urlEncoder}}
  - urlDecoder - decode the input URL-encoded string into text format, {{ https%3A%2F%2Fairbyte.io | urlDecoder}}
  - codeChallengeS256 - get the codeChallenge encoded value to provide additional data-provider-specific authorisation values, {{ {{state_value}} | codeChallengeS256 }}
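Python equivalents of these interpolation filters can help when checking expected values by hand. The function names are hypothetical, and it is an assumption that codeChallengeS256 follows RFC 7636 exactly (URL-safe base64 of the SHA-256 digest, without padding); Airbyte's server-side implementation may differ.

```python
import base64
import hashlib
from urllib.parse import quote, unquote


def base64_encoder(value: str) -> str:
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def base64_decoder(value: str) -> str:
    return base64.b64decode(value).decode("utf-8")


def url_encoder(value: str) -> str:
    # safe="" also escapes "/" and ":", matching the https%3A%2F%2F... form above.
    return quote(value, safe="")


def url_decoder(value: str) -> str:
    return unquote(value)


def code_challenge_s256(verifier: str) -> str:
    # RFC 7636: BASE64URL(SHA256(ASCII(code_verifier))) with the "=" padding removed.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```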
Examples:
- The TikTok Marketing DeclarativeOAuth spec:
{
  "oauth_connector_input_specification": {
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "consent_url": "https://ads.tiktok.com/marketing_api/auth?{{client_id_key}}={{client_id_value}}&{{redirect_uri_key}}={{ {{redirect_uri_value}} | urlEncoder}}&{{state_key}}={{state_value}}",
      "access_token_url": "https://business-api.tiktok.com/open_api/v1.3/oauth2/access_token/",
      "access_token_params": {
        "{{ auth_code_key }}": "{{ auth_code_value }}",
        "{{ client_id_key }}": "{{ client_id_value }}",
        "{{ client_secret_key }}": "{{ client_secret_value }}"
      },
      "access_token_headers": {
        "Content-Type": "application/json",
        "Accept": "application/json"
      },
      "extract_output": ["data.access_token"],
      "client_id_key": "app_id",
      "client_secret_key": "secret",
      "auth_code_key": "auth_code"
    }
  }
}
13 nested properties
The DeclarativeOAuth Specific URL string template used to initiate the authentication. The placeholders are replaced during processing to provide the necessary values.
The DeclarativeOAuth Specific URL templated string to obtain the access_token, refresh_token etc. The placeholders are replaced during processing to provide the necessary values.
The DeclarativeOAuth Specific string of the scopes that need to be granted to the authenticated user.
The DeclarativeOAuth Specific optional headers to inject while exchanging the auth_code to access_token during completeOAuthFlow step.
The DeclarativeOAuth Specific optional query parameters to inject while exchanging the auth_code to access_token during completeOAuthFlow step.
When this property is provided, the query params will be encoded as Json and included in the outgoing API request.
The DeclarativeOAuth Specific list of strings to indicate which keys should be extracted and returned back to the input config.
The DeclarativeOAuth Specific object to provide the criteria of how the state query param should be constructed, including length and complexity.
2 nested properties
The DeclarativeOAuth Specific optional override to provide the custom client_id key name, if required by the data provider.
The DeclarativeOAuth Specific optional override to provide the custom client_secret key name, if required by the data provider.
The DeclarativeOAuth Specific optional override to provide the custom scope key name, if required by the data provider.
The DeclarativeOAuth Specific optional override to provide the custom state key name, if required by the data provider.
The DeclarativeOAuth Specific optional override to provide the custom code key name to something like auth_code or custom_auth_code, if required by the data provider.
The DeclarativeOAuth Specific optional override to provide the custom redirect_uri key name to something like callback_uri, if required by the data provider.
OAuth specific blob. This is a Json Schema used to validate Json configurations produced by the OAuth flows as they are returned by the remote OAuth APIs.
Must be a valid JSON describing the fields to merge back to ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Examples:
complete_oauth_output_specification={
refresh_token: {
type: string,
path_in_connector_config: ['credentials', 'refresh_token']
}
}
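To make the path_in_connector_config merge concrete, here is a minimal sketch of how OAuth output values could be written into a connector config. The helper name `merge_oauth_output` is hypothetical, and defaulting to a top-level key when no path is given is an assumption, not documented behavior.

```python
from typing import Any, Dict, Mapping


def merge_oauth_output(config: Dict[str, Any],
                       output_spec: Mapping[str, Mapping[str, Any]],
                       oauth_output: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: write OAuth output values into the connector config in place."""
    for field, spec in output_spec.items():
        if field not in oauth_output:
            continue  # the OAuth API did not return this field
        # Assumption: without path_in_connector_config, the field lands at the top level.
        path = spec.get("path_in_connector_config", [field])
        target = config
        for key in path[:-1]:
            target = target.setdefault(key, {})  # create intermediate objects as needed
        target[path[-1]] = oauth_output[field]
    return config


spec = {"refresh_token": {"type": "string", "path_in_connector_config": ["credentials", "refresh_token"]}}
print(merge_oauth_output({"start_date": "2024-01-01"}, spec, {"refresh_token": "r-123"}))
```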
OAuth-specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations. Must be valid, non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the server when completing an OAuth flow (typically exchanging an auth code for a refresh token). Example: complete_oauth_server_input_specification={ client_id: { type: string }, client_secret: { type: string } }
OAuth-specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations that
also need to be merged back into the connector configuration at runtime.
This is a subset of complete_oauth_server_input_specification that retains only the fields
necessary for the connector to function with OAuth. (Some fields may be used during OAuth flows but not needed afterwards; such fields
are listed in complete_oauth_server_input_specification but not in complete_oauth_server_output_specification.)
Must be valid, non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the
connector when using OAuth flow APIs.
These fields are merged back into ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Example:
complete_oauth_server_output_specification={
  client_id: {
    type: string,
    path_in_connector_config: ['credentials', 'client_id']
  },
  client_secret: {
    type: string,
    path_in_connector_config: ['credentials', 'client_secret']
  }
}
Pagination strategy that tracks the number of records read so far and returns it as the next page token.
The number of records to include in each page.
Whether to send the offset, with value 0, on the first request.
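A minimal sketch of the offset strategy described above, assuming that an empty page or a page shorter than page_size marks the end of the data. The class and method names are hypothetical and do not mirror the CDK's internal API.

```python
from typing import Any, List, Mapping, Optional


class OffsetIncrementSketch:
    """Simplified offset pagination: the next token is the number of records read so far."""

    def __init__(self, page_size: Optional[int], inject_on_first_request: bool = False) -> None:
        self.page_size = page_size
        self.inject_on_first_request = inject_on_first_request
        self.offset = 0

    def initial_token(self) -> Optional[int]:
        # Send offset=0 on the first request only when explicitly requested.
        return 0 if self.inject_on_first_request else None

    def next_page_token(self, last_page: List[Mapping[str, Any]]) -> Optional[int]:
        # Assumption: an empty page, or one shorter than page_size, is the last page.
        if not last_page or (self.page_size is not None and len(last_page) < self.page_size):
            return None
        self.offset += len(last_page)
        return self.offset


pager = OffsetIncrementSketch(page_size=2, inject_on_first_request=True)
print(pager.initial_token())            # 0
print(pager.next_page_token([{}, {}]))  # 2
print(pager.next_page_token([{}, {}]))  # 4
print(pager.next_page_token([{}]))      # None
```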
Pagination strategy that tracks the number of pages read so far and returns it as the next page token.
The number of records to include in each page.
Index of the first page to request.
Whether to send the page number, with the value defined by start_from_page, on the first request.
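The page-number strategy can be sketched the same way. This is a simplified model under the same end-of-data assumption (empty or short page); `PageIncrementSketch` and its methods are hypothetical names.

```python
from typing import Any, List, Mapping, Optional


class PageIncrementSketch:
    """Simplified page-number pagination starting at start_from_page."""

    def __init__(self, page_size: Optional[int], start_from_page: int = 0,
                 inject_on_first_request: bool = False) -> None:
        self.page_size = page_size
        self.start_from_page = start_from_page
        self.inject_on_first_request = inject_on_first_request
        self.page = start_from_page

    def initial_token(self) -> Optional[int]:
        # The first request carries start_from_page only if requested.
        return self.start_from_page if self.inject_on_first_request else None

    def next_page_token(self, last_page: List[Mapping[str, Any]]) -> Optional[int]:
        # Assumption: stop on an empty page or a page shorter than page_size.
        if not last_page or (self.page_size is not None and len(last_page) < self.page_size):
            return None
        self.page += 1
        return self.page


pager = PageIncrementSketch(page_size=100, start_from_page=1, inject_on_first_request=True)
print(pager.initial_token())              # 1
print(pager.next_page_token([{}] * 100))  # 2
print(pager.next_page_token([{}] * 10))   # None
```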
Describes how to construct partitions from the records retrieved from the parent stream.
Reference to the parent stream.
The primary key of records from the parent stream that will be used during the retrieval of records for the current substream. This parent identifier field is typically a characteristic of the child records being extracted from the source API.
While iterating over parent records during a sync, the parent_key value can be referenced by using this field.
Specifies the key field or path and where in the request a component's value should be injected.
4 nested properties
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g., GraphQL queries).
Indicates whether the parent stream should be read incrementally based on updates in the child stream.
If set, this will enable lazy reading, using the initial read of parent records to extract child records.
[]
Array of field paths to include as additional fields in the stream slice. Each path is an array of strings representing keys to access fields in the respective parent record. Accessible via stream_slice.extra_fields. Missing fields are set to None.
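A minimal sketch of how parent records could be turned into child-stream slices using parent_key, partition_field, and extra_fields. The helper name is hypothetical, and keying extra fields by their dot-joined path is an assumption made for illustration only; the CDK's own slice representation may differ.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional


def build_child_slices(parent_records: Iterable[Mapping[str, Any]], parent_key: str,
                       partition_field: str,
                       extra_fields: Optional[List[List[str]]] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: one slice per parent record."""
    slices = []
    for record in parent_records:
        slice_: Dict[str, Any] = {partition_field: record.get(parent_key)}
        extras: Dict[str, Any] = {}
        for path in extra_fields or []:
            value: Any = record
            for key in path:
                # Missing keys (or non-object intermediates) resolve to None.
                value = value.get(key) if isinstance(value, Mapping) else None
            extras[".".join(path)] = value  # assumption: keyed by dot-joined path
        if extras:
            slice_["extra_fields"] = extras
        slices.append(slice_)
    return slices


parents = [{"id": 1, "owner": {"name": "a"}}, {"id": 2}]
print(build_child_slices(parents, "id", "parent_id", [["owner", "name"]]))
```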
The stream field to be used to distinguish unique records. Can either be a single field, an array of fields representing a composite key, or an array of arrays representing a composite key where the fields are nested fields.
Examples: "id", ["code", "type"]
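The three accepted primary-key shapes can all be expressed as a list of field paths, which is roughly how the Airbyte protocol represents them. This is a sketch under that assumption; `normalize_primary_key` is a hypothetical helper, not a CDK function.

```python
from typing import List, Sequence, Union

PrimaryKey = Union[str, Sequence[str], Sequence[Sequence[str]]]


def normalize_primary_key(pk: PrimaryKey) -> List[List[str]]:
    """Hypothetical helper: express any primary-key form as a list of field paths."""
    if isinstance(pk, str):
        return [[pk]]                   # "id" -> [["id"]]
    result: List[List[str]] = []
    for part in pk:
        if isinstance(part, str):
            result.append([part])       # ["code", "type"] -> [["code"], ["type"]]
        else:
            result.append(list(part))   # [["meta", "id"]] -> [["meta", "id"]]
    return result


print(normalize_primary_key(["code", "type"]))
```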
Defines the behavior for fetching the list of properties from an API that will be loaded into the requests to extract records.
Describes the path to the field that should be extracted
Requester component that describes how to fetch the properties to query from a remote API endpoint.
For APIs that restrict the number of properties that can be requested per request, property chunking can be applied to make multiple requests, each with a subset of the properties.
The type used to determine the maximum number of properties per chunk
The maximum number of properties that can be retrieved per request according to the limit type.
Record merge strategy that combines records according to fields on the record.
3 nested properties
The name of the field on the record whose value will be used to group properties that were retrieved through multiple API requests.
For APIs that require explicit specification of the properties to query for, this component specifies which property fields to request and how they are supplied to outbound requests.
The set of properties that will be queried for in the outbound request. This can either be statically defined or fetched dynamically from an API endpoint.
The list of properties that should be included in every set of properties when multiple chunks of properties are being requested.
For APIs that restrict the number of properties that can be requested per request, property chunking can be applied to make multiple requests, each with a subset of the properties.
5 nested properties
The type used to determine the maximum number of properties per chunk
The maximum number of properties that can be retrieved per request according to the limit type.
Record merge strategy that combines records according to fields on the record.
3 nested properties
The name of the field on the record whose value will be used to group properties that were retrieved through multiple API requests.
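Property chunking and the record merge strategy can be sketched together: split the property list into chunks, repeat the always-included properties in each chunk, then merge the partial records by a shared field. This assumes a limit type that counts properties and that always-included properties count toward the limit; both helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional


def chunk_properties(properties: List[str], max_per_chunk: int,
                     always_include: Optional[List[str]] = None) -> List[List[str]]:
    """Split properties into chunks, repeating always-included properties in every chunk."""
    always = list(always_include or [])
    remaining = [p for p in properties if p not in always]
    # Assumption: always-included properties count toward the per-chunk limit.
    size = max_per_chunk - len(always)
    if size <= 0:
        raise ValueError("max_per_chunk must exceed the number of always-included properties")
    if not remaining:
        return [always]
    return [always + remaining[i:i + size] for i in range(0, len(remaining), size)]


def merge_by_key(records: Iterable[Mapping[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Combine partial records from several chunked requests that share the same key value."""
    merged: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


print(chunk_properties(["id", "a", "b", "c"], 3, ["id"]))
```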
Filter applied on a list of records.
The predicate used to filter records. Records for which it evaluates to False are removed.
Responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering records based on a heuristic.
Responsible for filtering records to be emitted by the Source.
Responsible for normalization according to the schema.
If true, transformation will be applied before record filtering.
Responsible for normalization according to the schema.
Allowed values: "Default", "None"
A transformation which removes fields from a record. The fields removed are designated using FieldPointers. During transformation, if a field or any of its parents does not exist in the record, no error is thrown.
Array of paths defining the fields to remove. Each item is an array whose elements describe the path of a field to remove.
The predicate used to filter a field by its value. The field is removed if this predicate is empty OR if the expression evaluates to True.
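A minimal sketch of the field-removal behavior described above: each pointer is a path, missing paths are ignored without error, and an optional condition decides per value. The condition here is a plain Python callable standing in for the interpolated expression, and wildcard paths are not supported; `remove_fields` is a hypothetical name.

```python
from typing import Any, Callable, List, MutableMapping, Optional


def remove_fields(record: MutableMapping[str, Any], field_pointers: List[List[str]],
                  condition: Optional[Callable[[Any], bool]] = None) -> MutableMapping[str, Any]:
    """Remove each pointed-to field in place; missing paths are silently ignored."""
    for pointer in field_pointers:
        if not pointer:
            continue
        parent: Any = record
        for key in pointer[:-1]:
            parent = parent.get(key) if isinstance(parent, MutableMapping) else None
        if isinstance(parent, MutableMapping) and pointer[-1] in parent:
            # No condition means "always remove"; otherwise remove only when it returns True.
            if condition is None or condition(parent[pointer[-1]]):
                del parent[pointer[-1]]
    return record


print(remove_fields({"a": 1, "b": {"c": 2, "d": None}}, [["a"], ["b", "c"], ["x", "y"]]))
```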
Specifies where in the request path a component's value should be inserted.
Specifies the key field or path and where in the request a component's value should be injected.
Configures where the descriptor should be set on the HTTP requests. Note that request parameters that are already encoded in the URL path will not be duplicated.
Configures which key should be used in the location that the descriptor is being injected into. We hope to eventually deprecate this field in favor of field_path for all request_options, but must currently maintain it for backwards compatibility in the Builder.
Configures a path to be used for nested structures in JSON body requests (e.g., GraphQL queries).
The stream schemas representing the shape of the data emitted by the stream.
Deprecated - use SessionTokenAuthenticator instead. Authenticator for requests authenticated using session tokens. A session token is a random value generated by a server to identify a specific user for the duration of one interaction session.
The name of the session token header that will be injected in the request
Path of the login URL (do not include the base URL)
Name of the key of the session token to be extracted from the response
Path of the URL to use to validate that the session token is valid (do not include the base URL)
Session token to use if using a pre-defined token. Not needed if authenticating with a username and password pair.
Username used to authenticate and obtain a session token
Password used to authenticate and obtain a session token
(This component is experimental. Use at your own risk.) Orchestrate the retriever's usage based on the state value.
The stream name.
A stream whose behavior is described by a set of declarative low code components.
10 nested properties
Component used to coordinate how records are extracted across stream slices and request pages.
The stream name.
Component used to fetch data incrementally based on a time field in the data.
The stream field to be used to distinguish unique records. Can either be a single field, an array of fields representing a composite key, or an array of arrays representing a composite key where the fields are nested fields.
One or more schema loaders can be used to retrieve the schema for the current stream. When multiple schema loaders are defined, schema properties are merged together. Schema loaders defined first take precedence in the event of a conflict.
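The precedence rule can be illustrated with a property-level merge in which the first definition of a property wins. This is a sketch assuming a shallow merge of each schema's "properties" object; `merge_schemas` is a hypothetical helper, not the CDK's loader.

```python
from typing import Any, Dict, List, Mapping


def merge_schemas(schemas: List[Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge 'properties' of several JSON schemas; earlier schemas win conflicts."""
    merged: Dict[str, Any] = {"type": "object", "properties": {}}
    for schema in schemas:
        for name, definition in schema.get("properties", {}).items():
            # setdefault keeps the first definition seen for each property name.
            merged["properties"].setdefault(name, definition)
    return merged


first = {"properties": {"id": {"type": "integer"}}}
second = {"properties": {"id": {"type": "string"}, "name": {"type": "string"}}}
print(merge_schemas([first, second]))
```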
A list of transformations to be applied to each output record.
Array of state migrations to be applied on the input state
[]
(experimental) Describes how to fetch a file
6 nested properties
Requester component that describes how to prepare HTTP requests to send to the source API.
Responsible for fetching the URL where the file is located. This is applied to each record, not to the HTTP response.
Responsible for fetching the content of the file. If not defined, the whole response body is assumed to be the file content.
Defines the name under which the file is stored. The stream name is automatically added to the file path. A unique file ID can be used to avoid overwriting files. A random UUID is used if the extractor is not provided.
A stream whose behavior is described by a set of declarative low code components.
10 nested properties
Component used to coordinate how records are extracted across stream slices and request pages.
The stream name.
Component used to fetch data incrementally based on a time field in the data.
The stream field to be used to distinguish unique records. Can either be a single field, an array of fields representing a composite key, or an array of arrays representing a composite key where the fields are nested fields.
One or more schema loaders can be used to retrieve the schema for the current stream. When multiple schema loaders are defined, schema properties are merged together. Schema loaders defined first take precedence in the event of a conflict.
A list of transformations to be applied to each output record.
Array of state migrations to be applied on the input state
[]
(experimental) Describes how to fetch a file
6 nested properties
Requester component that describes how to prepare HTTP requests to send to the source API.
Responsible for fetching the URL where the file is located. This is applied to each record, not to the HTTP response.
Responsible for fetching the content of the file. If not defined, the whole response body is assumed to be the file content.
Defines the name under which the file is stored. The stream name is automatically added to the file path. A unique file ID can be used to avoid overwriting files. A random UUID is used if the extractor is not provided.
Retrieves records by synchronously sending requests to fetch records. The retriever acts as an orchestrator between the requester, the record selector, the paginator, and the partition router.
Requester component that describes how to prepare HTTP requests to send to the source API.
Responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering records based on a heuristic.
6 nested properties
Responsible for filtering records to be emitted by the Source.
Responsible for normalization according to the schema.
If true, transformation will be applied before record filtering.
Component decoding the response so records can be extracted.
Paginator component that describes how to navigate through the API's pages.
If true, the partition router and incremental request options will be ignored when paginating requests. Request options set directly on the requester will not be ignored.
Used to iteratively execute requests over a set of values, such as a parent stream's records or a list of constant values.
Select 'gzip' for response data that is compressed with gzip. Requires specifying an inner data type/decoder to parse the decompressed data.
Select 'CSV' for response data that is formatted as CSV (comma-separated values). Can specify an encoding (default: 'utf-8') and a delimiter (default: ',').
Matches the api job status to Async Job Status.
Retrieves records by asynchronously sending requests to fetch records. The retriever acts as an orchestrator between the requester, the record selector, the paginator, and the partition router.
Responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering records based on a heuristic.
6 nested properties
Responsible for filtering records to be emitted by the Source.
Responsible for normalization according to the schema.
If true, transformation will be applied before record filtering.
Async Job Status to Airbyte CDK Async Job Status mapping.
Responsible for fetching the actual status of the async job.
Responsible for fetching the final result urls provided by the completed / finished / ready async job.
Requester component that describes how to prepare HTTP requests to send to the source API to create the async server-side job.
Requester component that describes how to prepare HTTP requests to send to the source API to fetch the status of the running async job.
Requester component that describes how to prepare HTTP requests to send to the source API to download the data provided by the completed async job.
Responsible for fetching the records from provided urls.
The time in minutes after which a single async job is considered timed out.
Requester component that describes how to prepare HTTP requests to send to the source API to extract the URL from the polling response of the completed async job.
Paginator component that describes how to navigate through the API's pages during download.
Requester component that describes how to prepare HTTP requests to send to the source API to abort a job once it is timed out from the source's perspective.
Requester component that describes how to prepare HTTP requests to send to the source API to delete a job once the records are extracted.
PartitionRouter component that describes how to partition the stream, enabling incremental syncs and checkpointing.
[]
Component decoding the response so records can be extracted.
Component decoding the download response so records can be extracted.
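The async job lifecycle implied by the requesters above (create, poll status, fetch result URLs, download, abort on timeout, delete after extraction) can be sketched as one function over injected callables. Every name here is hypothetical, and the status strings "completed", "failed", "timeout", and "running" are assumed to be the already-mapped statuses; this is not the CDK's implementation.

```python
import time
from typing import Callable, Iterable, List


def run_async_job(create_job: Callable[[], str],
                  poll_status: Callable[[str], str],
                  fetch_urls: Callable[[str], List[str]],
                  download: Callable[[str], Iterable[dict]],
                  abort_job: Callable[[str], None],
                  delete_job: Callable[[str], None],
                  timeout_minutes: float,
                  poll_interval_s: float = 0.0) -> List[dict]:
    """Hypothetical orchestration of a single async job."""
    job_id = create_job()
    deadline = time.monotonic() + timeout_minutes * 60
    while True:
        status = poll_status(job_id)  # assumed to be mapped to a CDK-style status already
        if status == "completed":
            break
        if status in ("failed", "timeout"):
            raise RuntimeError(f"job {job_id} ended with status {status}")
        if time.monotonic() > deadline:
            abort_job(job_id)  # give up from the source's perspective
            raise TimeoutError(f"job {job_id} timed out")
        time.sleep(poll_interval_s)
    records = [r for url in fetch_urls(job_id) for r in download(url)]
    delete_job(job_id)  # clean up once the records are extracted
    return records
```

Passing the HTTP steps as callables keeps the sketch testable without a network; in a manifest, each callable would correspond to one of the requester components listed above.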
A source specification made up of connector metadata and how it can be configured.
A connection specification describing how the connector can be configured.
URL of the connector's documentation page.
Additional and optional specification object to describe what an 'advanced' Auth flow would need to function.
- A connector should be able to fully function with the configuration as described by the ConnectorSpecification in a 'basic' mode.
- The 'advanced' mode provides easier UX for the user with UI improvements and automations. However, this requires further setup on the server side by instance or workspace admins beforehand. The trade-off is that the user does not have to provide as many technical inputs anymore and the auth process is faster and easier to complete.
4 nested properties
The type of auth to use
JSON path to a field in the connectorSpecification that should exist for the advanced auth to be applicable.
Value of the predicate_key fields for the advanced auth to be applicable.
Specification describing how an 'advanced' Auth flow would need to function.
5 nested properties
OAuth-specific blob. This is a JSON Schema used to validate JSON configurations used as input to OAuth. Must be valid, non-nested JSON that refers to properties from ConnectorSpecification.connectionSpecification using the special annotation 'path_in_connector_config'. These are input values the user enters through the UI to authenticate to the connector, which might also be shared as inputs for syncing data via the connector. Examples: if no connector values are shared during the OAuth flow, oauth_user_input_from_connector_config_specification=[]; if connector values such as 'app_id' at the top level are used to generate the API URL for the OAuth flow, oauth_user_input_from_connector_config_specification={ app_id: { type: string, path_in_connector_config: ['app_id'] } }; if connector values such as 'info.app_id' nested inside another object are used to generate the API URL for the OAuth flow, oauth_user_input_from_connector_config_specification={ app_id: { type: string, path_in_connector_config: ['info', 'app_id'] } }
The DeclarativeOAuth-specific blob. Pertains to the fields defined by the connector relating to the OAuth flow.
Interpolation capabilities:
- The variable placeholders are declared as {{my_var}}.
- Nested resolution variables like {{ {{my_nested_var}} }} are allowed as well.
- The allowed interpolation context is:
  - base64Encoder - encode to base64, {{ {{my_var_a}}:{{my_var_b}} | base64Encoder }}
  - base64Decoder - decode from a base64-encoded string, {{ {{my_string_variable_or_string_value}} | base64Decoder }}
  - urlEncoder - encode the input string to URL-like format, {{ https://test.host.com/endpoint | urlEncoder}}
  - urlDecoder - decode the input URL-encoded string into text format, {{ urlDecoder:https%3A%2F%2Fairbyte.io | urlDecoder}}
  - codeChallengeS256 - get the codeChallenge encoded value to provide additional data-provider-specific authorization values, {{ {{state_value}} | codeChallengeS256 }}
Examples:
- The TikTok Marketing DeclarativeOAuth spec:
  {
    "oauth_connector_input_specification": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "consent_url": "https://ads.tiktok.com/marketing_api/auth?{{client_id_key}}={{client_id_value}}&{{redirect_uri_key}}={{ {{redirect_uri_value}} | urlEncoder}}&{{state_key}}={{state_value}}",
        "access_token_url": "https://business-api.tiktok.com/open_api/v1.3/oauth2/access_token/",
        "access_token_params": {
          "{{ auth_code_key }}": "{{ auth_code_value }}",
          "{{ client_id_key }}": "{{ client_id_value }}",
          "{{ client_secret_key }}": "{{ client_secret_value }}"
        },
        "access_token_headers": {
          "Content-Type": "application/json",
          "Accept": "application/json"
        },
        "extract_output": ["data.access_token"],
        "client_id_key": "app_id",
        "client_secret_key": "secret",
        "auth_code_key": "auth_code"
      }
    }
  }
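For intuition about what the interpolation filters produce, here are rough Python equivalents. They assume standard base64, percent-encoding of every reserved character, and the RFC 7636 S256 transform for codeChallengeS256; the platform's own implementation may differ in details (for example, how spaces are encoded), and the function names are hypothetical.

```python
import base64
import hashlib
from urllib.parse import quote, unquote


def base64_encoder(value: str) -> str:
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def base64_decoder(value: str) -> str:
    return base64.b64decode(value).decode("utf-8")


def url_encoder(value: str) -> str:
    # safe="" also percent-encodes "/" and ":", as needed for a query-string value.
    return quote(value, safe="")


def url_decoder(value: str) -> str:
    return unquote(value)


def code_challenge_s256(verifier: str) -> str:
    # RFC 7636 S256: BASE64URL(SHA-256(verifier)) without "=" padding.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


print(base64_encoder("a:b"))
print(url_encoder("https://test.host.com/endpoint"))
```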
OAuth-specific blob. This is a JSON Schema used to validate JSON configurations produced by the OAuth flows as they are
returned by the remote OAuth APIs.
Must be valid JSON describing the fields to merge back into ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Example:
complete_oauth_output_specification={
  refresh_token: {
    type: string,
    path_in_connector_config: ['credentials', 'refresh_token']
  }
}
OAuth-specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations. Must be valid, non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the server when completing an OAuth flow (typically exchanging an auth code for a refresh token). Example: complete_oauth_server_input_specification={ client_id: { type: string }, client_secret: { type: string } }
OAuth-specific blob. This is a JSON Schema used to validate JSON configurations persisted as Airbyte Server configurations that
also need to be merged back into the connector configuration at runtime.
This is a subset of complete_oauth_server_input_specification that retains only the fields
necessary for the connector to function with OAuth. (Some fields may be used during OAuth flows but not needed afterwards; such fields
are listed in complete_oauth_server_input_specification but not in complete_oauth_server_output_specification.)
Must be valid, non-nested JSON describing additional fields configured by the Airbyte Instance or Workspace Admins to be used by the
connector when using OAuth flow APIs.
These fields are merged back into ConnectorSpecification.connectionSpecification.
For each field, a special annotation path_in_connector_config can be specified to determine where to merge it.
Example:
complete_oauth_server_output_specification={
  client_id: {
    type: string,
    path_in_connector_config: ['credentials', 'client_id']
  },
  client_secret: {
    type: string,
    path_in_connector_config: ['credentials', 'client_secret']
  }
}
4 nested properties
The discrete migrations that will be applied on the incoming config. Each migration will be applied in the order they are defined.
[]
The list of transformations that will be applied on the incoming config at the start of each sync. The transformations will be applied in the order they are defined.
[]
The list of validations that will be performed on the incoming config at the start of each sync.
[]
A config migration that will be applied on the incoming config at the start of a sync.
The list of transformations that will attempt to be applied on an incoming unmigrated config. The transformations will be applied in the order they are defined.
[]
The description/purpose of the config migration.
Partition router that is used to retrieve records that have been partitioned according to records from the specified parent streams. For example, the parent stream could be automobile brands and the substream the various car models associated with each brand.
Specifies which parent streams are being iterated over and how parent records should be used to partition the child stream data set.
A schema type.
Extract wait time from an HTTP header in the response.
The name of the response header defining how long to wait before retrying.
Optional regex to apply on the header to extract its value. The regex should define a capture group defining the wait time.
If the value extracted from the header is greater than this value, stop the stream.
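A minimal sketch of the header-based backoff described above: read the header, optionally apply the regex capture group, and stop (here, raise) when the wait exceeds the configured maximum. The function and parameter names are hypothetical, and the header value is assumed to be a number of seconds.

```python
import re
from typing import Mapping, Optional


def wait_time_from_header(headers: Mapping[str, str], header: str,
                          regex: Optional[str] = None,
                          max_waiting_time_s: Optional[float] = None) -> Optional[float]:
    """Hypothetical helper: seconds to wait before retrying, or None if unknown."""
    raw = headers.get(header)
    if raw is None:
        return None
    if regex:
        match = re.search(regex, raw)
        if not match:
            return None
        raw = match.group(1)  # the regex must define a capture group
    try:
        wait = float(raw)
    except ValueError:
        return None
    if max_waiting_time_s is not None and wait > max_waiting_time_s:
        # Per the description, the stream should stop in this case.
        raise RuntimeError(f"wait time {wait}s exceeds max {max_waiting_time_s}s")
    return wait


print(wait_time_from_header({"X-Wait": "wait=12.5s"}, "X-Wait", regex=r"wait=([\d.]+)s"))
```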
A decorator on top of a partition router that groups partitions into batches of a specified size. This is useful for APIs that support filtering by multiple partition keys in a single request. Note that per-partition incremental syncs may not work as expected because the grouping of partitions might change between syncs, potentially leading to inconsistent state tracking.
The number of partitions to include in each group. This determines how many partition values are batched together in a single slice.
The partition router whose output will be grouped. This can be any valid partition router component.
If true, ensures that partitions are unique within each group by removing duplicates based on the partition key.
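Grouping can be sketched as a generator that batches partitions into groups of up to group_size and, when deduplication is on, skips values already present in the current group. The helper name and the single partition key argument are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Set


def group_partitions(partitions: Iterable[Dict[str, Any]], group_size: int,
                     key: str, deduplicate: bool = True) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical helper: batch partitions into groups of up to group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    batch: List[Dict[str, Any]] = []
    seen: Set[Any] = set()  # partition values already in the current group
    for partition in partitions:
        value = partition[key]
        if deduplicate and value in seen:
            continue  # skip a duplicate within the current group
        seen.add(value)
        batch.append(partition)
        if len(batch) == group_size:
            yield batch
            batch, seen = [], set()
    if batch:
        yield batch  # final, possibly smaller, group


print(list(group_partitions([{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}], 2, "id")))
```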
Extract time at which we can retry the request from response header and wait for the difference between now and that time.
The name of the response header defining how long to wait before retrying.
Minimum time to wait before retrying.
Optional regex to apply on the header to extract its value. The regex should define a capture group defining the wait time.
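Waiting until a time given in a header reduces to "retry time minus now, but at least the minimum wait". This sketch assumes the header holds a Unix timestamp in seconds; the function name and the injectable `now` parameter are hypothetical.

```python
import re
import time
from typing import Mapping, Optional


def wait_until_time_from_header(headers: Mapping[str, str], header: str,
                                min_wait_s: float = 0.0, regex: Optional[str] = None,
                                now: Optional[float] = None) -> Optional[float]:
    """Hypothetical helper: seconds to wait until the time named in the header."""
    raw = headers.get(header)
    if raw is None:
        return None
    if regex:
        match = re.search(regex, raw)
        if not match:
            return None
        raw = match.group(1)
    try:
        retry_at = float(raw)  # assumption: Unix timestamp in seconds
    except ValueError:
        return None
    current = time.time() if now is None else now
    return max(retry_at - current, min_wait_s)


print(wait_until_time_from_header({"X-RateLimit-Reset": "1000"}, "X-RateLimit-Reset", now=970.0))
```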
(This component is experimental. Use at your own risk.) Specifies a mapping definition to update or add fields in a record or configuration. This allows dynamic mapping of data by interpolating values into the template based on provided contexts.
A list of potentially nested fields indicating the full path where value will be added or updated.
The dynamic or static value to assign to the key. Interpolated values can be used to dynamically determine the value during runtime.
A schema type.
Determines whether to create a new path if it doesn't exist (true) or only update existing paths (false). When set to true, the resolver will create new paths in the stream template if they don't exist. When false (default), it will only update existing paths.
A condition that must be met for the mapping to be applied. This property is only supported for ConfigComponentsResolver.
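A minimal sketch of applying one mapping to a stream template: set the value at field_path, creating missing paths only when create_or_update is true. Value interpolation and the condition are omitted, and `apply_mapping` is a hypothetical helper; it works on a copy so the template itself is not modified.

```python
import copy
from typing import Any, Dict, List


def apply_mapping(template: Dict[str, Any], field_path: List[str], value: Any,
                  create_or_update: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: set value at field_path in a copy of a stream template."""
    result = copy.deepcopy(template)
    target = result
    for key in field_path[:-1]:
        if key not in target:
            if not create_or_update:
                return result  # path missing and creation disabled: leave unchanged
            target[key] = {}
        target = target[key]
    if field_path[-1] in target or create_or_update:
        target[field_path[-1]] = value
    return result


t = {"name": "template", "retriever": {"requester": {"path": "/old"}}}
print(apply_mapping(t, ["retriever", "requester", "path"], "/items"))
```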
(This component is experimental. Use at your own risk.) Component that resolves and populates stream templates with components fetched via an HTTP retriever.
Component used to coordinate how records are extracted across stream slices and request pages.
(This component is experimental. Use at your own risk.) Describes how to get streams config from the source config.
A list of potentially nested fields indicating the full path in the source config file where the stream configs are located.
A list of default values, each matching the structure expected from the parsed component value.
(This component is experimental. Use at your own risk.) Resolves and populates stream templates with components fetched from the source config.
(This component is experimental. Use at your own risk.) Represents a stream parameters definition used to set up dynamic streams from values defined in the manifest.
A list of parameter objects; each object in the list represents the parameters for one stream.
(This component is experimental. Use at your own risk.) Resolves and populates dynamic streams from parametrized values defined in the manifest.
(This component is experimental. Use at your own risk.) Represents a stream parameters definition used to set up dynamic streams from values defined in the manifest.
2 nested properties
A list of parameter objects; each object in the list represents the parameters for one stream.
(This component is experimental. Use at your own risk.) A component that describes how declarative streams are created from a stream template.
Reference to the stream template.
Component that resolves and populates stream templates with component values.
The dynamic stream name.
Whether or not to prioritize parent parameters over component parameters when constructing dynamic streams. Defaults to true for backward compatibility.
Request body value is sent as plain text
Request body value is converted into a url-encoded form
Request body value converted into a JSON object
Request body value converted into a GraphQL query object
Request body GraphQL query object
1 nested property
The GraphQL query to be executed
Request body GraphQL query object
The GraphQL query to be executed
Validator that extracts the value located at a given field path.
List of potentially nested fields describing the full path of the field to validate. Use "*" to validate all values from an array.
The condition that the specified config value will be evaluated against
Validator that applies a validation strategy to a specified value.
The value to be validated. Can be a literal value or interpolated from configuration.
The validation strategy to apply to the value.
Validates that a user-provided schema adheres to a specified JSON schema.
The base JSON schema against which the user-provided schema will be validated.
Custom validation strategy that allows for custom validation logic.
Fully-qualified name of the class that will implement the custom validation strategy. Must be a subclass of ValidationStrategy. The format is source_<name>.<package>.<class_name>.
Transformation that remaps a field's value to another value based on a static map.
A mapping of original values to new values. When a field value matches a key in this map, it will be replaced with the corresponding value.
The path to the field whose value should be remapped. Specified as a list of path components to navigate through nested objects.
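The remapping transformation can be sketched as: walk field_path, and if the value found there is a key of the map, replace it. Missing paths and unmapped values leave the config untouched. `remap_config_value` is a hypothetical helper, and values are assumed to be hashable scalars.

```python
from typing import Any, Dict, List, Mapping


def remap_config_value(config: Dict[str, Any], field_path: List[str],
                       map_: Mapping[Any, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace the value at field_path in place if it is a key in map_."""
    target: Any = config
    for key in field_path[:-1]:
        if not isinstance(target, dict) or key not in target:
            return config  # path not present: nothing to remap
        target = target[key]
    leaf = field_path[-1]
    if isinstance(target, dict) and leaf in target and target[leaf] in map_:
        target[leaf] = map_[target[leaf]]
    return config


cfg = {"auth": {"region": "us"}}
print(remap_config_value(cfg, ["auth", "region"], {"us": "https://api.us.example.com"}))
```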
Transformation that adds fields to a config. The path of the added field can be nested.
A list of transformations (path and corresponding value) that will be added to the config.
Fields are added only if the condition evaluates to True.
Transformation that removes a field from the config.
A list of field pointers to be removed from the config.
Fields are removed only if the condition evaluates to True.
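A minimal sketch of the add and remove config transformations described above. The condition is modeled as a Python callable over the whole config rather than an interpolated expression, and the {"path": ..., "value": ...} entry shape is a hypothetical stand-in for the manifest's field definitions.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional

Config = Dict[str, Any]


def add_config_fields(config: Config, fields: List[Mapping[str, Any]],
                      condition: Optional[Callable[[Config], bool]] = None) -> Config:
    """Each entry is {"path": [...], "value": ...}; fields are added only if condition holds."""
    if condition is not None and not condition(config):
        return config
    for field in fields:
        target = config
        for key in field["path"][:-1]:
            target = target.setdefault(key, {})  # nested paths are created as needed
        target[field["path"][-1]] = field["value"]
    return config


def remove_config_fields(config: Config, field_pointers: List[List[str]],
                         condition: Optional[Callable[[Config], bool]] = None) -> Config:
    """Remove each pointed-to field; missing paths are ignored."""
    if condition is not None and not condition(config):
        return config
    for pointer in field_pointers:
        target: Any = config
        for key in pointer[:-1]:
            target = target.get(key) if isinstance(target, dict) else None
        if isinstance(target, dict):
            target.pop(pointer[-1], None)
    return config


cfg = {"credentials": {"token": "t"}, "legacy": 1}
add_config_fields(cfg, [{"path": ["credentials", "auth_type"], "value": "api_key"}])
remove_config_fields(cfg, [["legacy"]])
print(cfg)
```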
A custom config transformation that can be used to transform the connector configuration.
Fully-qualified name of the class that will be implementing the custom config transformation. The format is source_<name>.<package>.<class_name>.
Additional parameters to be passed to the custom config transformation.