Estuary Flow Catalog
Flow catalog files. Documentation: https://github.com/estuary/flow
| Type | object |
|---|---|
| File match | `flow.yaml`, `*.flow.yaml`, `flow.json`, `*.flow.json` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/estuary-flow-catalog/latest.json |
| Source | https://raw.githubusercontent.com/estuary/flow/master/flow.schema.json |
Validate with Lintel: `npx @lintel/lintel check`
Each catalog source defines a portion of a Flow Catalog by declaring collections, derivations, tests, and materializations. Catalog sources may reference and import other sources in order to use the collections and other entities that those sources define.
Properties
By importing another Flow catalog source, its collections, schemas, and derivations are bundled into the publication context of this specification. Imports are relative or absolute URLs, relative to this specification's location.
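As an illustration, a catalog source might import sibling specifications like this (the paths shown are hypothetical):

```yaml
# flow.yaml -- import paths below are illustrative examples.
import:
  # A relative path, resolved against this specification's location.
  - marketing/flow.yaml
  # An absolute URL also works.
  - https://example.com/shared/flow.yaml
```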
Definitions
Settings to determine how Flow should stay abreast of ongoing changes to collections and schemas.
Automatically add new bindings discovered from the source.
Whether to automatically evolve collections and/or materialization bindings to handle changes to collections that would otherwise be incompatible with the existing catalog.
Capture names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
"acmeCo/capture"
{ "resource": { "stream": "a_stream" }, "target": "target/collection" }
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
Every increment of this counter will result in a new backfill of this binding from the captured endpoint. For example, when capturing from a SQL table, incrementing this counter will cause the table to be re-captured in its entirety from the source database.
Note that a backfill does not truncate the target collection, and documents published by a backfilled binding will coexist with (and be ordered after) any documents which were published as part of a preceding backfill.
Disabled bindings are inactive, and not validated. They can be used to represent discovered resources that are intentionally not being captured.
A Capture binds an external system and target (e.g., a SQL table or cloud storage bucket) from which data should be continuously captured, with a Flow collection into which that captured data is ingested. Multiple Captures may be bound to a single collection, but only one Capture may exist for a given endpoint and target.
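A minimal capture could be sketched as follows; the connector image, configuration file, and names are hypothetical:

```yaml
captures:
  acmeCo/postgres-capture:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev   # hypothetical connector image
        config: postgres.config.yaml                 # hypothetical config file
    bindings:
      # Resource fields are connector-specific; "stream" is illustrative.
      - resource: { stream: public.orders }
        target: acmeCo/orders
```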
An endpoint from which Flow will capture.
Settings to determine how Flow should stay abreast of ongoing changes to collections and schemas.
2 nested properties
Automatically add new bindings discovered from the source.
Whether to automatically evolve collections and/or materialization bindings to handle changes to collections that would otherwise be incompatible with the existing catalog.
When true, a publication will delete this capture.
Configured intervals are applicable only to connectors which are unable to continuously tail their source, and which instead produce a current quantity of output and then exit. Flow will start the connector again after the given interval of time has passed.
Intervals are relative to the start of an invocation and not its completion. For example, if the interval is five minutes, and an invocation of the capture finishes after two minutes, then the next invocation will be started after three additional minutes.
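As a sketch, a batch-style connector that exits after each run could be re-invoked every 30 minutes (all names hypothetical):

```yaml
captures:
  acmeCo/periodic-export:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-batch-export:dev   # hypothetical batch connector
        config: export.config.yaml
    # Flow restarts the connector 30 minutes after each invocation begins.
    interval: 30m
    bindings:
      - resource: { stream: exports }
        target: acmeCo/exports
```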
When provided, this base64-encoded salt is used instead of a generated one.
Publishing a value of true will reset this capture, which is equivalent to deleting and then re-creating the capture, but applied as a single publication.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
An endpoint from which Flow will capture.
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
"acmeCo/collection"
Collection describes a set of related documents, where each adheres to a common schema and grouping key. Collections are append-only: once a document is added to a collection, it is never removed. However, it may be replaced or updated (either in whole, or in part) by a future document sharing its key. Each new document of a given key is "reduced" into existing documents of the key. By default, this reduction is achieved by completely replacing the previous document, but much richer reduction behaviors can be specified through the use of annotated reduction strategies of the collection schema.
{ "key": [ "/json/ptr" ], "schema": { "properties": { "bar": { "const": 42 }, "foo": { "type": "integer" } }, "type": "object" } }
Ordered JSON-Pointers which define how a composite key may be extracted from a collection document.
When true, a publication will delete this collection.
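Putting these pieces together, a collection with a key and annotated reduction strategies might be declared as follows (collection and property names are illustrative):

```yaml
collections:
  acmeCo/page-views:
    key: [/pageId]
    schema:
      type: object
      required: [pageId]
      reduce: { strategy: merge }       # deep-merge new documents of a key
      properties:
        pageId: { type: string }
        views:
          type: integer
          reduce: { strategy: sum }     # sum views across documents of a key
```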
Derive specifies how a collection is derived from other collections.
5 nested properties
A derivation runtime implementation.
When provided, this base64-encoded salt is used instead of a generated one.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
Typically you omit this and Flow infers it from your transform shuffle keys. In some circumstances, Flow may require that you explicitly tell it of your shuffled key types.
A JournalTemplate configures the journals which make up the physical partitions of a collection.
1 nested property
A FragmentTemplate configures how journal fragment files are produced as part of a collection.
4 nested properties
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
Fragments are flushed and persisted into cloud storage on this interval. Intervals are converted into uniform time segments: 24h will "roll" all fragments at midnight UTC every day, 1h at the top of every hour, 15m at :00, :15, :30, :45 past the hour, and so on. If not set, then fragments are not flushed on time-based intervals.
When a collection journal fragment reaches this threshold, it will be closed off and pushed to cloud storage. If not set, a default of 512MB is used.
If not set, then fragments are retained indefinitely.
A schema is a draft 2020-12 JSON Schema which validates Flow documents. Schemas also provide annotations at document locations, such as reduction strategies for combining one document into another.
Schemas may be defined inline to the catalog, or given as a relative or absolute URI. URIs may optionally include a JSON fragment pointer that locates a specific sub-schema therein.
For example, "schemas/marketing.yaml#/$defs/campaign" would reference the schema at location {"$defs": {"campaign": ...}} within ./schemas/marketing.yaml.
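Following that example, a collection might reference the sub-schema rather than inlining it (paths and names are illustrative):

```yaml
collections:
  acmeCo/campaigns:
    key: [/id]
    # Relative URI with a JSON fragment pointer into the referenced file.
    schema: schemas/marketing.yaml#/$defs/campaign
```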
Publishing a value of true will reset this collection, which is equivalent to deleting and then re-creating the collection, but applied as a single publication. If this is a derived collection, then the derivation task state is also reset, effectively re-building the derivation in its entirety.
A schema is a draft 2020-12 JSON Schema which validates Flow documents. Schemas also provide annotations at document locations, such as reduction strategies for combining one document into another.
Schemas may be defined inline to the catalog, or given as a relative or absolute URI. URIs may optionally include a JSON fragment pointer that locates a specific sub-schema therein.
For example, "schemas/marketing.yaml#/$defs/campaign" would reference the schema at location {"$defs": {"campaign": ...}} within ./schemas/marketing.yaml.
A schema is a draft 2020-12 JSON Schema which validates Flow documents. Schemas also provide annotations at document locations, such as reduction strategies for combining one document into another.
Schemas may be defined inline to the catalog, or given as a relative or absolute URI. URIs may optionally include a JSON fragment pointer that locates a specific sub-schema therein.
For example, "schemas/marketing.yaml#/$defs/campaign" would reference the schema at location {"$defs": {"campaign": ...}} within ./schemas/marketing.yaml.
Ordered JSON-Pointers which define how a composite key may be extracted from a collection document.
[ "/json/ptr" ]
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
"GZIP_OFFLOAD_DECOMPRESSION"
Connector image and configuration specification.
Dekaf service configuration
Since we support integrating with many different providers via Dekaf, this field records which of those connector variants this particular Dekaf connector was created as, in order to, for example, link to the correct docs URL and show the correct name and logo.
Derive specifies how a collection is derived from other collections.
A derivation runtime implementation.
When provided, this base64-encoded salt is used instead of a generated one.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
Typically you omit this and Flow infers it from your transform shuffle keys. In some circumstances, Flow may require that you explicitly tell it of your shuffled key types.
A derivation runtime implementation.
Module is either a relative URL of a Python module file, or is an inline representation of a Python module. The module must have an exported Derivation class which extends the generated IDerivation base class.
Map of package name to version specifier (e.g., {"httpx": ">=0.27", "pydantic": ">=2.0"}). These dependencies will be included in the generated pyproject.toml.
{}
Migrations may be provided as an inline string, or as a relative URL to a file containing the migration SQL.
Module is either a relative URL of a TypeScript module file, or is an inline representation of a TypeScript module. The module must have an exported Derivation variable which is an instance implementing the corresponding Derivation interface.
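A derivation backed by a TypeScript module could be sketched like this (collection, file, and transform names are hypothetical):

```yaml
collections:
  acmeCo/account-totals:
    key: [/account]
    schema: account-totals.schema.yaml   # hypothetical schema file
    derive:
      using:
        typescript: { module: account-totals.ts }   # hypothetical module file
      transforms:
        - name: fromTransactions
          source: acmeCo/transactions
          shuffle: { key: [/account] }
```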
A Field names a projection of a document location. Fields may include '/', but cannot begin or end with one. Many Fields are automatically inferred by Flow from a collection's JSON Schema, and are the JSON Pointer of the document location with the leading '/' removed. User-provided Fields which act as logical partitions are restricted to Unicode letters, numbers, '-', '_', or '.'.
"my_field"
A FragmentTemplate configures how journal fragment files are produced as part of a collection.
{ "compressionCodec": "ZSTANDARD", "flushInterval": "1h" }
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
Fragments are flushed and persisted into cloud storage on this interval. Intervals are converted into uniform time segments: 24h will "roll" all fragments at midnight UTC every day, 1h at the top of every hour, 15m at :00, :15, :30, :45 past the hour, and so on. If not set, then fragments are not flushed on time-based intervals.
When a collection journal fragment reaches this threshold, it will be closed off and pushed to cloud storage. If not set, a default of 512MB is used.
If not set, then fragments are retained indefinitely.
A source collection and details of how it's read.
{ "name": "source/collection" }
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
Source collection documents published after this date-time are filtered. notAfter is only a filter: updating its value will not cause Flow to re-process documents that have already been read. Optional. Default is to process all documents.
Source collection documents published before this date-time are filtered. notBefore is only a filter: updating its value will not cause Flow to re-process documents that have already been read. Optional. Default is to process all documents.
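For instance, a derivation transform might read only documents published within a window (collection and transform names, and the dates, are illustrative):

```yaml
# Fragment of a derivation's derive.transforms list.
transforms:
  - name: recentEventsOnly
    source:
      name: acmeCo/events
      notBefore: 2024-01-01T00:00:00Z   # skip documents published before this
      notAfter: 2024-06-30T23:59:59Z    # skip documents published after this
    shuffle: any
```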
Partition selectors identify a desired subset of the available logical partitions of a collection.
2 nested properties
Partition field names and values which are excluded from the source collection. Any documents matching any one of the partition values will be excluded.
{}
Partition field names and corresponding values which must be matched from the Source collection. Only documents having one of the specified values across all specified partition names will be matched. For example, source: [App, Web] and region: [APAC] would mean that only documents having an 'App' or 'Web' source and also occurring in the 'APAC' region will be processed.
{}
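A selector combining include and exclude might look like the following (partition field names and values are hypothetical):

```yaml
partitions:
  include:
    # Documents must match one value of every included field.
    source: [App, Web]
    region: [APAC]
  exclude:
    # Documents matching any excluded value are dropped.
    environment: [staging]
```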
A JournalTemplate configures the journals which make up the physical partitions of a collection.
{ "fragments": { "compressionCodec": "ZSTANDARD", "flushInterval": "1h" } }
A FragmentTemplate configures how journal fragment files are produced as part of a collection.
4 nested properties
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
Fragments are flushed and persisted into cloud storage on this interval. Intervals are converted into uniform time segments: 24h will "roll" all fragments at midnight UTC every day, 1h at the top of every hour, 15m at :00, :15, :30, :45 past the hour, and so on. If not set, then fragments are not flushed on time-based intervals.
When a collection journal fragment reaches this threshold, it will be closed off and pushed to cloud storage. If not set, a default of 512MB is used.
If not set, then fragments are retained indefinitely.
JSON Pointer which identifies a location in a document.
"/json/ptr"
Local command and its configuration.
{ "fields": { "recommended": true }, "resource": { "table": "a_table" }, "source": "source/collection" }
A source collection and details of how it's read.
Every increment of this counter will result in a new backfill of this binding from its source collection to its materialized resource. For example when materializing to a SQL table, incrementing this counter causes the table to be dropped and then rebuilt by re-reading the source collection.
Disabled bindings are inactive, and not validated.
MaterializationFields defines a selection of projections to materialize, as well as optional per-projection, driver-specific configuration.
4 nested properties
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
This removes fields from recommended projections, where enabled.
If not specified, the key of the source collection is used. Materialization bindings may select an ordered subset of scalar fields which will be grouped over, resulting in a natural index over the chosen group-by key. Fields may or may not be part of the collection key.
This supplements any selected fields, where enabled. Values are passed to and interpreted by the connector, which may use it to customize DDL generation or other behaviors with respect to the field. Consult connector documentation to see what it supports.
Note that this field has been renamed from include, which will continue to be accepted as an alias.
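A fields selection combining these settings could be sketched as follows (field names are hypothetical):

```yaml
fields:
  recommended: 1       # select only top-level recommended fields
  require:
    added_field: {}    # connector-specific configuration may go in the value
  exclude:
    - removed_field    # drop this field even if recommended
```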
Determines how to handle incompatible schema changes for a given binding.
When all bindings are of equal priority, Flow processes documents according to their associated publishing time, as encoded in the document UUID.
However, when one binding has a higher priority than others, then all ready documents are processed through the binding before any documents of other bindings are processed.
A Materialization binds a Flow collection with an external system and target (e.g., a SQL table) into which the collection is to be continuously materialized.
An Endpoint connector used for Flow materializations.
When true, a publication will delete this materialization.
Determines how to handle incompatible schema changes for a given binding.
Publishing a value of true will reset this materialization, which is equivalent to deleting and then re-creating the materialization, but applied as a single publication.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
An Endpoint connector used for Flow materializations.
MaterializationFields defines a selection of projections to materialize, as well as optional per-projection, driver-specific configuration.
{ "exclude": [ "removed" ], "recommended": true, "require": { "added": {} } }
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
This removes fields from recommended projections, where enabled.
If not specified, the key of the source collection is used. Materialization bindings may select an ordered subset of scalar fields which will be grouped over, resulting in a natural index over the chosen group-by key. Fields may or may not be part of the collection key.
This supplements any selected fields, where enabled. Values are passed to and interpreted by the connector, which may use it to customize DDL generation or other behaviors with respect to the field. Consult connector documentation to see what it supports.
Note that this field has been renamed from include, which will continue to be accepted as an alias.
Determines how to handle incompatible schema changes for a given binding.
"backfill"
Partition selectors identify a desired subset of the available logical partitions of a collection.
{ "exclude": { "other_partition": [ 32, 64 ] }, "include": { "a_partition": [ "A", "B" ] } }
Partition field names and values which are excluded from the source collection. Any documents matching any one of the partition values will be excluded.
{}
Partition field names and corresponding values which must be matched from the Source collection. Only documents having one of the specified values across all specified partition names will be matched. For example, source: [App, Web] and region: [APAC] would mean that only documents having an 'App' or 'Web' source and also occurring in the 'APAC' region will be processed.
{}
Projections are named locations within a collection document which may be used for logical partitioning or directly exposed to databases into which collections are materialized.
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
A URL identifying a resource, which may be a relative local path with respect to the current resource (e.g., ../path/to/flow.yaml), or may be an external absolute URL (e.g., http://example/flow.yaml).
"https://example/resource"
A ShardTemplate configures how shards process a catalog task.
{ "hotStandbys": 1, "maxTxnDuration": "30s" }
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
A Shuffle specifies how a shuffling key is to be extracted from collection documents.
{ "key": [ "/json/ptr" ] }
Type of a shuffled key component.
A source collection and details of how it's read.
"source/collection"
Specifies configuration for source captures, and defaults for new bindings that are added to the materialization. Changing these defaults has no effect on existing bindings.
Capture names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
New bindings will apply this as their delta-updates setting.
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
How to name target resources (database tables, for example) for materializing a given Collection.
How to name target resources (database tables, for example) for materializing a given Collection.
Test the behavior of reductions and derivations, through a sequence of test steps.
{ "description": "An example test", "steps": [ { "ingest": { "collection": "acmeCo/collection", "description": "Description of the ingestion.", "documents": [ { "a": "document" }, { "another": "document" } ] } }, { "verify": { "collection": "acmeCo/collection", "description": "Description of the verification.", "documents": [ { "a": "document" }, { "another": "document" } ] } } ] }
When true, a publication will delete this test.
A test step describes either an "ingest" of document fixtures into a collection, or a "verify" of expected document fixtures from a collection.
{ "ingest": { "collection": "acmeCo/collection", "description": "Description of the ingestion.", "documents": [ { "a": "document" }, { "another": "document" } ] } }
{ "verify": { "collection": "acmeCo/collection", "description": "Description of the verification.", "documents": [ { "a": "document" }, { "another": "document" } ] } }
An ingestion test step ingests document fixtures into the named collection.
{ "collection": "acmeCo/collection", "description": "Description of the ingestion.", "documents": [ { "a": "document" }, { "another": "document" } ] }
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
A test step describes either an "ingest" of document fixtures into a collection, or a "verify" of expected document fixtures from a collection.
A verification test step verifies that the contents of the named collection match the expected fixtures, after fully processing all preceding ingestion test steps.
{ "collection": "acmeCo/collection", "description": "Description of the verification.", "documents": [ { "a": "document" }, { "another": "document" } ] }
A source collection and details of how it's read.
A test step describes either an "ingest" of document fixtures into a collection, or a "verify" of expected document fixtures from a collection.
Transform names are Unicode letters, numbers, '-', '_', or '.'.
"myTransform"
A Transform reads and shuffles documents of a source collection, and processes each document through either one or both of a register "update" lambda and a derived document "publish" lambda.
{ "name": "my-transform", "shuffle": "any", "source": "some/source/collection" }
Transform names are Unicode letters, numbers, '-', '_', or '.'.
A Shuffle specifies how a shuffling key is to be extracted from collection documents.
A source collection and details of how it's read.
Every increment of this counter will result in a new backfill of this transform. Specifically, the transform's lambda will be re-invoked for every applicable document of its source collection.
Note that a backfill does not truncate the derived collection, and documents published by a backfilled transform will coexist with (and be ordered after) any documents which were published as part of a preceding backfill.
Disabled transforms are completely ignored at runtime and are not validated.
When all transforms are of equal priority, Flow processes documents according to their associated publishing time, as encoded in the document UUID.
However, when one transform has a higher priority than others, then all ready documents are processed through the transform before any documents of other transforms are processed.
Delays are applied as an adjustment to the UUID clock encoded within each document, which is then used to impose a relative ordering of all documents read by this derivation. This means that read delays are applied in a consistent way, even when back-filling over historical documents. When caught up and tailing the source collection, delays also "gate" documents such that they aren't processed until the current wall-time reflects the delay.
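For instance, a derivation transform might replay source events with a one-hour read delay and an elevated priority (collection, transform, and key names are hypothetical):

```yaml
# Fragment of a derivation's derive.transforms list.
transforms:
  - name: delayedJoin
    source: acmeCo/events
    shuffle: { key: [/userId] }
    readDelay: 1h   # documents are gated until wall-time reflects the delay
    priority: 1     # ready documents here are processed before other transforms
```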