Estuary Flow Catalog
Flow catalog files. Documentation: https://github.com/estuary/flow
| Type | object |
|---|---|
| File match | `flow.yaml`, `*.flow.yaml`, `flow.json`, `*.flow.json` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/estuary-flow-catalog/latest.json |
| Source | https://raw.githubusercontent.com/estuary/flow/master/flow.schema.json |
Validate with Lintel: `npx @lintel/lintel check`
Each catalog source defines a portion of a Flow Catalog by declaring collections, derivations, tests, and materializations. Catalog sources may reference and import other sources in order to use the collections and other entities that those sources define.
Properties
By importing another Flow catalog source, its collections, schemas, and derivations are bundled into the publication context of this specification. Imports are relative or absolute URLs, relative to this specification's location.
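As an illustration, a catalog source might import sibling specifications like this (the paths shown are hypothetical):

```yaml
# flow.yaml -- import paths below are illustrative examples.
import:
  # A relative path, resolved against this specification's location.
  - marketing/flow.yaml
  # An absolute URL also works.
  - https://example.com/shared/flow.yaml
```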
Definitions
Settings to determine how Flow should stay abreast of ongoing changes to collections and schemas.
Automatically add new bindings discovered from the source.
Whether to automatically evolve collections and/or materialization bindings to handle changes to collections that would otherwise be incompatible with the existing catalog.
Capture names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
"acmeCo/capture"
{ "resource": { "stream": "a_stream" }, "target": "target/collection" }
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
Every increment of this counter will result in a new backfill of this binding from the captured endpoint. For example, when capturing from a SQL table, incrementing this counter will cause the table to be re-captured in its entirety from the source database.
Note that a backfill does not truncate the target collection, and documents published by a backfilled binding will coexist with (and be ordered after) any documents which were published as part of a preceding backfill.
Disabled bindings are inactive, and not validated. They can be used to represent discovered resources that are intentionally not being captured.
A Capture binds an external system and target (e.g., a SQL table or cloud storage bucket) from which data should be continuously captured, with a Flow collection into which that captured data is ingested. Multiple Captures may be bound to a single collection, but only one Capture may exist for a given endpoint and target.
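A minimal capture could be sketched as follows; the connector image, configuration file, and names are hypothetical:

```yaml
captures:
  acmeCo/postgres-capture:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev   # hypothetical connector image
        config: postgres.config.yaml                 # hypothetical config file
    bindings:
      # Resource fields are connector-specific; "stream" is illustrative.
      - resource: { stream: public.orders }
        target: acmeCo/orders
```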
An endpoint from which Flow will capture.
Settings to determine how Flow should stay abreast of ongoing changes to collections and schemas.
2 nested properties
Automatically add new bindings discovered from the source.
Whether to automatically evolve collections and/or materialization bindings to handle changes to collections that would otherwise be incompatible with the existing catalog.
When true, a publication will delete this capture.
Configured intervals are applicable only to connectors which are unable to continuously tail their source, and which instead produce a current quantity of output and then exit. Flow will start the connector again after the given interval of time has passed.
Intervals are relative to the start of an invocation and not its completion. For example, if the interval is five minutes, and an invocation of the capture finishes after two minutes, then the next invocation will be started after three additional minutes.
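As a sketch, a batch-style connector that exits after each run could be re-invoked every 30 minutes (all names hypothetical):

```yaml
captures:
  acmeCo/periodic-export:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-batch-export:dev   # hypothetical batch connector
        config: export.config.yaml
    # Flow restarts the connector 30 minutes after each invocation begins.
    interval: 30m
    bindings:
      - resource: { stream: exports }
        target: acmeCo/exports
```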
When provided, this base64-encoded salt is used instead of a generated one.
Publishing a value of true will reset this capture, which is equivalent to deleting and then re-creating the capture, but applied as a single publication.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
An endpoint from which Flow will capture.
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
"acmeCo/collection"
Collection describes a set of related documents, where each adheres to a common schema and grouping key. Collections are append-only: once a document is added to a collection, it is never removed. However, it may be replaced or updated (either in whole, or in part) by a future document sharing its key. Each new document of a given key is "reduced" into existing documents of the key. By default, this reduction is achieved by completely replacing the previous document, but much richer reduction behaviors can be specified through the use of annotated reduction strategies of the collection schema.
{ "key": [ "/json/ptr" ], "schema": { "properties": { "bar": { "const": 42 }, "foo": { "type": "integer" } }, "type": "object" } }
Ordered JSON-Pointers which define how a composite key may be extracted from a collection document.
When true, a publication will delete this collection.
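Putting these pieces together, a collection with a key and annotated reduction strategies might be declared as follows (collection and property names are illustrative):

```yaml
collections:
  acmeCo/page-views:
    key: [/pageId]
    schema:
      type: object
      required: [pageId]
      reduce: { strategy: merge }       # deep-merge new documents of a key
      properties:
        pageId: { type: string }
        views:
          type: integer
          reduce: { strategy: sum }     # sum views across documents of a key
```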
Derive specifies how a collection is derived from other collections.
5 nested properties
A derivation runtime implementation.
When provided, this base64-encoded salt is used instead of a generated one.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
Typically you omit this and Flow infers it from your transform shuffle keys. In some circumstances, Flow may require that you explicitly tell it of your shuffled key types.
A JournalTemplate configures the journals which make up the physical partitions of a collection.
1 nested property
A FragmentTemplate configures how journal fragment files are produced as part of a collection.
4 nested properties
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
Fragments are flushed and persisted into cloud storage on this interval. Intervals are converted into uniform time segments: 24h will "roll" all fragments at midnight UTC every day, 1h at the top of every hour, 15m at :00, :15, :30, :45 past the hour, and so on. If not set, then fragments are not flushed on time-based intervals.
When a collection journal fragment reaches this threshold, it will be closed off and pushed to cloud storage. If not set, a default of 512MB is used.
If not set, then fragments are retained indefinitely.
A schema is a draft 2020-12 JSON Schema which validates Flow documents. Schemas also provide annotations at document locations, such as reduction strategies for combining one document into another.
Schemas may be defined inline to the catalog, or given as a relative or absolute URI. URIs may optionally include a JSON fragment pointer that locates a specific sub-schema therein.
For example, "schemas/marketing.yaml#/$defs/campaign" would reference the schema at location {"$defs": {"campaign": ...}} within ./schemas/marketing.yaml.
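Following that example, a collection might reference the sub-schema rather than inlining it (paths and names are illustrative):

```yaml
collections:
  acmeCo/campaigns:
    key: [/id]
    # Relative URI with a JSON fragment pointer into the referenced file.
    schema: schemas/marketing.yaml#/$defs/campaign
```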
Publishing a value of true will reset this collection, which is equivalent to deleting and then re-creating the collection, but applied as a single publication. If this is a derived collection, then the derivation task state is also reset, effectively re-building the derivation in its entirety.
A schema is a draft 2020-12 JSON Schema which validates Flow documents. Schemas also provide annotations at document locations, such as reduction strategies for combining one document into another.
Schemas may be defined inline to the catalog, or given as a relative or absolute URI. URIs may optionally include a JSON fragment pointer that locates a specific sub-schema therein.
For example, "schemas/marketing.yaml#/$defs/campaign" would reference the schema at location {"$defs": {"campaign": ...}} within ./schemas/marketing.yaml.
A schema is a draft 2020-12 JSON Schema which validates Flow documents. Schemas also provide annotations at document locations, such as reduction strategies for combining one document into another.
Schemas may be defined inline to the catalog, or given as a relative or absolute URI. URIs may optionally include a JSON fragment pointer that locates a specific sub-schema therein.
For example, "schemas/marketing.yaml#/$defs/campaign" would reference the schema at location {"$defs": {"campaign": ...}} within ./schemas/marketing.yaml.
Ordered JSON-Pointers which define how a composite key may be extracted from a collection document.
[ "/json/ptr" ]
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
"GZIP_OFFLOAD_DECOMPRESSION"
Connector image and configuration specification.
Dekaf service configuration
Since we support integrating with many different providers via Dekaf, this field records which of those connector variants this particular Dekaf connector was created as, in order to, for example, link to the correct docs URL and show the correct name and logo.
Derive specifies how a collection is derived from other collections.
A derivation runtime implementation.
When provided, this base64-encoded salt is used instead of a generated one.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
Typically you omit this and Flow infers it from your transform shuffle keys. In some circumstances, Flow may require that you explicitly tell it of your shuffled key types.
A derivation runtime implementation.
Module is either a relative URL of a Python module file, or is an inline representation of a Python module. The module must have an exported Derivation class which extends the generated IDerivation base class.
Map of package name to version specifier (e.g., {"httpx": ">=0.27", "pydantic": ">=2.0"}). These dependencies will be included in the generated pyproject.toml.
{}
Migrations may be provided as an inline string, or as a relative URL to a file containing the migration SQL.
Module is either a relative URL of a TypeScript module file, or is an inline representation of a TypeScript module. The module must have an exported Derivation variable which is an instance implementing the corresponding Derivation interface.
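A derivation backed by a TypeScript module could be sketched like this (collection, file, and transform names are hypothetical):

```yaml
collections:
  acmeCo/account-totals:
    key: [/account]
    schema: account-totals.schema.yaml   # hypothetical schema file
    derive:
      using:
        typescript: { module: account-totals.ts }   # hypothetical module file
      transforms:
        - name: fromTransactions
          source: acmeCo/transactions
          shuffle: { key: [/account] }
```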
A Field names a projection of a document location. Fields may include '/', but cannot begin or end with one. Many Fields are automatically inferred by Flow from a collection's JSON Schema, and are the JSON Pointer of the document location with the leading '/' removed. User-provided Fields which act as logical partitions are restricted to Unicode letters, numbers, '-', '_', or '.'.
"my_field"
A FragmentTemplate configures how journal fragment files are produced as part of a collection.
{ "compressionCodec": "ZSTANDARD", "flushInterval": "1h" }
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
Fragments are flushed and persisted into cloud storage on this interval. Intervals are converted into uniform time segments: 24h will "roll" all fragments at midnight UTC every day, 1h at the top of every hour, 15m at :00, :15, :30, :45 past the hour, and so on. If not set, then fragments are not flushed on time-based intervals.
When a collection journal fragment reaches this threshold, it will be closed off and pushed to cloud storage. If not set, a default of 512MB is used.
If not set, then fragments are retained indefinitely.
A source collection and details of how it's read.
{ "name": "source/collection" }
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
Source collection documents published after this date-time are filtered. notAfter is only a filter: updating its value will not cause Flow to re-process documents that have already been read. Optional. Default is to process all documents.
Source collection documents published before this date-time are filtered. notBefore is only a filter: updating its value will not cause Flow to re-process documents that have already been read. Optional. Default is to process all documents.
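For instance, a derivation transform might read only documents published within a window (collection and transform names, and the dates, are illustrative):

```yaml
# Fragment of a derivation's derive.transforms list.
transforms:
  - name: recentEventsOnly
    source:
      name: acmeCo/events
      notBefore: 2024-01-01T00:00:00Z   # skip documents published before this
      notAfter: 2024-06-30T23:59:59Z    # skip documents published after this
    shuffle: any
```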
Partition selectors identify a desired subset of the available logical partitions of a collection.
2 nested properties
Partition field names and values which are excluded from the source collection. Any documents matching any one of the partition values will be excluded.
{}
Partition field names and corresponding values which must be matched from the Source collection. Only documents having one of the specified values across all specified partition names will be matched. For example, source: [App, Web] and region: [APAC] would mean that only documents having an 'App' or 'Web' source and also occurring in the 'APAC' region will be processed.
{}
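A selector combining include and exclude might look like the following (partition field names and values are hypothetical):

```yaml
partitions:
  include:
    # Documents must match one value of every included field.
    source: [App, Web]
    region: [APAC]
  exclude:
    # Documents matching any excluded value are dropped.
    environment: [staging]
```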
A JournalTemplate configures the journals which make up the physical partitions of a collection.
{ "fragments": { "compressionCodec": "ZSTANDARD", "flushInterval": "1h" } }
A FragmentTemplate configures how journal fragment files are produced as part of a collection.
4 nested properties
A CompressionCodec may be applied to compress journal fragments before they're persisted to cloud storage. The compression applied to a journal fragment is included in its filename, such as ".gz" for GZIP. A collection's compression may be changed at any time, and will affect newly-written journal fragments.
Fragments are flushed and persisted into cloud storage on this interval. Intervals are converted into uniform time segments: 24h will "roll" all fragments at midnight UTC every day, 1h at the top of every hour, 15m at :00, :15, :30, :45 past the hour, and so on. If not set, then fragments are not flushed on time-based intervals.
When a collection journal fragment reaches this threshold, it will be closed off and pushed to cloud storage. If not set, a default of 512MB is used.
If not set, then fragments are retained indefinitely.
JSON Pointer which identifies a location in a document.
"/json/ptr"
Local command and its configuration.
{ "fields": { "recommended": true }, "resource": { "table": "a_table" }, "source": "source/collection" }
A source collection and details of how it's read.
Every increment of this counter will result in a new backfill of this binding from its source collection to its materialized resource. For example when materializing to a SQL table, incrementing this counter causes the table to be dropped and then rebuilt by re-reading the source collection.
Disabled bindings are inactive, and not validated.
MaterializationFields defines a selection of projections to materialize, as well as optional per-projection, driver-specific configuration.
4 nested properties
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
This removes fields from recommended projections, where enabled.
If not specified, the key of the source collection is used. Materialization bindings may select an ordered subset of scalar fields which will be grouped over, resulting in a natural index over the chosen group-by key. Fields may or may not be part of the collection key.
This supplements any selected fields, where enabled. Values are passed to and interpreted by the connector, which may use it to customize DDL generation or other behaviors with respect to the field. Consult connector documentation to see what it supports.
Note that this field has been renamed from include, which will continue to be accepted as an alias.
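A fields selection combining these settings could be sketched as follows (field names are hypothetical):

```yaml
fields:
  recommended: 1       # select only top-level recommended fields
  require:
    added_field: {}    # connector-specific configuration may go in the value
  exclude:
    - removed_field    # drop this field even if recommended
```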
Determines how to handle incompatible schema changes for a given binding.
When all bindings are of equal priority, Flow processes documents according to their associated publishing time, as encoded in the document UUID.
However, when one binding has a higher priority than others, then all ready documents are processed through the binding before any documents of other bindings are processed.
A Materialization binds a Flow collection with an external system and target (e.g., a SQL table) into which the collection is to be continuously materialized.
An Endpoint connector used for Flow materializations.
When true, a publication will delete this materialization.
Determines how to handle incompatible schema changes for a given binding.
Publishing a value of true will reset this materialization, which is equivalent to deleting and then re-creating the materialization, but applied as a single publication.
A ShardTemplate configures how shards process a catalog task.
8 nested properties
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
An Endpoint connector used for Flow materializations.
MaterializationFields defines a selection of projections to materialize, as well as optional per-projection, driver-specific configuration.
{ "exclude": [ "removed" ], "recommended": true, "require": { "added": {} } }
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
This removes fields from recommended projections, where enabled.
If not specified, the key of the source collection is used. Materialization bindings may select an ordered subset of scalar fields which will be grouped over, resulting in a natural index over the chosen group-by key. Fields may or may not be part of the collection key.
This supplements any selected fields, where enabled. Values are passed to and interpreted by the connector, which may use it to customize DDL generation or other behaviors with respect to the field. Consult connector documentation to see what it supports.
Note that this field has been renamed from include, which will continue to be accepted as an alias.
Determines how to handle incompatible schema changes for a given binding.
"backfill"
Partition selectors identify a desired subset of the available logical partitions of a collection.
{ "exclude": { "other_partition": [ 32, 64 ] }, "include": { "a_partition": [ "A", "B" ] } }
Partition field names and values which are excluded from the source collection. Any documents matching any one of the partition values will be excluded.
{}
Partition field names and corresponding values which must be matched from the Source collection. Only documents having one of the specified values across all specified partition names will be matched. For example, source: [App, Web] and region: [APAC] would mean that only documents having an 'App' or 'Web' source and also occurring in the 'APAC' region will be processed.
{}
Projections are named locations within a collection document which may be used for logical partitioning or directly exposed to databases into which collections are materialized.
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
A URL identifying a resource, which may be a relative local path with respect to the current resource (e.g., ../path/to/flow.yaml), or may be an external absolute URL (e.g., http://example/flow.yaml).
"https://example/resource"
A ShardTemplate configures how shards process a catalog task.
{ "hotStandbys": 1, "maxTxnDuration": "30s" }
Flag names must be valid tokens (Unicode letters, numbers, '-', '_', '.').
Each flag produces a shard label estuary.dev/flag/<name>.
Hot standbys of a shard actively replicate the shard's state to another machine, and are able to be quickly promoted to take over processing for the shard should its current primary fail. If not set, then no hot standbys are maintained. EXPERIMENTAL: this field MAY be removed.
Log levels may currently be "error", "warn", "info", "debug", or "trace". If not set, the effective log level is "info".
This duration upper-bounds the amount of time during which a transaction may process documents before it must flush and commit. It may run for less time if there aren't additional ready documents for it to process. If not set, the maximum duration defaults to five minutes for materializations, and one second for captures and derivations. EXPERIMENTAL: this field MAY be removed.
This duration lower-bounds the amount of time during which a transaction must process documents before it must flush and commit. It may run for more time if additional documents are available. The default value is zero seconds. Larger values may result in more data reduction, at the cost of more latency. EXPERIMENTAL: this field MAY be removed.
Larger values are recommended for tasks having more than one shard split and long, bursty transaction durations. If not set, a reasonable default (currently 4,096) is used. EXPERIMENTAL: this field is LIKELY to be removed.
The ring buffer is a performance optimization only: catalog tasks will replay portions of journals as needed when messages aren't available in the buffer. It can remain small if upstream task transactions are small, but larger transactions will achieve better performance with a larger ring. If not set, a reasonable default (currently 65,536) is used. EXPERIMENTAL: this field is LIKELY to be removed.
A Shuffle specifies how a shuffling key is to be extracted from collection documents.
{ "key": [ "/json/ptr" ] }
Type of a shuffled key component.
A source collection and details of how it's read.
"source/collection"
Specifies configuration for source captures, and defaults for new bindings that are added to the materialization. Changing these defaults has no effect on existing bindings.
Capture names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
New bindings will apply this as their delta-updates setting.
Available selection modes and their meanings:
- false / 0: only fields required by the user or the connector are materialized.
- 1: only top-level fields are selected.
- 2: second-level fields are selected, or top-level fields having no children.
- 3, 4, ...: further levels of nesting are selected.
- true: nested fields are selected regardless of their depth.
How to name target resources (database tables, for example) for materializing a given Collection.
How to name target resources (database tables, for example) for materializing a given Collection.
Test the behavior of reductions and derivations, through a sequence of test steps.
{ "description": "An example test", "steps": [ { "ingest": { "collection": "acmeCo/collection", "description": "Description of the ingestion.", "documents": [ { "a": "document" }, { "another": "document" } ] } }, { "verify": { "collection": "acmeCo/collection", "description": "Description of the verification.", "documents": [ { "a": "document" }, { "another": "document" } ] } } ] }
When true, a publication will delete this test.
A test step describes either an "ingest" of document fixtures into a collection, or a "verify" of expected document fixtures from a collection.
{ "ingest": { "collection": "acmeCo/collection", "description": "Description of the ingestion.", "documents": [ { "a": "document" }, { "another": "document" } ] } }
{ "verify": { "collection": "acmeCo/collection", "description": "Description of the verification.", "documents": [ { "a": "document" }, { "another": "document" } ] } }
An ingestion test step ingests document fixtures into the named collection.
{ "collection": "acmeCo/collection", "description": "Description of the ingestion.", "documents": [ { "a": "document" }, { "another": "document" } ] }
Collection names are paths of Unicode letters, numbers, '-', '_', or '.'. Each path component is separated by a slash '/', and a name may not begin or end in a '/'.
A test step describes either an "ingest" of document fixtures into a collection, or a "verify" of expected document fixtures from a collection.
A verification test step verifies that the contents of the named collection match the expected fixtures, after fully processing all preceding ingestion test steps.
{ "collection": "acmeCo/collection", "description": "Description of the verification.", "documents": [ { "a": "document" }, { "another": "document" } ] }
A source collection and details of how it's read.
A test step describes either an "ingest" of document fixtures into a collection, or a "verify" of expected document fixtures from a collection.
Transform names are Unicode letters, numbers, '-', '_', or '.'.
"myTransform"
A Transform reads and shuffles documents of a source collection, and processes each document through either one or both of a register "update" lambda and a derived document "publish" lambda.
{ "name": "my-transform", "shuffle": "any", "source": "some/source/collection" }
Transform names are Unicode letters, numbers, '-', '_', or '.'.
A Shuffle specifies how a shuffling key is to be extracted from collection documents.
A source collection and details of how it's read.
Every increment of this counter will result in a new backfill of this transform. Specifically, the transform's lambda will be re-invoked for every applicable document of its source collection.
Note that a backfill does not truncate the derived collection, and documents published by a backfilled transform will coexist with (and be ordered after) any documents which were published as part of a preceding backfill.
Disabled transforms are completely ignored at runtime and are not validated.
When all transforms are of equal priority, Flow processes documents according to their associated publishing time, as encoded in the document UUID.
However, when one transform has a higher priority than others, then all ready documents are processed through the transform before any documents of other transforms are processed.
Delays are applied as an adjustment to the UUID clock encoded within each document, which is then used to impose a relative ordering of all documents read by this derivation. This means that read delays are applied in a consistent way, even when back-filling over historical documents. When caught up and tailing the source collection, delays also "gate" documents such that they aren't processed until the current wall-time reflects the delay.
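For instance, a derivation transform might replay source events with a one-hour read delay and an elevated priority (collection, transform, and key names are hypothetical):

```yaml
# Fragment of a derivation's derive.transforms list.
transforms:
  - name: delayedJoin
    source: acmeCo/events
    shuffle: { key: [/userId] }
    readDelay: 1h   # documents are gated until wall-time reflects the delay
    priority: 1     # ready documents here are processed before other transforms
```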