Type	object
File match	`spicepod.yml` `spicepod.yaml`
Schema URL	https://catalog.lintel.tools/schemas/schemastore/spicepod-yaml/latest.json
Source	https://raw.githubusercontent.com/spiceai/spiceai/trunk/.schema/spicepod.schema.json

SpicepodVersion string

SpicepodKind string

Runtime object

Helper struct for deserializing Runtime with custom logic for handling memory_limit/temp_directory deprecation

results_cache ResultsCache | null

Default: null

caching Caching | null

Default: null

dataset_load_parallelism integer | null

format=uint min=0

tls TlsConfig | null

If set, the runtime will configure all endpoints to use TLS

tracing TracingConfig | null

telemetry object

5 nested properties

enabled boolean

Default: true

user_agent_collection string

Values: "full" "disabled"

properties Record<string, string>

Custom key/value attributes attached to telemetry metrics emitted by spiced. Applied as OpenTelemetry resource attributes on the runtime's MeterProvider, so they appear as dimensions on every metric exported via the Prometheus scrape endpoint, the cluster on-demand OTLP reader, and the otel_exporter push exporter, and as labels on anonymous usage telemetry. Currently does not affect tracing spans or logs. Example: { environment: prod, region: us-west-2, team: data-platform }.

Default:

{}

metric_prefix string | null

Optional prefix prepended to every exported metric name.

Useful for namespacing Spice metrics in shared backends (e.g. Datadog, Grafana Cloud) so they don't collide with metrics from other services. For example, with metric_prefix: "spiceai." the runtime metric query_duration_ms is exported as spiceai.query_duration_ms.

The prefix is applied via an OpenTelemetry View on the runtime's MeterProvider, so it affects every metric reader attached to that provider — the Prometheus scrape endpoint (--metrics), the cluster on-demand OTLP reader, and the otel_exporter push reader. OpenTelemetry 0.31's SDK does not support per-reader name transforms, so this knob is intentionally placed at the telemetry level rather than under any single exporter.

otel_exporter OtelExporterConfig | null

Optional configuration for pushing metrics to an OpenTelemetry collector
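As an illustration of the telemetry properties above, the fragment below is a minimal sketch. It assumes the Runtime object is set under a top-level runtime key in spicepod.yaml. The attribute values and the prefix are the ones used in the descriptions above.

```yaml
# Sketch only: assumes Runtime is configured under the top-level `runtime` key.
runtime:
  telemetry:
    enabled: true
    # Attached as resource attributes to every exported metric.
    properties:
      environment: prod
      region: us-west-2
      team: data-platform
    # query_duration_ms would be exported as spiceai.query_duration_ms.
    metric_prefix: "spiceai."
```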

params Record<string, string>

task_history object

8 nested properties

enabled boolean

Default: true

captured_output string

Default: "none"

captured_context string

Default: "truncated"

Values: "redacted" "truncated" "full"

retention_period string

Default: "8h"

retention_check_interval string

Default: "15m"

min_sql_duration string | null

captured_plan string | null

min_plan_duration string | null

auth Auth | null

cors object

2 nested properties

enabled boolean

Default: false

allowed_origins string[]

Default:

[
  "*"
]
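A minimal sketch of a cors block that restricts the default wildcard to a single origin. The runtime parent key is assumed, and the origin URL is a hypothetical placeholder.

```yaml
# Sketch only: the origin below is a placeholder, not a value from the schema.
runtime:
  cors:
    enabled: true            # default is false
    allowed_origins:
      - "https://app.example.com"
```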

flight Flight | null

temp_directory string | null

Configures where the runtime will store temporary files needed for operations like spilling to disk for queries & accelerations that are larger than memory.

memory_limit string | null

Specifies the runtime memory limit. When configured, will spill to disk for supported queries larger than memory.

shutdown_timeout string | null

Configures how long the runtime waits for connections to be gracefully drained and components to shut down cleanly during runtime termination

ready_state string | string

Controls when the runtime readiness probe reports the runtime as ready.

output_level OutputLevel | null

Configures the log level for the runtime. Can be overridden if flags or environment variables are set.

query Query | null

metrics Metrics | null

scheduler Scheduler | null

ResultsCache object

enabled boolean

Default: true

cache_max_size string | null

item_ttl string | null

caching_policy string | string

cache_key_type string

Values: "plan" "sql"

hashing_algorithm string

Values: "siphash" "ahash" "xxh3" "xxh32" "xxh64" "xxh128" "blake3"

engine string | string

max_stale_while_revalidate string | null

Maximum stale-while-revalidate duration to add to the cache TTL.

eviction_policy string | string

CachingPolicy string | string

CacheKeyType string

HashingAlgorithm string

CacheEngine string | string

Caching object

sql_results SQLResultsCacheConfig | null

search_results CacheConfig | null

embeddings CacheConfig | null

SQLResultsCacheConfig object

enabled boolean

Default: true

max_size string | null

item_ttl string | null

caching_policy string | string

hashing_algorithm string

Values: "siphash" "ahash" "xxh3" "xxh32" "xxh64" "xxh128" "blake3"

cache_key_type string

Values: "plan" "sql"

engine string | string

stale_while_revalidate_ttl string | null

Maximum age for serving stale cached results while revalidating in the background. When set, cached results past their TTL (but within this additional window) will be served immediately while a background refresh is triggered. Format: duration string (e.g., "30s", "5m"). This is a response directive.

encoding string

Values: "none" "zstd"

eviction_policy string | string
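The SQLResultsCacheConfig properties above might be combined as in the following sketch. It assumes the object is reached through runtime.caching.sql_results (per the Caching object), that max_size accepts a size string such as 128MiB, and that item_ttl uses the same duration format documented for stale_while_revalidate_ttl. The enum values are taken from the lists above.

```yaml
# Sketch only: size and TTL values are illustrative assumptions.
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 128MiB                   # assumed size-string format
      item_ttl: 1m                       # assumed duration-string format
      stale_while_revalidate_ttl: 30s    # serve stale results for up to 30s past the TTL
      cache_key_type: plan               # "plan" or "sql"
      hashing_algorithm: xxh3
      encoding: zstd                     # "none" or "zstd"
```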

Encoding string

CacheConfig object

enabled boolean

Default: true

max_size string | null

item_ttl string | null

caching_policy string | string

hashing_algorithm string

Values: "siphash" "ahash" "xxh3" "xxh32" "xxh64" "xxh128" "blake3"

engine string | string

eviction_policy string | string

TlsConfig object

enabled boolean required

If set, the runtime will configure all endpoints to use TLS

certificate_file string | null

A filesystem path to a file containing the PEM encoded certificate

certificate string | null

A PEM encoded certificate

key_file string | null

A filesystem path to a file containing the PEM encoded private key

key string | null

A PEM encoded private key
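A minimal sketch of a TlsConfig that reads the certificate and key from files. The runtime parent key is assumed and both paths are hypothetical. The inline certificate and key properties could be used instead of the *_file variants.

```yaml
# Sketch only: file paths are placeholders.
runtime:
  tls:
    enabled: true
    certificate_file: /etc/spice/tls/tls.crt   # PEM encoded certificate
    key_file: /etc/spice/tls/tls.key           # PEM encoded private key
```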

TracingConfig object

zipkin_enabled boolean required

zipkin_endpoint string | null

TelemetryConfig object

enabled boolean

Default: true

user_agent_collection string

Values: "full" "disabled"

properties Record<string, string>

Custom key/value attributes attached to telemetry metrics emitted by spiced. Applied as OpenTelemetry resource attributes on the runtime's MeterProvider, so they appear as dimensions on every metric exported via the Prometheus scrape endpoint, the cluster on-demand OTLP reader, and the otel_exporter push exporter, and as labels on anonymous usage telemetry. Currently does not affect tracing spans or logs. Example: { environment: prod, region: us-west-2, team: data-platform }.

Default:

{}

metric_prefix string | null

Optional prefix prepended to every exported metric name.

Useful for namespacing Spice metrics in shared backends (e.g. Datadog, Grafana Cloud) so they don't collide with metrics from other services. For example, with metric_prefix: "spiceai." the runtime metric query_duration_ms is exported as spiceai.query_duration_ms.

The prefix is applied via an OpenTelemetry View on the runtime's MeterProvider, so it affects every metric reader attached to that provider — the Prometheus scrape endpoint (--metrics), the cluster on-demand OTLP reader, and the otel_exporter push reader. OpenTelemetry 0.31's SDK does not support per-reader name transforms, so this knob is intentionally placed at the telemetry level rather than under any single exporter.

otel_exporter OtelExporterConfig | null

Optional configuration for pushing metrics to an OpenTelemetry collector

UserAgentCollection string

OtelExporterConfig object

Configuration for pushing metrics to an OpenTelemetry collector.

The protocol is inferred from the endpoint:

HTTP: when the endpoint has an http:// or https:// scheme, or contains /v1/metrics
gRPC: when the endpoint is just a hostname and an optional port (the port defaults to 4317)

Examples

gRPC (hostname only, port defaults to 4317):

otel_exporter:
  enabled: true
  endpoint: "otel-collector"

With metric whitelist:

otel_exporter:
  enabled: true
  endpoint: "otel-collector:4317"
  metrics:
    - requests_total
    - request_duration_seconds

HTTP:

otel_exporter:
  enabled: true
  endpoint: "http://localhost:4318/v1/metrics"

endpoint string required

The endpoint of the OTEL collector.

For gRPC: use a hostname with an optional port (e.g., otel-collector or localhost:4317).
For HTTP: use a full URL (e.g., http://localhost:4318/v1/metrics).

enabled boolean

Whether the OTEL exporter is enabled

Default: true

push_interval string

How often to push metrics to the collector (e.g., "30s", "1m", "5m")

Default: "60s"

metrics string[]

Optional whitelist of metric names to export. If not specified or empty, all metrics are exported.

headers Record<string, string>

Optional headers to send with each export request. For HTTP: sent as HTTP headers. For gRPC: sent as metadata entries (keys must be lowercase ASCII, e.g. use authorization not Authorization). Values support secret replacement syntax (e.g., ${secrets:api_key}).
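To illustrate push_interval and headers together, here is a sketch in the same fragment style as the examples above. The collector URL is hypothetical, and api_key is assumed to be a secret that has been configured elsewhere. The header key is lowercase so the same configuration would also be valid for gRPC.

```yaml
# Sketch only: endpoint and secret name are placeholders.
otel_exporter:
  enabled: true
  endpoint: "https://otel.example.com/v1/metrics"   # full URL, so HTTP is inferred
  push_interval: "30s"
  headers:
    authorization: "Bearer ${secrets:api_key}"
```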

temporality string | string | string

Aggregation temporality preference for the OTEL metrics push exporter.

Controls how counter and histogram values are encoded on the wire:

Delta (default): each export contains the change since the previous export. Required by Datadog's OTLP intake and recommended by AWS CloudWatch, New Relic, and most push-based SaaS backends. Aligns with the OpenTelemetry guidance for push exporters.
Cumulative: each export carries the running total since process start. Use this for OTel collectors that downstream into Prometheus or other pull-based / cumulative-native backends.
LowMemory: counters use cumulative, histograms use delta. Reduces the SDK's in-process state for histogram-heavy workloads.

This setting only affects the OTLP push exporter; the runtime's Prometheus scrape endpoint always exposes cumulative metrics regardless of this value.

OtelTemporality string | string | string

Aggregation temporality preference for the OTEL metrics push exporter.

Controls how counter and histogram values are encoded on the wire:

Delta (default): each export contains the change since the previous export. Required by Datadog's OTLP intake and recommended by AWS CloudWatch, New Relic, and most push-based SaaS backends. Aligns with the OpenTelemetry guidance for push exporters.
Cumulative: each export carries the running total since process start. Use this for OTel collectors that downstream into Prometheus or other pull-based / cumulative-native backends.
LowMemory: counters use cumulative, histograms use delta. Reduces the SDK's in-process state for histogram-heavy workloads.

This setting only affects the OTLP push exporter; the runtime's Prometheus scrape endpoint always exposes cumulative metrics regardless of this value.

TaskHistory object

enabled boolean

Default: true

captured_output string

Default: "none"

captured_context string

Default: "truncated"

Values: "redacted" "truncated" "full"

retention_period string

Default: "8h"

retention_check_interval string

Default: "15m"

min_sql_duration string | null

captured_plan string | null

min_plan_duration string | null
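A sketch of a task_history block built from the defaults and enum values listed above, with captured_context tightened to redacted. The runtime parent key is assumed.

```yaml
# Sketch only: values other than captured_context are the documented defaults.
runtime:
  task_history:
    enabled: true
    captured_output: none
    captured_context: redacted      # one of "redacted", "truncated", "full"
    retention_period: 8h
    retention_check_interval: 15m
```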

Auth object

api_key ApiKeyAuth | null

ApiKeyAuth object

keys ApiKey[] required

enabled boolean

Default: true

ApiKey object | object

API key for authentication. Keys can be read-only or read-write. The key value is redacted in Debug output to prevent credential leakage.

All comparisons (both ApiKey to ApiKey and ApiKey to &str) use constant-time comparison via the subtle crate to prevent timing attacks.

CorsConfig object

enabled boolean

Default: false

allowed_origins string[]

Default:

[
  "*"
]

Flight object

max_message_size string | null

do_put_rate_limit_enabled boolean

Whether to enable rate limiting on Flight DoPut (write) requests. Defaults to true. Set to false to disable write rate limiting for bulk ingest workloads.

Default: true

RuntimeReadyState string | string

Controls when the runtime readiness probe reports the runtime as ready.

OutputLevel string

Query object

memory_limit string | null

Specifies the runtime memory limit. When configured, will spill to disk for supported queries larger than memory.

temp_directory string | null

Configures where the runtime will store temporary files needed for operations like spilling to disk for queries & accelerations that are larger than memory.

spill_compression SpillCompression | null

Specifies the compression codec used when spilling data to disk.
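A minimal sketch of the Query object. The schema only says memory_limit is a string, so the 4GiB value and its format are assumptions, and the directory path is hypothetical. spill_compression is left out because its accepted values are not listed here.

```yaml
# Sketch only: the size format and path are assumptions.
runtime:
  query:
    memory_limit: 4GiB            # supported queries larger than this spill to disk
    temp_directory: /tmp/spice    # where spill files are written
```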

SpillCompression string

Metrics object

metrics Metric[] required

Metric object

name string required

enabled boolean

Default: true

Scheduler object

state_location string required

Root URI for shared cluster state.

params Params | null

Optional object store params for the shared cluster state.

partition_management PartitionManagement | null

Partition management configuration

Params Record<string, string | integer | number | boolean>

ParamValue string | integer | number | boolean

PartitionManagement object

interval string

Default: "30s"

max_assignments_per_cycle integer

Default: 100

format=uint min=0

max_partitions_per_executor integer

Default: 1000

format=uint min=0

discovery_timeout string

Default: "60s"
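The Scheduler and PartitionManagement objects above might be combined as follows. This is a sketch: the state_location URI is hypothetical and the partition_management values are simply the documented defaults written out explicitly.

```yaml
# Sketch only: the bucket path is a placeholder.
runtime:
  scheduler:
    state_location: s3://my-bucket/spice/cluster-state/   # root URI for shared cluster state
    partition_management:
      interval: 30s
      max_assignments_per_cycle: 100
      max_partitions_per_executor: 1000
      discovery_timeout: 60s
```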

Management object

api_key string required

enabled boolean

Default: true

params Record<string, string>

Snapshots object

Datasets accelerated using a file-mode acceleration engine (e.g. SQLite or DuckDB) can use this configuration to bootstrap from a DB file on object storage (e.g. S3) if the acceleration file does not exist on startup.

Each dataset needs to opt-in for snapshots in addition to this config.

enabled boolean

Global enable/disable for dataset snapshots.

Default: true

location string | null

The object store location pointing to a folder containing the dataset snapshots, e.g. s3://my-bucket/spice/snapshots/

bootstrap_on_failure_behavior string

Values: "warn" "retry" "fallback"

params Params | null

Auth params for accessing the object store location. For S3, this is the same as the S3 dataset connector params with the notable exception that s3_auth is set to iam_role by default.
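A sketch of a Snapshots block, assuming it is set under a top-level snapshots key. The location is the example path from the description above, and s3_auth is written out with its stated default. Each dataset would still need to opt in separately.

```yaml
# Sketch only: the top-level key name is an assumption.
snapshots:
  enabled: true
  location: s3://my-bucket/spice/snapshots/
  bootstrap_on_failure_behavior: warn   # one of "warn", "retry", "fallback"
  params:
    s3_auth: iam_role                   # documented default for snapshots
```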

BootstrapOnFailureBehavior string

Extension object

enabled boolean

Default: true

params Record<string, string>

Secret object

The secrets configuration for a Spicepod.

Example:

secrets:
  - from: env
    name: env
  - from: kubernetes:my_secret_name
    name: k8s

from string required

name string required

description string | null

params Params | null

ComponentOrReference Catalog | ComponentReference

Catalog

A catalog definition. The params field is validated based on the catalog connector type specified in 'from'.

AccessMode string | string

ComponentReference object

ref string required

dependsOn string[]

ComponentOrReference2 Dataset | ComponentReference

Dataset

A dataset definition. The params field is validated based on the connector type specified in 'from'.

Column object

name string required

description string | null

Optional semantic details about the column

embeddings ColumnLevelEmbeddingConfig[]

full_text_search FullTextSearchConfig | null

metadata object

ColumnLevelEmbeddingConfig object

Configures whether and how a dataset's column should be embedded. Differs from [crate::component::embeddings::ColumnEmbeddingConfig] in that [ColumnLevelEmbeddingConfig] should be a property of [Column], not [super::Dataset].

[crate::component::embeddings::ColumnEmbeddingConfig] will be deprecated long term in favour of [ColumnLevelEmbeddingConfig].

from string

Default: ""

chunking EmbeddingChunkConfig | null

row_id array | null

vector_size integer | null

format=uint min=0

aggregation EmbeddingAggregation | null

Aggregation strategy for multi-vector embeddings. Only meaningful when the underlying column is list-typed. Defaults to max.

max_elements_per_row integer | null

Maximum number of list elements embedded per row for multi-vector columns. Defaults to 32; hard-capped at 1024.

format=uint min=0

EmbeddingChunkConfig object

enabled boolean

Default: false

target_chunk_size integer

Default: 0

format=uint min=0

overlap_size integer

Default: 0

format=uint min=0

trim_whitespace boolean

Default: false
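To show how Column, ColumnLevelEmbeddingConfig and EmbeddingChunkConfig nest, here is a sketch of a columns list. The column name, the embedding model name, and the chunk sizes are hypothetical, and the units of target_chunk_size and overlap_size are not stated in this schema.

```yaml
# Sketch only: names and sizes are placeholders.
columns:
  - name: body
    embeddings:
      - from: my_embedding_model     # hypothetical embeddings component name
        chunking:
          enabled: true
          target_chunk_size: 512
          overlap_size: 64
          trim_whitespace: true
        max_elements_per_row: 32     # documented default, hard-capped at 1024
```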

EmbeddingAggregation string

Aggregation strategy applied when a multi-vector (list-typed) column is queried. Each list element produces its own embedding; at query time the per-element similarities are combined into a single per-row score using this aggregation.

Max is the ColBERT-style MaxSim default — a row scores as high as its best-matching element.

FullTextSearchConfig object

enabled boolean required

row_id array | null

index_store IndexStore | null

index_directory string | null

IndexStore string

Replication object

enabled boolean

Default: false

TimeFormat string

Acceleration object

enabled boolean

Default: true

mode string | string | string | string

refresh_on_startup string | string

engine string | null

refresh_mode RefreshMode | null

refresh_check_interval string | null

refresh_cron string | null

refresh_sql string | null

refresh_data_window string | null

refresh_append_overlap string | null

refresh_retry_enabled boolean

Default: true

refresh_retry_max_attempts integer | null

format=uint min=0

refresh_jitter_enabled boolean

Default: false

refresh_jitter_max string | null

params Params | null

Configuration parameters for the acceleration engine. The available parameters depend on the engine type specified in 'engine' (default: arrow). Available engines: arrow, cayenne, duckdb, postgres, sqlite, turso.

retention_period string | null

retention_sql string | null

retention_check_interval string | null

retention_check_enabled boolean

on_zero_results string | string

Behavior when a query on an accelerated table returns zero results.

ready_state ReadyState | null

Default: null

indexes Record<string, string>

primary_key string | null

on_conflict Record<string, string>

metrics Metrics | null

partition_by PartitionedBySchema[]

Partition expressions used to physically partition accelerated data.

Each item accepts either:

a plain expression string, for example "YEAR(created_at)" or "bucket(100, user_id)"; or
a single-entry mapping of a partition name to an expression, for example { year: "YEAR(created_at)" }.
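The two item forms accepted by partition_by can be sketched as follows, using the expressions from the list above. The fragment shows only the acceleration block of a dataset or view.

```yaml
# Sketch only: shows both accepted item forms side by side.
acceleration:
  enabled: true
  partition_by:
    - "bucket(100, user_id)"        # plain expression string
    - year: "YEAR(created_at)"      # single-entry mapping: partition name -> expression
```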

snapshots string | string | string | string

snapshots_trigger SnapshotsTrigger | null

snapshots_trigger_threshold string | null

snapshots_compaction string

Values: "disabled" "enabled"

snapshots_reset_expiry_on_load string

Values: "disabled" "enabled"

snapshots_creation_policy string

Values: "always" "on_change"

Mode string | string | string | string

RefreshOnStartup string | string

RefreshMode string

ZeroResultsAction string | string

Behavior when a query on an accelerated table returns zero results.

ReadyState string | string | string

Controls when the dataset is marked ready for queries.

IndexType string

OnConflictBehavior string

PartitionedBySchema string | object

SnapshotBehavior string | string | string | string

SnapshotsTrigger string | string

SnapshotsCompaction string

SnapshotsResetExpiryOnLoad string

SnapshotsCreationPolicy string

ColumnEmbeddingConfig object

Configuration for if and how a dataset's column should be embedded.

Prefer [super::dataset::column::ColumnLevelEmbeddingConfig] going forward. Support for [ColumnEmbeddingConfig] will be removed in the future.

column string required

use string

Default: ""

column_pk array | null

chunking EmbeddingChunkConfig | null

vector_size integer | null

format=uint min=0

aggregation EmbeddingAggregation | null

Aggregation strategy for multi-vector embeddings. Only meaningful when the underlying column is list-typed (List<Utf8> / LargeList<Utf8>). Defaults to max (ColBERT-style MaxSim).

max_elements_per_row integer | null

Maximum number of list elements embedded per row for multi-vector columns. Defaults to 32; hard-capped at 1024. Excess elements are dropped with a warning log.

format=uint min=0

InvalidTypeAction string

This is deprecated, use unsupported_type_action instead.

UnsupportedTypeAction string

VectorStore object

enabled boolean

Default: true

engine string | null

partition_by PartitionedBySchema[]

Partition expressions used to organize vector data.

Each item accepts either:

a plain expression string, for example "YEAR(created_at)" or "bucket(100, user_id)"; or
a single-entry mapping of a partition name to an expression, for example { year: "YEAR(created_at)" }.

params Params | null

CheckAvailability string | string

Controls whether the federated table periodically has its availability checked.

ComponentOrReference3 View | ComponentReference

View object

name string required

description string | null

metadata object

columns Column[]

sql string | null

Inline SQL that describes a view.

sql_ref string | null

Reference to a SQL file that describes a view.

acceleration Acceleration | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

vectors VectorStore | null

dependsOn string[]
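A minimal sketch of a View with inline SQL, assuming views are listed under a top-level views key. The view name, the orders table, and the query are hypothetical. sql_ref could be used in place of sql to point at a SQL file.

```yaml
# Sketch only: the view, table, and query are placeholders.
views:
  - name: open_orders
    description: Orders that have not been fulfilled yet.
    sql: SELECT * FROM orders WHERE status = 'open'
    acceleration:
      enabled: true
```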

ComponentOrReference4 Model | ComponentReference

Model

A model definition. The params field is validated based on the model source type specified in 'from'.

ModelFile object

path string required

name string | null

type ModelFileType | null

Should use [Self::file_type] to access.

params Record<string, string>

ModelFileType string

ComponentOrReference5 Embeddings | ComponentReference

Embeddings object

from string required

name string required

files ModelFile[]

params object

datasets string[]

dependsOn string[]

metrics Metrics | null

ComponentOrReference6 Tool | ComponentReference

Tool object

from string required

name string required

description string | null

params Record<string, string>

env Record<string, string>

dependsOn string[]

metrics Metrics | null

ComponentOrReference7 Worker | ComponentReference

Worker object

name string required

description string | null

params object

load_balance LoadBalanceParams | null

sql string | null

cron string | null

LoadBalanceParams object

routing RouterConfig[]

RouterConfig object | object | object

AbfsDataConnectorParams object

abfs_account string

Azure Storage account name.

abfs_container_name string

Azure Storage container name.

abfs_access_key string

Azure Storage account access key.

abfs_bearer_token string

Bearer token to use in Azure requests.

abfs_client_id string

Azure client ID.

abfs_client_secret string

Azure client secret.

abfs_tenant_id string

Azure tenant ID.

abfs_sas_string string

Azure SAS string.

abfs_endpoint string

Azure Storage endpoint.

abfs_use_emulator string

Use the Azure Storage emulator.

Default: "false"

Values: "true" "false"

abfs_use_fabric_endpoint string

Use the Azure Storage fabric endpoint.

Default: "false"

Values: "true" "false"

allow_http string

Allow insecure HTTP connections.

Default: "false"

Values: "true" "false"

abfs_authority_host string

Sets an alternative authority host.

abfs_max_retries string

The maximum number of retries.

Default: "3"

abfs_retry_timeout string

Retry timeout.

abfs_backoff_initial_duration string

Initial backoff duration.

abfs_backoff_max_duration string

Maximum backoff duration.

abfs_backoff_base string

The base of the exponential to use

abfs_proxy_url string

Proxy URL to use when connecting

abfs_proxy_ca_certificate string

CA certificate for the proxy.

abfs_proxy_excludes string

Set list of hosts to exclude from proxy connections

abfs_msi_endpoint string

Sets the endpoint for acquiring managed identity tokens.

abfs_federated_token_file string

Sets a file path for acquiring Azure federated identity token in Kubernetes

abfs_use_cli string

Set if the Azure CLI should be used for acquiring access tokens.

Values: "true" "false"

abfs_skip_signature string

Skip fetching credentials and skip signing requests. Used for interacting with public containers.

Values: "true" "false"

abfs_disable_tagging string

Ignore any tags provided to put_opts

Values: "true" "false"

client_timeout string

The timeout setting for Azure client.

abfs_versioning string

Enables Azure blob versioning support when set to 'enabled'. Defaults to 'disabled'.

Default: "disabled"

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"
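A sketch of how some of the AbfsDataConnectorParams above might appear on a dataset. The from URI form, the account and container names, the secret name, and csv as a file_format value are all assumptions made for illustration. Boolean-like params are quoted because the schema types them as strings.

```yaml
# Sketch only: URI form, names, and file_format value are assumptions.
datasets:
  - from: abfs://my-container/path/to/data/
    name: my_data
    params:
      abfs_account: mystorageaccount
      abfs_access_key: "${secrets:ABFS_ACCESS_KEY}"
      abfs_max_retries: "3"
      file_format: csv
      csv_has_header: "true"
      csv_delimiter: ","
```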

AbfssDataConnectorParams object

abfs_account string

Azure Storage account name.

abfs_container_name string

Azure Storage container name.

abfs_access_key string

Azure Storage account access key.

abfs_bearer_token string

Bearer token to use in Azure requests.

abfs_client_id string

Azure client ID.

abfs_client_secret string

Azure client secret.

abfs_tenant_id string

Azure tenant ID.

abfs_sas_string string

Azure SAS string.

abfs_endpoint string

Azure Storage endpoint.

abfs_use_emulator string

Use the Azure Storage emulator.

Default: "false"

Values: "true" "false"

abfs_use_fabric_endpoint string

Use the Azure Storage fabric endpoint.

Default: "false"

Values: "true" "false"

allow_http string

Allow insecure HTTP connections.

Default: "false"

Values: "true" "false"

abfs_authority_host string

Sets an alternative authority host.

abfs_max_retries string

The maximum number of retries.

Default: "3"

abfs_retry_timeout string

Retry timeout.

abfs_backoff_initial_duration string

Initial backoff duration.

abfs_backoff_max_duration string

Maximum backoff duration.

abfs_backoff_base string

The base of the exponential to use

abfs_proxy_url string

Proxy URL to use when connecting

abfs_proxy_ca_certificate string

CA certificate for the proxy.

abfs_proxy_excludes string

Set list of hosts to exclude from proxy connections

abfs_msi_endpoint string

Sets the endpoint for acquiring managed identity tokens.

abfs_federated_token_file string

Sets a file path for acquiring Azure federated identity token in Kubernetes

abfs_use_cli string

Set if the Azure CLI should be used for acquiring access tokens.

Values: "true" "false"

abfs_skip_signature string

Skip fetching credentials and skip signing requests. Used for interacting with public containers.

Values: "true" "false"

abfs_disable_tagging string

Ignore any tags provided to put_opts

Values: "true" "false"

client_timeout string

The timeout setting for Azure client.

abfs_versioning string

Enables Azure blob versioning support when set to 'enabled'. Defaults to 'disabled'.

Default: "disabled"

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

DebeziumDataConnectorParams object

debezium_transport string required

The message broker transport to use. The default is kafka.

Default: "kafka"

debezium_message_format string required

The message format to use. The default is json.

Default: "json"

kafka_bootstrap_servers string required

A list of host/port pairs for establishing the initial Kafka cluster connection.

kafka_security_protocol string

Security protocol for Kafka connections. Default: 'sasl_ssl'. Options: 'plaintext', 'ssl', 'sasl_plaintext', 'sasl_ssl'.

Default: "sasl_ssl"

kafka_sasl_mechanism string

SASL authentication mechanism. Default: 'SCRAM-SHA-512'. Options: 'PLAIN', 'SCRAM-SHA-256', 'SCRAM-SHA-512'.

Default: "SCRAM-SHA-512"

kafka_sasl_username string

SASL username.

kafka_sasl_password string

SASL password.

kafka_ssl_ca_location string

Path to the SSL/TLS CA certificate file for server verification.

kafka_enable_ssl_certificate_verification string

Enable SSL/TLS certificate verification. Default: 'true'.

Default: "true"

kafka_ssl_endpoint_identification_algorithm string

SSL/TLS endpoint identification algorithm. Default: 'https'. Options: 'none', 'https'.

Default: "https"

kafka_consumer_group_id string

Kafka consumer group id to use for this dataset. If not set, a unique id will be generated.

batch_max_size string

Maximum number of change events to batch together before processing

Default: "10000"

batch_max_duration string

Maximum time to wait for a batch to fill before processing

Default: "1s"
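The two batch limits can be read as "flush when either limit is reached". The Python sketch below illustrates that reading only; the class `ChangeEventBatcher`, its method names, and the use of seconds for the duration are assumptions for illustration, not the connector's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeEventBatcher:
    """Buffers events and releases a batch on size or age, whichever comes first."""
    max_size: int = 10_000        # mirrors batch_max_size
    max_duration_s: float = 1.0   # mirrors batch_max_duration ("1s")
    _events: list = field(default_factory=list)
    _started_at: Optional[float] = None

    def add(self, event, now):
        """Buffer one event; return a batch if a limit is hit, else None."""
        if not self._events:
            self._started_at = now  # the clock starts with the first buffered event
        self._events.append(event)
        return self.flush_if_due(now)

    def flush_if_due(self, now):
        """Return the buffered events if the batch is full or old enough."""
        if not self._events:
            return None
        too_big = len(self._events) >= self.max_size
        too_old = now - self._started_at >= self.max_duration_s
        if too_big or too_old:
            batch, self._events = self._events, []
            self._started_at = None
            return batch
        return None
```

Passing `now` explicitly keeps the sketch deterministic; a real consumer loop would read a monotonic clock instead.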

DynamodbDataConnectorParams object

dynamodb_aws_region string required

The AWS region to use for DynamoDB.

dynamodb_aws_access_key_id string

The AWS access key ID to use for DynamoDB.

dynamodb_aws_secret_access_key string

The AWS secret access key to use for DynamoDB.

dynamodb_aws_session_token string

The AWS session token to use for DynamoDB.

dynamodb_aws_auth string

Authentication method. Use 'iam_role' for IAM role-based authentication or 'key' for explicit access key credentials.

Default: "iam_role"

dynamodb_aws_iam_role_source string

IAM role credential source (only used when dynamodb_aws_auth is 'iam_role'). 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"

unnest_depth string

Maximum nesting depth for unnesting embedded documents into a flattened structure. Higher values expand deeper nested fields.

schema_infer_max_records string

Number of documents to use to infer the schema. Defaults to 10.

Default: "10"

scan_segments string

Number of segments to use when scanning the table. Defaults to 'auto'.

Default: "auto"

scan_interval string

Interval in milliseconds between polling for new records in a DynamoDB stream.

Default: "0s"

time_format string

Go-style time format used for parsing/formatting timestamps

Default: "2006-01-02T15:04:05.000Z07:00"
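Go layouts are written in terms of the reference time Mon Jan 2 15:04:05 MST 2006, and 'Z07:00' means "Z for UTC, otherwise a numeric offset". As a rough illustration of what the default layout accepts, here is a Python sketch that parses the same shape of timestamp. The `PYTHON_LAYOUT` string is an assumed approximation of the Go layout, not a conversion the runtime performs.

```python
from datetime import datetime

# Assumed Python counterpart of the Go layout "2006-01-02T15:04:05.000Z07:00":
# date, "T", time with fractional seconds, then "Z" or an offset such as "+02:00".
PYTHON_LAYOUT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_timestamp(text):
    """Parse a timestamp shaped like the default time_format."""
    return datetime.strptime(text, PYTHON_LAYOUT)


def utc_offset_hours(text):
    """Return the UTC offset of a parsed timestamp, in hours."""
    return parse_timestamp(text).utcoffset().total_seconds() / 3600
```

For example, both "2024-05-01T12:30:45.123Z" and "2024-05-01T12:30:45.123+02:00" fit this shape.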

ready_lag string

When using Streams, once the table reaches this lag, it is reported as Ready.

Default: "2s"

endpoint_url string

Custom endpoint URL for DynamoDB-compatible services (e.g., DynamoDB Local, ScyllaDB Alternator).

lag_exceeds_shard_retention_behavior string

Behavior when stream lag exceeds shard retention (24h). 'error' marks the dataset as Error, 'ready_before_load' marks it Ready and then re-bootstraps, 'ready_after_load' re-bootstraps and then marks it Ready.

Default: "error"

FileDataConnectorParams object

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning derived from the folder structure. Defaults to false.

Values: "true" "false"
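Hive-style partitioning encodes column values in directory names of the form key=value. The Python sketch below shows how such values could be read from a path; the function name and the example path are hypothetical, and the connector's actual parsing may differ.

```python
def hive_partition_values(path):
    """Collect key=value directory segments from a file path."""
    partitions = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values stay strings; typing is left to the reader
    return partitions


columns = hive_partition_values("sales/year=2024/month=01/part-0.parquet")
```

In this sketch the path contributes two extra columns, year and month, to every row read from that file.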

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

GcsDataConnectorParams object

gcs_service_account_path string

Path to a GCS service account JSON key file.

gcs_service_account_key string

GCS service account JSON key as a string.

gcs_application_default_credentials string

Use Google Application Default Credentials for authentication. If GOOGLE_APPLICATION_CREDENTIALS env var is set, uses that path.

Default: "false"

Values: "true" "false"

allow_http string

Allow insecure HTTP connections.

Default: "false"

Values: "true" "false"

gcs_max_retries string

The maximum number of retries.

Default: "3"

gcs_retry_timeout string

Retry timeout.

gcs_backoff_initial_duration string

Initial backoff duration.

gcs_backoff_max_duration string

Maximum backoff duration.

gcs_backoff_base string

The base of the exponential backoff.

gcs_skip_signature string

Skip signing requests. Used for public buckets.

Values: "true" "false"

client_timeout string

The timeout setting for GCS client.

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning derived from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

GitDataConnectorParams object

include string

Include only files matching the glob pattern. Multiple patterns can be separated by comma or semicolon.

Examples: "*.rs", "**/*.yaml;src/**/*.json"
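To make the separator rule concrete, the following Python sketch splits an 'include' value on commas and semicolons and tests paths against the resulting patterns. It uses `fnmatch.fnmatchcase` from the standard library as a stand-in matcher, which is an assumption: its '*' also matches '/', so it is looser than typical glob engines and may not match the connector's behavior exactly.

```python
import re
from fnmatch import fnmatchcase


def split_patterns(include):
    """Split an include value on ',' or ';' and drop empty entries."""
    return [p.strip() for p in re.split(r"[;,]", include) if p.strip()]


def is_included(path, include):
    """Return True if the path matches at least one pattern (approximate semantics)."""
    return any(fnmatchcase(path, pattern) for pattern in split_patterns(include))


patterns = split_patterns("**/*.yaml;src/**/*.json")
```

Here `patterns` holds the two globs from the second example above, each of which is tried in turn.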

fetch_content string

Whether to fetch file content. Set to 'true' to include file content in the 'content' column.

Default: "false"

Values: "true" "false"

cache_path string

Custom path for the local Git repository cache. If not specified, uses system temp directory.

max_files string

Maximum number of files to materialize from a Git repository. Default: 5000. Hard limit: 50000.

Default: "5000"

max_file_bytes string

Maximum size (bytes) for an individual file when fetching content. Files larger than this value are skipped. Default: 524288. Maximum: 5242880 (5 MiB).

git_username string

Username for HTTP(S) basic authentication.

git_password string

Password or personal access token for HTTP(S) basic authentication.

git_token string

Personal access token used for HTTP(S) authentication. Equivalent to providing a username of 'x-access-token' with the token as the password.
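The equivalence described for 'git_token' can be illustrated with HTTP Basic authentication, where the header carries base64("username:password"). This Python sketch only shows how the two forms line up; the helper names are hypothetical and the connector may pass credentials to its Git client differently.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def token_auth_header(token):
    """A token is sent as the password for the fixed username 'x-access-token'."""
    return basic_auth_header("x-access-token", token)
```

Under this reading, setting git_token alone yields the same header as setting git_username to 'x-access-token' and git_password to the token.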

git_ssh_key string

Absolute path to an SSH private key used to authenticate to the remote repository.

git_ssh_passphrase string

Passphrase for the SSH private key identified by 'git_ssh_key'.

git_ssh_use_agent string

When 'true', attempt to authenticate via the running ssh-agent when no explicit 'git_ssh_key' is provided. Defaults to 'true'.

Default: "true"

Values: "true" "false"

enable_lfs string

Whether to fetch git-lfs objects after clone/fetch. Requires the 'git-lfs' CLI to be available on PATH.

Default: "false"

Values: "true" "false"

max_concurrent_requests string

Maximum number of concurrent Git network operations (clone/fetch) across datasets that share the same repository URL.

Default: "4"

git_max_retries string

Maximum number of retries when the connector encounters a transient error cloning or fetching from the remote.

Default: "3"

backoff_method string

Backoff strategy for retries on transient errors.

Default: "exponential"

Values: "exponential" "fibonacci"

disable_on_permanent_error string

When true, a permanent error (authentication failure, access denied) will disable the connector to prevent a thundering herd of failed requests.

Default: "true"

Values: "true" "false"

git_include string

DEPRECATED: Use the unprefixed 'include' instead.

git_fetch_content string

DEPRECATED: Use the unprefixed 'fetch_content' instead.

git_cache_path string

DEPRECATED: Use the unprefixed 'cache_path' instead.

git_max_files string

DEPRECATED: Use the unprefixed 'max_files' instead.

git_max_file_bytes string

DEPRECATED: Use the unprefixed 'max_file_bytes' instead.

git_enable_lfs string

DEPRECATED: Use the unprefixed 'enable_lfs' instead.

GithubDataConnectorParams object

github_token string

A GitHub token.

github_client_id string

The GitHub App Client ID.

github_private_key string

The GitHub App private key.

github_installation_id string

The GitHub App installation ID.

github_query_mode string

Specify which query mode (REST, GraphQL, or Search API) to use when retrieving results.

Default: "auto"

github_endpoint string

The GitHub API endpoint.

Default: "https://api.github.com"

github_include_comments string

Specifies the types of comments to fetch: 'all', 'review', 'discussion', or 'none'.

Default: "none"

github_max_comments_fetched string

Maximum number of comments to fetch per discussion or review thread.

Default: "100"

github_include_commits string

Whether to fetch commit information (created_at, updated_at) for files. Set to 'true' to enable.

Default: "false"

github_workflow_logs string

Whether to download and include workflow run logs. Set to 'enabled' to download logs for each workflow run. Defaults to 'disabled'.

Default: "disabled"

include string

Include only files matching the pattern.

Examples: "*.json", "**/*.yaml;src/**/*.json"

GlueDataConnectorParams object

glue_catalog_id string

glue_region string

glue_endpoint string

glue_url_style string

Controls S3 URL addressing style. Supported values: 'vhost' and 'path'. When not set, auto-detected from the endpoint.

Values: "vhost" "path"

glue_key string

glue_secret string

glue_session_token string

glue_auth string

Configures the authentication method for S3. Supported methods are: public (i.e. no auth), iam_role, key.

glue_iam_role_source string

IAM role credential source (used when auth is 'iam_role' or unset, i.e. default IAM-based auth). 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"

glue_versioning string

Enables S3 object versioning support when set to 'enabled'. Defaults to 'enabled'.

Default: "enabled"

client_timeout string

The timeout setting for S3 client.

allow_http string

Allow HTTP protocol for S3 endpoint.

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning derived from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

GsDataConnectorParams object

gcs_service_account_path string

Path to a GCS service account JSON key file.

gcs_service_account_key string

GCS service account JSON key as a string.

gcs_application_default_credentials string

Use Google Application Default Credentials for authentication. If GOOGLE_APPLICATION_CREDENTIALS env var is set, uses that path.

Default: "false"

Values: "true" "false"

allow_http string

Allow insecure HTTP connections.

Default: "false"

Values: "true" "false"

gcs_max_retries string

The maximum number of retries.

Default: "3"

gcs_retry_timeout string

Retry timeout.

gcs_backoff_initial_duration string

Initial backoff duration.

gcs_backoff_max_duration string

Maximum backoff duration.

gcs_backoff_base string

The base of the exponential backoff.

gcs_skip_signature string

Skip signing requests. Used for public buckets.

Values: "true" "false"

client_timeout string

The timeout setting for GCS client.

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning derived from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

HttpDataConnectorParams object

http_username string

http_password string

http_port string

The port to connect to.

client_timeout string

The timeout setting for HTTP(S) client requests (in seconds). Default: 30

connect_timeout string

The timeout for establishing HTTP(S) connections (in seconds). Default: 10

pool_max_idle_per_host string

Maximum number of idle connections to keep alive per host. Default: 10

pool_idle_timeout string

Timeout for idle connections in the pool (in seconds). Default: 90

http_headers string

Custom HTTP headers to include in requests. Format: 'Header1: Value1, Header2: Value2'. Headers are applied to all requests.
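The documented format can be read as comma-separated "Name: Value" pairs. The Python sketch below parses a value under that assumption; the function name is hypothetical, and header values that themselves contain commas are not handled here, so the connector's real parser may behave differently.

```python
def parse_http_headers(value):
    """Parse 'Header1: Value1, Header2: Value2' into a dict (simplified)."""
    headers = {}
    for pair in value.split(","):
        name, sep, header_value = pair.partition(":")  # split on the first colon only
        if sep and name.strip():
            headers[name.strip()] = header_value.strip()
    return headers


headers = parse_http_headers("Accept: application/json, X-Api-Version: 2")
```

Splitting on the first colon keeps values such as URLs with a scheme intact within a single pair.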

max_retries string

Maximum number of retries for HTTP requests. Default: 3

retry_backoff_method string

Retry backoff method: 'fibonacci' (default), 'linear', or 'exponential'.

retry_max_duration string

Maximum total duration for all retries (e.g., '30s', '5m'). If not set, retries will continue up to max_retries.

retry_jitter string

Randomization factor for retry delays (0.0 to 1.0). Default: 0.3 (30% randomization). Set to 0 for no jitter.
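To show how the retry settings interact, here is a Python sketch that computes a delay schedule for each backoff method and applies jitter as a symmetric random factor. The base delay of one second, the growth factors, and the function name are assumptions for illustration; the document does not specify the connector's exact formula.

```python
import random


def retry_delays(method, max_retries=3, base_s=1.0, jitter=0.3, rng=None):
    """Return one delay per retry for the given backoff method (illustrative only)."""
    rng = rng or random.Random()
    delays = []
    fib_prev, fib_curr = 1, 1
    for attempt in range(max_retries):
        if method == "fibonacci":
            factor = fib_prev  # 1, 1, 2, 3, 5, ...
            fib_prev, fib_curr = fib_curr, fib_prev + fib_curr
        elif method == "linear":
            factor = attempt + 1  # 1, 2, 3, ...
        elif method == "exponential":
            factor = 2 ** attempt  # 1, 2, 4, ...
        else:
            raise ValueError(f"unknown backoff method: {method}")
        delay = base_s * factor
        if jitter:
            # jitter=0.3 scales the delay by a random factor in [0.7, 1.3]
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays
```

With jitter set to 0 the schedule is deterministic, which makes the difference between the three methods easy to compare; a retry_max_duration cap would simply stop the schedule once the cumulative delay exceeds it.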

allowed_request_paths string

Comma-separated list of request_path values that users are allowed to query. Required to enable request_path filters.

request_query_filters string

Set to 'enabled' or 'disabled' to control whether request_query filters can be pushed down to HTTP requests.

Values: "enabled" "disabled"

max_request_query_length string

Maximum length (in characters) for request_query filter values. Default: 1024.

request_body_filters string

Set to 'enabled' or 'disabled' to control whether request_body filters can be pushed down as HTTP request bodies.

Values: "enabled" "disabled"

max_request_body_bytes string

Maximum size (in bytes) for request_body filter values. Default: 16384 (16KiB).

health_probe string

Custom health probe path for endpoint validation (e.g., '/health', '/api/status'). The endpoint must return a 2xx status code to pass validation. If not set, a random path is used and any status (including 404) is accepted.

pagination string

Pagination mode. 'auto' (default): auto-detects Link headers. 'enabled': explicitly enable with config. 'disabled': no pagination.

Values: "auto" "enabled" "disabled"

pagination_next_pointer string

JSON pointer (RFC 6901) to the next page URL or cursor in the response body (e.g., '/next', '/pagination/cursor', '/links/next').

pagination_link_header string

Whether to follow HTTP Link headers with rel="next" for pagination. Default: 'enabled' (auto-detected). Set to 'disabled' to ignore Link headers.

Values: "enabled" "disabled"

pagination_token_param string

When set, the value from 'pagination_next_pointer' is treated as a cursor/token and passed as this query parameter name in subsequent requests. When not set, the value is treated as a full URL.
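The two interpretations of the value found at 'pagination_next_pointer' can be sketched as follows in Python. The function name and the example URLs are hypothetical; the sketch assumes the cursor replaces any earlier value of the same query parameter, which the description does not state.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url, next_value, token_param=None):
    """Derive the next request URL from the value read at pagination_next_pointer."""
    if next_value is None:
        return None  # no next page advertised
    if token_param is None:
        return next_value  # without a token parameter the value is a full URL
    parts = urlsplit(current_url)
    # Drop any previous cursor, then append the new one under token_param.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != token_param]
    query.append((token_param, next_value))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = next_page_url("https://api.example.com/items?limit=10", "abc123", "cursor")
```

In the cursor case the other query parameters of the original request are kept, so page size and filters carry over to each follow-up request.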

pagination_data_pointer string

JSON pointer (RFC 6901) to the data array in each page's response (e.g., '/data', '/results', '/items'). When set, only the array at this path is returned as data rows.

pagination_max_pages string

Maximum number of pages to fetch for pagination. Default: 100.

pagination_data_map_to_array string

When 'enabled', if the data at pagination_data_pointer (or the top-level response) is a JSON object/map, extract its values as rows instead of treating it as a single row. Default: 'disabled'.

Values: "enabled" "disabled"

pagination_query_params string

Query parameter template for client-driven pagination. Supports {offset}, {limit}, and {page} variables. Example: 'offset={offset}&limit={limit}'. Requires pagination_page_size.

pagination_page_size string

Number of items per page for query-parameter pagination. Must be a positive integer. Used to expand {limit} in pagination_query_params and to detect the last page (fewer results than page_size = done).
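Client-driven pagination with a query template can be sketched in Python as below. The helper names and the `fetch_page` callback are hypothetical, and the sketch assumes {offset} starts at 0 and {page} starts at 1, neither of which is stated in the descriptions.

```python
def page_query(template, page_index, page_size):
    """Expand {offset}, {limit}, and {page} for a zero-based page index."""
    return template.format(
        offset=page_index * page_size,
        limit=page_size,
        page=page_index + 1,  # assumption: page numbers are 1-based
    )


def fetch_all(fetch_page, template, page_size, max_pages=100):
    """Request pages until one comes back short or max_pages is reached."""
    rows = []
    for page_index in range(max_pages):
        items = fetch_page(page_query(template, page_index, page_size))
        rows.extend(items)
        if len(items) < page_size:
            break  # fewer results than page_size means this was the last page
    return rows


query = page_query("offset={offset}&limit={limit}", 2, 50)
```

The short-page check is why a page size is required: without it there is no way to tell the last page from any other.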

auth_token_url string

OAuth2 token endpoint URL. When set together with http_auth_refresh_token, the connector exchanges the refresh token for short-lived access tokens (RFC 6749 §6) and attaches 'Authorization: Bearer <access_token>' to all data requests. Applies to JSON API endpoints only.

http_auth_refresh_token string

OAuth2 refresh token exchanged against auth_token_url to obtain access tokens. Required when auth_token_url is set.

http_auth_client_id string

OAuth2 client_id presented to the token endpoint. Required for confidential clients; optional for public clients. Paired with http_auth_client_secret.

http_auth_client_secret string

OAuth2 client_secret presented to the token endpoint. Required when the client is confidential; must be set together with http_auth_client_id.

auth_scopes string

Space-separated OAuth2 scopes to request when refreshing. Omit to inherit the scopes bound to the refresh token. Optional.

auth_client_auth string

How client credentials are sent to the token endpoint: 'basic' (HTTP Basic header, default per RFC 6749 §2.3.1) or 'body' (client_id/client_secret in the form body). Case-insensitive.
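The refresh-token exchange from RFC 6749 §6 is a form-encoded POST to the token endpoint. The Python sketch below builds the headers and body for both client authentication modes without sending anything; the function name is hypothetical, and it skips the extra form-encoding of credentials that RFC 6749 §2.3.1 prescribes before Basic encoding.

```python
import base64
from urllib.parse import urlencode


def build_refresh_request(refresh_token, client_id=None, client_secret=None,
                          scopes=None, client_auth="basic"):
    """Return (headers, body) for an OAuth2 refresh_token grant request."""
    form = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    if scopes:
        form["scope"] = scopes  # space-separated, as in auth_scopes
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    mode = client_auth.lower()  # the setting is described as case-insensitive
    if client_id is not None:
        if mode == "basic":
            raw = f"{client_id}:{client_secret or ''}".encode("utf-8")
            headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
        elif mode == "body":
            form["client_id"] = client_id
            if client_secret is not None:
                form["client_secret"] = client_secret
        else:
            raise ValueError(f"unknown client auth mode: {client_auth}")
    return headers, urlencode(form)
```

The access token returned by the endpoint would then be attached to data requests as a Bearer token, as described for auth_token_url.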

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning derived from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

HttpsDataConnectorParams object

http_username string

http_password string

http_port string

The port to connect to.

client_timeout string

The timeout setting for HTTP(S) client requests (in seconds). Default: 30

connect_timeout string

The timeout for establishing HTTP(S) connections (in seconds). Default: 10

pool_max_idle_per_host string

Maximum number of idle connections to keep alive per host. Default: 10

pool_idle_timeout string

Timeout for idle connections in the pool (in seconds). Default: 90

http_headers string

Custom HTTP headers to include in requests. Format: 'Header1: Value1, Header2: Value2'. Headers are applied to all requests.

max_retries string

Maximum number of retries for HTTP requests. Default: 3

retry_backoff_method string

Retry backoff method: 'fibonacci' (default), 'linear', or 'exponential'.

retry_max_duration string

Maximum total duration for all retries (e.g., '30s', '5m'). If not set, retries will continue up to max_retries.

retry_jitter string

Randomization factor for retry delays (0.0 to 1.0). Default: 0.3 (30% randomization). Set to 0 for no jitter.

allowed_request_paths string

Comma-separated list of request_path values that users are allowed to query. Required to enable request_path filters.

request_query_filters string

Set to 'enabled' or 'disabled' to control whether request_query filters can be pushed down to HTTP requests.

Values: "enabled" "disabled"

max_request_query_length string

Maximum length (in characters) for request_query filter values. Default: 1024.

request_body_filters string

Set to 'enabled' or 'disabled' to control whether request_body filters can be pushed down as HTTP request bodies.

Values: "enabled" "disabled"

max_request_body_bytes string

Maximum size (in bytes) for request_body filter values. Default: 16384 (16KiB).

health_probe string

Custom health probe path for endpoint validation (e.g., '/health', '/api/status'). The endpoint must return a 2xx status code to pass validation. If not set, a random path is used and any status (including 404) is accepted.

pagination string

Pagination mode. 'auto' (default): auto-detects Link headers. 'enabled': explicitly enable with config. 'disabled': no pagination.

Values: "auto" "enabled" "disabled"

pagination_next_pointer string

JSON pointer (RFC 6901) to the next page URL or cursor in the response body (e.g., '/next', '/pagination/cursor', '/links/next').

pagination_link_header string

Whether to follow HTTP Link headers with rel="next" for pagination. Default: 'enabled' (auto-detected). Set to 'disabled' to ignore Link headers.

Values: "enabled" "disabled"

pagination_token_param string

When set, the value from 'pagination_next_pointer' is treated as a cursor/token and passed as this query parameter name in subsequent requests. When not set, the value is treated as a full URL.

pagination_data_pointer string

JSON pointer (RFC 6901) to the data array in each page's response (e.g., '/data', '/results', '/items'). When set, only the array at this path is returned as data rows.

pagination_max_pages string

Maximum number of pages to fetch for pagination. Default: 100.

pagination_data_map_to_array string

When 'enabled', if the data at pagination_data_pointer (or the top-level response) is a JSON object/map, extract its values as rows instead of treating it as a single row. Default: 'disabled'.

Values: "enabled" "disabled"

pagination_query_params string

Query parameter template for client-driven pagination. Supports {offset}, {limit}, and {page} variables. Example: 'offset={offset}&limit={limit}'. Requires pagination_page_size.

pagination_page_size string

Number of items per page for query-parameter pagination. Must be a positive integer. Used to expand {limit} in pagination_query_params and to detect the last page (fewer results than page_size = done).

auth_token_url string

OAuth2 token endpoint URL. When set together with http_auth_refresh_token, the connector exchanges the refresh token for short-lived access tokens (RFC 6749 §6) and attaches 'Authorization: Bearer <access_token>' to all data requests. Applies to JSON API endpoints only.

http_auth_refresh_token string

OAuth2 refresh token exchanged against auth_token_url to obtain access tokens. Required when auth_token_url is set.

http_auth_client_id string

OAuth2 client_id presented to the token endpoint. Required for confidential clients; optional for public clients. Paired with http_auth_client_secret.

http_auth_client_secret string

OAuth2 client_secret presented to the token endpoint. Required when the client is confidential; must be set together with http_auth_client_id.

auth_scopes string

Space-separated OAuth2 scopes to request when refreshing. Omit to inherit the scopes bound to the refresh token. Optional.

auth_client_auth string

How client credentials are sent to the token endpoint: 'basic' (HTTP Basic header, default per RFC 6749 §2.3.1) or 'body' (client_id/client_secret in the form body). Case-insensitive.

file_format string

file_extension string

schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning, with partition columns derived from the folder structure. Defaults to false.

Values: "true" "false"
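Hive-style partitioning encodes column values as key=value folder names. As a rough illustration of what "from the folder structure" means, the sketch below extracts those pairs from an object path. The function name is hypothetical, and keeping values as strings is an assumption; the runtime may infer types differently.

```python
def hive_partitions(path):
    """Extract key=value partition segments from the directory part of a path."""
    parts = {}
    for segment in path.split("/")[:-1]:  # the last element is the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts
```

For a path such as sales/year=2024/month=01/part-0.parquet this yields year and month as partition columns.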

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.
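A JSON Pointer is resolved one reference token at a time. The sketch below follows RFC 6901 and mirrors the documented leading-'/' convenience; the function name is hypothetical and error handling for missing keys is left to the underlying lookups.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON value."""
    if pointer and not pointer.startswith("/"):
        pointer = "/" + pointer  # mirrors the documented automatic leading '/'
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: '~1' is '/', '~0' is '~' (decode in this order).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current
```

With the document {"response": {"items": [...]}}, both '/response/items' and 'response/items' select the inner array.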

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"

IcebergDataConnectorParams object

metadata_path string

The path including scheme to the metadata file for the Hadoop table. Must specify a path to a .json file. For example, s3a://my-bucket/warehouse/namespace/table/metadata/v1.metadata.json

iceberg_token string

Bearer token value to use for Authorization header.

iceberg_oauth2_credential string

Credential to use for OAuth2 client credential flow when initializing the catalog. Separated by a colon as <client_id>:<client_secret>.

iceberg_oauth2_token_url string

The URL to use for OAuth2 token endpoint.

iceberg_oauth2_scope string

The scope to use for OAuth2 token endpoint (default: catalog).

Default: "catalog"

iceberg_oauth2_server_url string

URL of the OAuth2 server tokens endpoint.

iceberg_sigv4_enabled string

Enable SigV4 authentication for the catalog (for connecting to AWS Glue).

iceberg_signing_region string

The region to use when signing the request for SigV4. Defaults to the region in the catalog URL if available.

iceberg_signing_name string

The name to use when signing the request for SigV4.

Default: "glue"

iceberg_warehouse string

Name of the Iceberg warehouse.

iceberg_s3_endpoint string

Configure an alternative endpoint for the S3 service. This can be any S3-compatible object storage service, e.g. MinIO or Cloudflare R2.

iceberg_s3_access_key_id string

The AWS access key ID to use for S3 storage.

iceberg_s3_secret_access_key string

The AWS secret access key to use for S3 storage.

iceberg_s3_session_token string

Configure the static session token used for S3 storage.

iceberg_s3_iam_role_source string

IAM role credential source. 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"

iceberg_s3_region string

The AWS S3 region to use.

iceberg_s3_role_session_name string

An optional identifier for the assumed role session for auditing purposes.

iceberg_s3_role_arn string

The Amazon Resource Name (ARN) of the role to assume. If provided instead of s3_access_key_id and s3_secret_access_key, temporary credentials are fetched by assuming this role.

iceberg_s3_connect_timeout string

Configure socket connection timeout, in seconds (default: 60).

iceberg_gcs_project_id string

The Google Cloud project ID for GCS storage.

iceberg_gcs_credentials string

Base64-encoded Google Cloud service account credentials JSON for GCS storage.

iceberg_gcs_token string

OAuth2 token to use for GCS authentication.

iceberg_gcs_service_path string

Custom endpoint URL for GCS (for emulators or custom endpoints).

iceberg_gcs_no_auth string

Set to 'true' to allow anonymous access to GCS (for public buckets).

KafkaDataConnectorParams object

kafka_bootstrap_servers string required

A list of host/port pairs for establishing the initial Kafka cluster connection.

kafka_security_protocol string

Security protocol for Kafka connections. Default: 'sasl_ssl'. Options: 'plaintext', 'ssl', 'sasl_plaintext', 'sasl_ssl'.

Default: "sasl_ssl"

kafka_sasl_mechanism string

SASL authentication mechanism. Default: 'SCRAM-SHA-512'. Options: 'PLAIN', 'SCRAM-SHA-256', 'SCRAM-SHA-512'.

Default: "SCRAM-SHA-512"

kafka_sasl_username string

SASL username.

kafka_sasl_password string

SASL password.

kafka_ssl_ca_location string

Path to the SSL/TLS CA certificate file for server verification.

kafka_enable_ssl_certificate_verification string

Enable SSL/TLS certificate verification. Default: 'true'.

Default: "true"

Values: "true" "false"

kafka_ssl_endpoint_identification_algorithm string

SSL/TLS endpoint identification algorithm. Default: 'https'. Options: 'none', 'https'.

Default: "https"

Values: "none" "https"

schema_infer_max_records string

Number of Kafka messages to sample for schema inference. Default: '1'. Increase if your data has optional fields or varying structure.

Default: "1"
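The default of one sampled message can miss fields that only appear in later messages. The sketch below illustrates that effect with a deliberately simple inference rule (first-seen Python type per field); it is not the runtime's inference algorithm, and the function name and sample messages are hypothetical.

```python
def infer_fields(messages, max_records=1):
    """Union of field names (with first-seen type name) over the sampled messages."""
    schema = {}
    for message in messages[:max_records]:
        for name, value in message.items():
            schema.setdefault(name, type(value).__name__)
    return schema


messages = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "email": "b@example.com"},  # optional field
]

narrow = infer_fields(messages, max_records=1)  # misses 'email'
wide = infer_fields(messages, max_records=2)    # includes 'email'
```

Raising schema_infer_max_records trades a longer sampling step for a schema that covers optional fields.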

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

kafka_consumer_group_id string

Kafka consumer group ID to use for this dataset. If not set, a unique ID is generated.

batch_max_size string

Maximum number of change events to batch together before processing.

Default: "10000"

batch_max_duration string

Maximum time to wait for a batch to fill before processing.

Default: "1s"
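batch_max_size and batch_max_duration together describe a size-or-time flush rule: a batch is processed when it is full or when it has waited long enough, whichever comes first. A minimal sketch of that rule follows; the class, its method names, and the injectable clock are hypothetical and only illustrate the interaction of the two limits.

```python
import time


class Batcher:
    def __init__(self, max_size=10000, max_duration_s=1.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_duration_s = max_duration_s
        self.clock = clock
        self.events = []
        self.started = None

    def add(self, event):
        """Append an event; return the batch if the size limit was reached."""
        if not self.events:
            self.started = self.clock()  # the wait starts with the first event
        self.events.append(event)
        if len(self.events) >= self.max_size:
            return self._flush()
        return None

    def poll(self):
        """Return a partial batch once it has waited for max_duration_s."""
        if self.events and self.clock() - self.started >= self.max_duration_s:
            return self._flush()
        return None

    def _flush(self):
        batch, self.events = self.events, []
        return batch
```

A small batch_max_duration lowers latency for sparse topics, while a large batch_max_size reduces per-batch overhead under load.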

LocalpodDataConnectorParams object

MemoryDataConnectorParams object

S3DataConnectorParams object

s3_region string

s3_endpoint string

s3_url_style string

Controls S3 URL addressing style. Supported values: 'vhost' and 'path'. When not set, auto-detected from the endpoint.

Values: "vhost" "path"
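The two addressing styles place the bucket name in different parts of the URL. The sketch below shows the difference; the function name, default endpoint, and bucket names are illustrative assumptions, and the auto-detection applied when s3_url_style is unset is not modelled.

```python
def object_url(bucket, key, style, endpoint="s3.us-east-1.amazonaws.com", scheme="https"):
    """Build an object URL using virtual-hosted ('vhost') or path-style addressing."""
    if style == "vhost":
        # Bucket is part of the host name.
        return f"{scheme}://{bucket}.{endpoint}/{key}"
    if style == "path":
        # Bucket is the first path segment.
        return f"{scheme}://{endpoint}/{bucket}/{key}"
    raise ValueError(f"unsupported s3_url_style: {style!r}")
```

Path style is commonly needed for S3-compatible services reached through a single host name, where per-bucket DNS names are not available.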

s3_key string

s3_secret string

s3_session_token string

s3_auth string

Configures the authentication method for S3. Supported methods are: public (i.e. no auth), iam_role, key.

s3_iam_role_source string

IAM role credential source (used when auth is 'iam_role' or unset, i.e. default IAM-based auth). 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"

s3_versioning string

Enables S3 object versioning support when set to 'enabled'. Defaults to 'enabled'.

Default: "enabled"

client_timeout string

The timeout setting for the S3 client.

allow_http string

Allow plain HTTP connections to the S3 endpoint.

file_format string

file_extension string

schema_infer_max_records string

Maximum number of records to scan when inferring the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Maximum number of records to scan when inferring the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"

csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Maximum number of records to scan when inferring the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"

hive_partitioning_enabled string

Enable Hive-style partitioning, with partition columns derived from the folder structure. Defaults to false.

Values: "true" "false"

schema_source_path string

Specify a path to use for schema inference.

json_format string

Default: "auto"

Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"

json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"

soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"

Values: "enabled" "disabled"

refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"

Values: "enabled" "disabled"
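refresh_skip avoids re-reading a single-file dataset whose object has not changed. The sketch below shows one plausible form of that check. The function name and metadata keys are hypothetical, and requiring both the ETag and the version ID to match is an assumption; the description only says the cached ETag/Version metadata must match.

```python
def should_skip_refresh(cached, current, refresh_skip="enabled"):
    """Return True when a refresh can be skipped because the object is unchanged."""
    if refresh_skip != "enabled":
        return False  # skipping is turned off, always refresh
    if cached is None:
        return False  # nothing cached yet, the first load must run
    return (cached.get("etag"), cached.get("version_id")) == (
        current.get("etag"),
        current.get("version_id"),
    )
```

Setting refresh_skip to 'disabled' forces every scheduled refresh to reload the file regardless of object metadata.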

SinkDataConnectorParams object

SpiceAiDataConnectorParams object

spiceai_api_key string

spiceai_token string

spiceai_endpoint string

ArrowAcceleratorParams object

file_watcher string

hash_index string

Enable hash index for fast primary key lookups. Set to 'enabled' to enable (requires primary_key). Default: disabled.

arrow_sort_columns string

Comma-separated list of columns to sort data by during inserts (e.g., 'timestamp,user_id').

CayenneAcceleratorParams object

cayenne_s3_region string

AWS region for S3 Express One Zone storage. If not specified, derived from cayenne_s3_zone_ids.

cayenne_s3_endpoint string

Custom S3 endpoint URL for S3 Express One Zone.

cayenne_s3_key string

AWS access key ID for S3 authentication.

cayenne_s3_secret string

AWS secret access key for S3 authentication.

cayenne_s3_session_token string

AWS session token for temporary credentials (optional).

cayenne_s3_auth string

Authentication method for S3 Express One Zone. Options: 'iam_role' (default, uses environment credentials), 'key' (uses explicit cayenne_s3_key/cayenne_s3_secret).

Default: "iam_role"

Values: "iam_role" "key"

cayenne_s3_client_timeout string

Timeout for S3 client operations (e.g., '30s', '5m'). Default: 120s.

Default: "120s"

cayenne_s3_allow_http string

Allow HTTP (non-TLS) connections to S3. Default: false.

Default: "false"

cayenne_s3_unsigned_payload string

Use unsigned payload for S3 Express One Zone requests. Only applies when S3 Express mode is enabled (via cayenne_s3_zone_ids or directory bucket path). Skips SHA-256 computation for request body, improving upload performance. S3 Express One Zone uses session-based auth, making payload signing unnecessary. Default: true.

Default: "true"

cayenne_s3_zone_ids string

Comma-separated list of Availability Zone IDs for S3 Express One Zone storage (e.g., 'usw2-az1' or 'usw2-az1,usw2-az2'). When specified without 'cayenne_file_path', the bucket name is generated from the app and dataset name, and the bucket is created if needed. For multi-zone redundancy, specify multiple zones. Data is written to all zones with ACID guarantees: a write succeeds only if it succeeds in every zone. Reads are served from the primary (first) zone, with fallback to replicas.

cayenne_file_path string

Path for storing Cayenne data files (Vortex files). Can be a local path or an S3 Express One Zone path. For S3 Express One Zone, use format: 's3://{bucket-name}--{zone-id}--x-s3/{prefix}/'. When S3 Express One Zone is specified, data files are stored exclusively in S3 while metadata (SQLite) remains on local disk.
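The S3 Express One Zone form of cayenne_file_path embeds the zone ID in the bucket name. The sketch below recognises that shape and separates its parts; the function name and return keys are hypothetical, and the zone-ID pattern is an assumption based on the 'usw2-az1' style of ID used elsewhere in this reference.

```python
import re

# s3://{bucket-name}--{zone-id}--x-s3/{prefix}/
_EXPRESS_PATH = re.compile(
    r"^s3://(?P<bucket>[^/]+--(?P<zone>[a-z0-9]+-az\d+)--x-s3)/(?P<prefix>.*)$"
)


def parse_cayenne_path(path):
    """Return bucket, zone ID and prefix for an S3 Express path, else None."""
    match = _EXPRESS_PATH.match(path)
    if match is None:
        return None  # not an S3 Express path; treat it as a local path
    return {
        "bucket": match["bucket"],
        "zone_id": match["zone"],
        "prefix": match["prefix"],
    }
```

Anything that does not match the directory-bucket shape, such as /var/lib/cayenne, is treated here as a local path.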

cayenne_metadata_dir string

Path for storing Cayenne metadata (SQLite catalog). If not specified, defaults to '{cayenne_file_path}/metadata'.

cayenne_metastore string

Metastore backend for the Cayenne catalog. Options: 'sqlite' (default), 'turso' (requires the 'turso' feature to be enabled at build time).

Default: "sqlite"

Values: "sqlite" "turso"

file_watcher string

cayenne_unsupported_type_action string

How to handle data types that Cayenne (which stores data in the Vortex format) does not natively support, such as Time32, Time64, Duration, and Interval. Options: 'string' (default; converts the schema to Utf8 and requires the data source to provide string data), 'error' (fail on unsupported types), 'warn' (include the field in the schema; inserts may fail), 'ignore' (skip unsupported fields).

Default: "string"

Values: "string" "error" "ignore" "warn"

cayenne_footer_cache_mb string

Size of the in-memory Vortex footer cache in MB. Larger values improve query performance for repeated scans. Default: 128 MB.

Default: "128"

cayenne_segment_cache_mb string

Size of the in-memory Vortex segment cache in MB. Set to a value greater than 0 to cache decompressed data segments. Default: 256 MB.

Default: "256"

cayenne_target_file_size_mb string

Target size for Vortex data files in MB. Default: 256 MB. Adjust as needed for S3 Express or remote upload scenarios.

Default: "256"

cayenne_sort_columns string

Comma-separated list of columns to sort data by during inserts (e.g., 'timestamp,user_id').

cayenne_compression_strategy string

Compression strategy to use for Vortex files. Options: 'btrblocks' (default), 'zstd'.

Default: "btrblocks"

Values: "btrblocks" "zstd"

cayenne_upload_concurrency string

Maximum number of concurrent file uploads when writing multiple Vortex files. Default: 4.

Default: "4"

DuckdbAcceleratorParams object

file_watcher string

duckdb_file string

duckdb_data_dir string

duckdb_memory_limit string

duckdb_preserve_insertion_order string

duckdb_index_scan_percentage string

duckdb_index_scan_max_count string

partition_mode string

duckdb_partitioned_write_flush_threshold_rows string

connection_pool_size string

The maximum number of client connections created in the DuckDB connection pool.

on_refresh_recompute_statistics string

on_refresh_sort_columns string

partitioned_write_buffer string

optimizer_duckdb_aggregate_pushdown string

PostgresAcceleratorParams object

pg_host string

pg_port string

pg_db string

pg_user string

pg_pass string

pg_sslmode string

pg_sslrootcert string

pg_connection_pool_min string

The minimum number of connections to keep open in the pool, lazily created when requested.

Default: "5"

file_watcher string

connection_pool_size string

The maximum number of connections created in the connection pool.

Default: "10"

SqliteAcceleratorParams object

sqlite_file string

busy_timeout string

file_watcher string

TursoAcceleratorParams object

turso_turso_file string

Path to the Turso database file. If not specified, defaults to {spice_data_dir}/{dataset_name}.turso

turso_internal_timestamp_format string

Internal timestamp storage format: 'rfc3339' (default, preserves precision/timezone) or 'integer_millis' (performance, millisecond precision only)

Default: "rfc3339"

Values: "rfc3339" "integer_millis"
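The two storage formats trade fidelity for speed. The sketch below shows what each representation keeps for one sample timestamp; the helper names are hypothetical and the encoding shown is only an illustration of the documented trade-off, not the accelerator's storage code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Sample value with microsecond precision and a +02:00 offset.
ts = datetime(2024, 5, 1, 12, 30, 15, 123456, tzinfo=timezone(timedelta(hours=2)))


def to_rfc3339(value):
    """Text form that keeps sub-millisecond digits and the UTC offset."""
    return value.isoformat()


def to_integer_millis(value):
    """Whole milliseconds since the Unix epoch; offset and microseconds are lost."""
    return (value - EPOCH) // timedelta(milliseconds=1)
```

Round-tripping the RFC 3339 text restores the original value, while the integer form maps every timestamp within the same millisecond to one number.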

IcebergCatalogParams object

iceberg_token string

Bearer token value to use for Authorization header.

iceberg_oauth2_credential string

Credential to use for OAuth2 client credential flow when initializing the catalog. Separated by a colon as <client_id>:<client_secret>.

iceberg_oauth2_token_url string

The URL to use for OAuth2 token endpoint.

iceberg_oauth2_scope string

The scope to use for OAuth2 token endpoint (default: catalog).

Default: "catalog"

iceberg_oauth2_server_url string

URL of the OAuth2 server tokens endpoint.

iceberg_sigv4_enabled string

Enable SigV4 authentication for the catalog (for connecting to AWS Glue).

iceberg_signing_region string

The region to use when signing the request for SigV4. Defaults to the region in the catalog URL if available.

iceberg_signing_name string

The name to use when signing the request for SigV4.

Default: "glue"

iceberg_warehouse string

Name of the Iceberg warehouse.

iceberg_s3_endpoint string

Configure an alternative endpoint for the S3 service. This can be any S3-compatible object storage service, e.g. MinIO or Cloudflare R2.

iceberg_s3_access_key_id string

The AWS access key ID to use for S3 storage.

iceberg_s3_secret_access_key string

The AWS secret access key to use for S3 storage.

iceberg_s3_session_token string

Configure the static session token used for S3 storage.

iceberg_s3_iam_role_source string

IAM role credential source. 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"

iceberg_s3_region string

The AWS S3 region to use.

iceberg_s3_role_session_name string

An optional identifier for the assumed role session for auditing purposes.

iceberg_s3_role_arn string

The Amazon Resource Name (ARN) of the role to assume. If provided instead of s3_access_key_id and s3_secret_access_key, temporary credentials are fetched by assuming this role.

iceberg_s3_connect_timeout string

Configure socket connection timeout, in seconds (default: 60).

iceberg_gcs_project_id string

The Google Cloud project ID for GCS storage.

iceberg_gcs_credentials string

Base64-encoded Google Cloud service account credentials JSON for GCS storage.

iceberg_gcs_token string

OAuth2 token to use for GCS authentication.

iceberg_gcs_service_path string

Custom endpoint URL for GCS (for emulators or custom endpoints).

iceberg_gcs_no_auth string

Set to 'true' to allow anonymous access to GCS (for public buckets).

SpiceAiCatalogParams object

spiceai_api_key string

spiceai_token string

spiceai_endpoint string

spiceai_flight_endpoint string

spiceai_http_endpoint string

UnityCatalogCatalogParams object

unity_catalog_token string

The personal access token used to authenticate against the Unity Catalog API.

unity_catalog_aws_region string

The AWS region to use for S3 storage.

unity_catalog_aws_access_key_id string

The AWS access key ID to use for S3 storage.

unity_catalog_aws_secret_access_key string

The AWS secret access key to use for S3 storage.

unity_catalog_aws_endpoint string

The AWS endpoint to use for S3 storage.

unity_catalog_azure_storage_account_name string

The storage account to use for Azure storage.

unity_catalog_azure_storage_account_key string

The storage account key to use for Azure storage.

unity_catalog_azure_storage_client_id string

The service principal client id for accessing the storage account.

unity_catalog_azure_storage_client_secret string

The service principal client secret for accessing the storage account.

unity_catalog_azure_storage_sas_key string

The shared access signature key for accessing the storage account.

unity_catalog_azure_storage_endpoint string

The endpoint for the Azure Blob storage account.

unity_catalog_google_service_account string

Filesystem path to the Google service account JSON key file.

DatabricksCatalogParams object

databricks_endpoint string required

The endpoint of the Databricks instance.

databricks_token string

The personal access token used to authenticate against the Databricks API.

mode string

The execution mode for querying against Databricks.

Default: "spark_connect"

client_timeout string

The timeout setting for the object store client.

databricks_cluster_id string

The ID of the compute cluster in Databricks to use for the query. Only valid when mode is spark_connect.

databricks_use_ssl string

Use a TLS connection to connect to the Databricks Spark Connect endpoint.

Default: "true"

databricks_sql_warehouse_id string

The SQL Warehouse ID to use when 'mode' is set to 'sql_warehouse'.

max_concurrent_requests string

Maximum number of concurrent HTTP requests to the SQL Warehouse API.

Default: "8"

http_max_retries string

Maximum number of HTTP-level retries for transient failures (429, 5xx).

Default: "3"

backoff_method string

Backoff strategy for transient HTTP retries.

Default: "fibonacci"

Values: "fibonacci" "exponential"
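The two backoff_method values grow the wait between retries at different rates. The sketch below lists the delay sequence for each; the function name and the one-second base delay are assumptions for illustration, and any jitter or cap the connector applies is not modelled.

```python
def backoff_delays(method, retries=3, base_s=1.0):
    """Return the wait (in seconds) before each retry for the given strategy."""
    if method == "exponential":
        # 1, 2, 4, 8, ... times the base delay
        return [base_s * 2 ** attempt for attempt in range(retries)]
    if method == "fibonacci":
        # 1, 1, 2, 3, 5, ... times the base delay
        delays, a, b = [], 1, 1
        for _ in range(retries):
            delays.append(base_s * a)
            a, b = b, a + b
        return delays
    raise ValueError(f"unsupported backoff_method: {method!r}")
```

Fibonacci backoff grows more slowly than exponential backoff, so the same http_max_retries budget is spent over a shorter total wait.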

statement_max_retries string

Maximum number of poll retries when waiting for async statement completion.

Default: "14"

disable_on_permanent_error string

When true, non-retryable errors (401, 403, 404) permanently disable the connector to prevent a thundering herd of failed requests.

Default: "true"
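The retry parameters distinguish transient failures (429, 5xx) from non-retryable ones (401, 403, 404). The sketch below encodes that split and the effect of disable_on_permanent_error; the class and function names are hypothetical, and status codes outside the two documented groups are simply labelled 'other'.

```python
def classify(status):
    """Map an HTTP status to 'retry', 'permanent' or 'other'."""
    if status == 429 or 500 <= status <= 599:
        return "retry"       # transient: eligible for http_max_retries
    if status in (401, 403, 404):
        return "permanent"   # non-retryable per the parameter description
    return "other"


class ConnectorState:
    def __init__(self, disable_on_permanent_error=True):
        self.disable_on_permanent_error = disable_on_permanent_error
        self.disabled = False

    def record(self, status):
        """Record a response status and disable the connector if required."""
        kind = classify(status)
        if kind == "permanent" and self.disable_on_permanent_error:
            self.disabled = True  # no further requests should be issued
        return kind
```

With the flag set to false, a 403 is still reported as permanent but later requests are not blocked.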

databricks_client_id string

The client ID of the Databricks service principal.

databricks_client_secret string

The client secret of the Databricks service principal.

databricks_aws_region string

The AWS region to use for S3 storage.

databricks_aws_access_key_id string

The AWS access key ID to use for S3 storage.

databricks_aws_secret_access_key string

The AWS secret access key to use for S3 storage.

databricks_aws_endpoint string

The AWS endpoint to use for S3 storage.

databricks_azure_storage_account_name string

The storage account to use for Azure storage.

databricks_azure_storage_account_key string

The storage account key to use for Azure storage.

databricks_azure_storage_client_id string

The service principal client id for accessing the storage account.

databricks_azure_storage_client_secret string

The service principal client secret for accessing the storage account.

databricks_azure_storage_sas_key string

The shared access signature key for accessing the storage account.

databricks_azure_storage_endpoint string

The endpoint for the Azure Blob storage account.

databricks_google_service_account string

Filesystem path to the Google service account JSON key file.

OpenaiModelParams object

endpoint string

The OpenAI API base endpoint. Can be overridden to use a compatible provider (e.g. NVIDIA NIM).

Default: "https://api.openai.com/v1"

openai_api_key string

The OpenAI API key.

openai_org_id string

The OpenAI organization ID.

openai_project_id string

The OpenAI project ID.

openai_usage_tier string

The current usage tier for the OpenAI account associated with the API key: 'free', 'tier1', 'tier2', 'tier3', 'tier4', or 'tier5'.

Default: "tier1"

Values: "free" "tier1" "tier2" "tier3" "tier4" "tier5"

responses_api string

Whether to enable use of this model via the Responses API. Disabled by default.

Default: "disabled"

openai_responses_tools string

The OpenAI Responses tools to use when calling the model from the Responses API.

Default: ""

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

frequency_penalty string

logit_bias string

logprobs string

top_logprobs string

max_completion_tokens string

reasoning_effort string

store string

metadata string

n string

presence_penalty string

response_format string

seed string

stop string

stream string

stream_options string

temperature string

top_p string

tool_choice string

parallel_tool_calls string

user string

openai_frequency_penalty string

DEPRECATED: Use 'frequency_penalty' without prefix

openai_logit_bias string

DEPRECATED: Use 'logit_bias' without prefix

openai_logprobs string

DEPRECATED: Use 'logprobs' without prefix

openai_top_logprobs string

DEPRECATED: Use 'top_logprobs' without prefix

openai_max_completion_tokens string

DEPRECATED: Use 'max_completion_tokens' without prefix

openai_reasoning_effort string

DEPRECATED: Use 'reasoning_effort' without prefix

openai_store string

DEPRECATED: Use 'store' without prefix

openai_metadata string

DEPRECATED: Use 'metadata' without prefix

openai_n string

DEPRECATED: Use 'n' without prefix

openai_presence_penalty string

DEPRECATED: Use 'presence_penalty' without prefix

openai_response_format string

DEPRECATED: Use 'response_format' without prefix

openai_seed string

DEPRECATED: Use 'seed' without prefix

openai_stop string

DEPRECATED: Use 'stop' without prefix

openai_stream string

DEPRECATED: Use 'stream' without prefix

openai_stream_options string

DEPRECATED: Use 'stream_options' without prefix

openai_temperature string

DEPRECATED: Use 'temperature' without prefix

openai_top_p string

DEPRECATED: Use 'top_p' without prefix

openai_tools string

DEPRECATED: Use 'tools' without prefix

openai_tool_choice string

DEPRECATED: Use 'tool_choice' without prefix

openai_parallel_tool_calls string

DEPRECATED: Use 'parallel_tool_calls' without prefix

openai_user string

DEPRECATED: Use 'user' without prefix
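For OpenAI models, each deprecated openai_-prefixed override maps to the same name without the prefix. The sketch below applies that rename to a params mapping. The function name is hypothetical, and letting an already-present unprefixed key win over its deprecated twin is an assumption. Connection settings such as openai_api_key are not overrides and are left untouched.

```python
# Override names listed above as having a deprecated 'openai_' form.
OVERRIDES = {
    "frequency_penalty", "logit_bias", "logprobs", "top_logprobs",
    "max_completion_tokens", "reasoning_effort", "store", "metadata", "n",
    "presence_penalty", "response_format", "seed", "stop", "stream",
    "stream_options", "temperature", "top_p", "tools", "tool_choice",
    "parallel_tool_calls", "user",
}
PREFIX = "openai_"


def _is_deprecated(name):
    return name.startswith(PREFIX) and name[len(PREFIX):] in OVERRIDES


def migrate_openai_params(params):
    """Rename deprecated openai_<param> keys to <param>."""
    migrated = {k: v for k, v in params.items() if not _is_deprecated(k)}
    for name, value in params.items():
        if _is_deprecated(name):
            # Assumption: an explicit unprefixed value takes precedence.
            migrated.setdefault(name[len(PREFIX):], value)
    return migrated
```

Running this over a params block before upgrading gives a mapping that uses only the current parameter names.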

AzureModelParams object

endpoint string

The Azure OpenAI resource endpoint, e.g., https://resource-name.openai.azure.com.

azure_api_version string

The API version used for the Azure OpenAI service.

azure_deployment_name string

The name of the model deployment.

azure_api_key string

The Azure OpenAI API key from the models deployment page.

azure_entra_token string

The Azure Entra token for authentication.

azure_openai_responses_tools string

Comma-separated list of OpenAI-hosted tools exposed via the Responses API for this model.

Default: ""

responses_api string

Whether to enable use of this model via the Responses API. Disabled by default.

Default: "disabled"

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

azure_frequency_penalty string

azure_logit_bias string

azure_logprobs string

azure_top_logprobs string

azure_max_completion_tokens string

azure_reasoning_effort string

azure_store string

azure_metadata string

azure_n string

azure_presence_penalty string

azure_response_format string

azure_seed string

azure_stop string

azure_stream string

azure_stream_options string

azure_temperature string

azure_top_p string

azure_tools string

azure_tool_choice string

azure_parallel_tool_calls string

azure_user string

openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

FileModelParams object

chat_template string

Customizes the transformation of OpenAI chat messages into a character stream for the model.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

file_frequency_penalty string

file_logit_bias string

file_logprobs string

file_top_logprobs string

file_max_completion_tokens string

file_reasoning_effort string

file_store string

file_metadata string

file_n string

file_presence_penalty string

file_response_format string

file_seed string

file_stop string

file_stream string

file_stream_options string

file_temperature string

file_top_p string

file_tools string

file_tool_choice string

file_parallel_tool_calls string

file_user string

openai_frequency_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_frequency_penalty instead.

openai_logit_bias string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_logit_bias instead.

openai_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_logprobs instead.

openai_top_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_top_logprobs instead.

openai_max_completion_tokens string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_max_completion_tokens instead.

openai_reasoning_effort string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_reasoning_effort instead.

openai_store string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_store instead.

openai_metadata string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_metadata instead.

openai_n string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_n instead.

openai_presence_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_presence_penalty instead.

openai_response_format string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_response_format instead.

openai_seed string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_seed instead.

openai_stop string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_stop instead.

openai_stream string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_stream instead.

openai_stream_options string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_stream_options instead.

openai_temperature string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_temperature instead.

openai_top_p string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_top_p instead.

openai_tools string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_tools instead.

openai_tool_choice string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_tool_choice instead.

openai_parallel_tool_calls string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_parallel_tool_calls instead.

openai_user string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use file_user instead.
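
As a rough illustration of how the FileModelParams properties above might fit together, the sketch below sets a few shared options and two file_-prefixed overrides. The surrounding models, from, name, and params keys, the model path, and the model name are assumptions that do not appear in this section. Every value is quoted because the schema types each property as a string.

```yaml
# Sketch only: the models/from/name/params layout is assumed, not taken from this section.
models:
  - from: file:models/example-model.gguf   # hypothetical local model path
    name: local_model                      # hypothetical model name
    params:
      tools: "auto"                        # make all available tools usable by the model
      system_prompt: "Answer concisely."
      max_concurrency: "2"
      # Current override names use the file_ prefix.
      file_temperature: "0.1"
      file_seed: "42"
      # Deprecated spelling of the same override, shown only for comparison:
      # openai_temperature: "0.1"
```

Moving off a deprecated name should only require renaming the key; the value stays the same.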

DatabricksModelParams object

databricks_endpoint string

The Databricks workspace endpoint, e.g., dbc-a12cd3e4-56f7.cloud.databricks.com.

databricks_token string

The Databricks API token to authenticate with the Databricks Models API.

databricks_client_id string

The Databricks Service Principal Client ID. Can't be used with databricks_token.

databricks_client_secret string

The Databricks Service Principal Client Secret. Can't be used with databricks_token.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

databricks_frequency_penalty string

databricks_logit_bias string

databricks_logprobs string

databricks_top_logprobs string

databricks_max_completion_tokens string

databricks_reasoning_effort string

databricks_store string

databricks_metadata string

databricks_n string

databricks_presence_penalty string

databricks_response_format string

databricks_seed string

databricks_stop string

databricks_stream string

databricks_stream_options string

databricks_temperature string

databricks_top_p string

databricks_tools string

databricks_tool_choice string

databricks_parallel_tool_calls string

databricks_user string

openai_frequency_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_frequency_penalty instead.

openai_logit_bias string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_logit_bias instead.

openai_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_logprobs instead.

openai_top_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_top_logprobs instead.

openai_max_completion_tokens string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_max_completion_tokens instead.

openai_reasoning_effort string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_reasoning_effort instead.

openai_store string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_store instead.

openai_metadata string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_metadata instead.

openai_n string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_n instead.

openai_presence_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_presence_penalty instead.

openai_response_format string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_response_format instead.

openai_seed string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_seed instead.

openai_stop string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_stop instead.

openai_stream string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_stream instead.

openai_stream_options string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_stream_options instead.

openai_temperature string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_temperature instead.

openai_top_p string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_top_p instead.

openai_tools string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_tools instead.

openai_tool_choice string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_tool_choice instead.

openai_parallel_tool_calls string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_parallel_tool_calls instead.

openai_user string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use databricks_user instead.
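
The DatabricksModelParams descriptions state that databricks_client_id and databricks_client_secret can't be combined with databricks_token. The sketch below shows the two authentication styles side by side as separate models. The models, from, name, and params keys, the model identifiers, and the credential placeholders are assumptions for illustration; only the endpoint value is the example given in the property description.

```yaml
# Sketch only: surrounding keys and all <...> placeholders are hypothetical.
models:
  # Option 1: authenticate with an API token.
  - from: databricks:<model-name>
    name: dbx_token_model
    params:
      databricks_endpoint: "dbc-a12cd3e4-56f7.cloud.databricks.com"
      databricks_token: "<databricks-api-token>"
      databricks_temperature: "0.2"

  # Option 2: authenticate as a service principal (databricks_token is not set).
  - from: databricks:<model-name>
    name: dbx_service_principal_model
    params:
      databricks_endpoint: "dbc-a12cd3e4-56f7.cloud.databricks.com"
      databricks_client_id: "<service-principal-client-id>"
      databricks_client_secret: "<service-principal-client-secret>"
      databricks_temperature: "0.2"
```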

HuggingfaceModelParams object

model_type string

The architecture to load the model as. Supported values: mistral, gemma, mixtral, llama, phi2, phi3, qwen2, gemma2, starcoder2, phi3.5moe, deepseekv2, deepseekv3.

chat_template string

Customizes the transformation of OpenAI chat messages into a character stream for the model.

huggingface_token string

The Hugging Face access token.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

huggingface_frequency_penalty string

huggingface_logit_bias string

huggingface_logprobs string

huggingface_top_logprobs string

huggingface_max_completion_tokens string

huggingface_reasoning_effort string

huggingface_store string

huggingface_metadata string

huggingface_n string

huggingface_presence_penalty string

huggingface_response_format string

huggingface_seed string

huggingface_stop string

huggingface_stream string

huggingface_stream_options string

huggingface_temperature string

huggingface_top_p string

huggingface_tools string

huggingface_tool_choice string

huggingface_parallel_tool_calls string

huggingface_user string

openai_frequency_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_frequency_penalty instead.

openai_logit_bias string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_logit_bias instead.

openai_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_logprobs instead.

openai_top_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_top_logprobs instead.

openai_max_completion_tokens string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_max_completion_tokens instead.

openai_reasoning_effort string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_reasoning_effort instead.

openai_store string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_store instead.

openai_metadata string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_metadata instead.

openai_n string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_n instead.

openai_presence_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_presence_penalty instead.

openai_response_format string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_response_format instead.

openai_seed string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_seed instead.

openai_stop string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_stop instead.

openai_stream string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_stream instead.

openai_stream_options string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_stream_options instead.

openai_temperature string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_temperature instead.

openai_top_p string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_top_p instead.

openai_tools string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_tools instead.

openai_tool_choice string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_tool_choice instead.

openai_parallel_tool_calls string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_parallel_tool_calls instead.

openai_user string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use huggingface_user instead.
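
A minimal sketch of the HuggingfaceModelParams properties above, assuming they are set under a model's params map. The models, from, name, and params keys, the repository placeholder, and the token placeholder are hypothetical. The model_type value is one of the supported values listed in its description.

```yaml
# Sketch only: surrounding keys and all <...> placeholders are hypothetical.
models:
  - from: huggingface:<model-repository>
    name: hf_model
    params:
      model_type: "llama"                        # one of the supported architectures
      huggingface_token: "<hugging-face-access-token>"
      system_prompt: "You are a helpful assistant."
      huggingface_max_completion_tokens: "512"   # quoted: the schema types it as a string
      huggingface_temperature: "0.7"
```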

AnthropicModelParams object

endpoint string

The Anthropic API base endpoint.

anthropic_api_key string

The Anthropic API key.

anthropic_auth_token string

The Anthropic Auth Token.

anthropic_usage_tier string

Anthropic usage tier (1-4). Used for rate limit defaults.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

anthropic_frequency_penalty string

anthropic_logit_bias string

anthropic_logprobs string

anthropic_top_logprobs string

anthropic_max_completion_tokens string

anthropic_reasoning_effort string

anthropic_store string

anthropic_metadata string

anthropic_n string

anthropic_presence_penalty string

anthropic_response_format string

anthropic_seed string

anthropic_stop string

anthropic_stream string

anthropic_stream_options string

anthropic_temperature string

anthropic_top_p string

anthropic_tools string

anthropic_tool_choice string

anthropic_parallel_tool_calls string

anthropic_user string

openai_frequency_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_frequency_penalty instead.

openai_logit_bias string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_logit_bias instead.

openai_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_logprobs instead.

openai_top_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_top_logprobs instead.

openai_max_completion_tokens string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_max_completion_tokens instead.

openai_reasoning_effort string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_reasoning_effort instead.

openai_store string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_store instead.

openai_metadata string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_metadata instead.

openai_n string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_n instead.

openai_presence_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_presence_penalty instead.

openai_response_format string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_response_format instead.

openai_seed string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_seed instead.

openai_stop string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_stop instead.

openai_stream string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_stream instead.

openai_stream_options string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_stream_options instead.

openai_temperature string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_temperature instead.

openai_top_p string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_top_p instead.

openai_tools string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_tools instead.

openai_tool_choice string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_tool_choice instead.

openai_parallel_tool_calls string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_parallel_tool_calls instead.

openai_user string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use anthropic_user instead.
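
In AnthropicModelParams, anthropic_usage_tier supplies rate limit defaults, while max_concurrency and requests_per_minute_limit override provider defaults for a single model. The sketch below combines them under stated assumptions: the models, from, name, and params keys, the model identifier, the key placeholder, and the specific limit values are all hypothetical.

```yaml
# Sketch only: surrounding keys, <...> placeholders, and limit values are hypothetical.
models:
  - from: anthropic:<model-id>
    name: claude_model
    params:
      anthropic_api_key: "<anthropic-api-key>"
      anthropic_usage_tier: "2"            # documented range is 1-4
      # Explicit per-model limits, overriding provider defaults:
      max_concurrency: "4"
      requests_per_minute_limit: "100"
      anthropic_temperature: "0.2"
```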

XaiModelParams object

xai_api_key string

The xAI API key.

xai_usage_tier string

xAI usage tier (0-4). Used for rate limit defaults.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

xai_frequency_penalty string

xai_logit_bias string

xai_logprobs string

xai_top_logprobs string

xai_max_completion_tokens string

xai_reasoning_effort string

xai_store string

xai_metadata string

xai_n string

xai_presence_penalty string

xai_response_format string

xai_seed string

xai_stop string

xai_stream string

xai_stream_options string

xai_temperature string

xai_top_p string

xai_tools string

xai_tool_choice string

xai_parallel_tool_calls string

xai_user string

openai_frequency_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_frequency_penalty instead.

openai_logit_bias string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_logit_bias instead.

openai_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_logprobs instead.

openai_top_logprobs string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_top_logprobs instead.

openai_max_completion_tokens string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_max_completion_tokens instead.

openai_reasoning_effort string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_reasoning_effort instead.

openai_store string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_store instead.

openai_metadata string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_metadata instead.

openai_n string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_n instead.

openai_presence_penalty string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_presence_penalty instead.

openai_response_format string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_response_format instead.

openai_seed string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_seed instead.

openai_stop string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_stop instead.

openai_stream string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_stream instead.

openai_stream_options string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_stream_options instead.

openai_temperature string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_temperature instead.

openai_top_p string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_top_p instead.

openai_tools string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_tools instead.

openai_tool_choice string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_tool_choice instead.

openai_parallel_tool_calls string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_parallel_tool_calls instead.

openai_user string

DEPRECATED: This openai_-prefixed override is deprecated and will be removed in a future release. Use xai_user instead.
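
A short sketch of the XaiModelParams properties above, assuming they are set under a model's params map. The models, from, name, and params keys, the model identifier, and the key placeholder are hypothetical; the tier value is chosen from the documented 0-4 range.

```yaml
# Sketch only: surrounding keys and all <...> placeholders are hypothetical.
models:
  - from: xai:<model-id>
    name: xai_model
    params:
      xai_api_key: "<xai-api-key>"
      xai_usage_tier: "1"        # documented range is 0-4
      tools: "auto"
      xai_temperature: "0.3"
      xai_top_p: "0.9"
```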

BedrockModelParams object

aws_access_key_id string

The AWS access key ID to use for Bedrock models.

aws_secret_access_key string

The AWS secret access key to use for Bedrock models.

aws_session_token string

The AWS session token to use for Bedrock models.

aws_region string

The AWS region to use for Bedrock models.

aws_iam_role_source string

IAM role credential source. 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"

bedrock_guardrail_identifier string

Identifier for the guardrail. Pattern: (([a-z0-9]+) | (arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}:guardrail/[a-z0-9]+)). Length: 0-2048.

bedrock_guardrail_version string

Guardrail version. Pattern: (([1-9][0-9]{0,7})|(DRAFT))

bedrock_trace string

Trace behavior for the guardrail. Valid values: enabled, disabled, enabled_full.

Values: "enabled" "disabled" "enabled_full"

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

bedrock_frequency_penalty string

bedrock_logit_bias string

bedrock_logprobs string

bedrock_top_logprobs string

bedrock_max_completion_tokens string

bedrock_reasoning_effort string

bedrock_store string

bedrock_metadata string

bedrock_n string

bedrock_presence_penalty string

bedrock_response_format string

bedrock_seed string

bedrock_stop string

bedrock_stream string

bedrock_stream_options string

bedrock_temperature string

bedrock_top_p string

bedrock_tools string

bedrock_tool_choice string

bedrock_parallel_tool_calls string

bedrock_user string

openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

GoogleModelParams object

google_api_key string

The Google Generative AI API key.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string

max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

google_frequency_penalty string

google_logit_bias string

google_logprobs string

google_top_logprobs string

google_max_completion_tokens string

google_reasoning_effort string

google_store string

google_metadata string

google_n string

google_presence_penalty string

google_response_format string

google_seed string

google_stop string

google_stream string

google_stream_options string

google_temperature string

google_top_p string

google_tools string

google_tool_choice string

google_parallel_tool_calls string

google_user string

openai_frequency_penalty string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model override parameter is deprecated and will be removed in a future release. Use the <model_prefix>_<param> parameter name instead.
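
To illustrate the GoogleModelParams properties listed above, a params block might look like the following minimal sketch. The secret reference syntax and every value shown are assumptions for illustration only. Values are quoted because the schema types each parameter as a string, and the google_-prefixed overrides are used in place of the deprecated openai_-prefixed ones.

```yaml
# Hypothetical params fragment for a Google model entry (all values are examples).
params:
  google_api_key: "${secrets:GOOGLE_API_KEY}"  # assumed secret reference; the schema accepts any string
  tools: auto                                  # 'auto' makes all available tools usable by the model
  system_prompt: "Answer concisely."           # applied to all chat completions for this model
  max_concurrency: "4"                         # overrides the provider default
  requests_per_minute_limit: "60"              # overrides the provider default
  google_temperature: "0.2"                    # use instead of the deprecated openai_temperature
  google_max_completion_tokens: "1024"         # use instead of the deprecated openai_max_completion_tokens
```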

AbfsDataset object

from string required

Data source path for abfs connector. Format: abfs:

pattern=^abfs:

name string required

params AbfsDataConnectorParams | null

Connection parameters for the abfs data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
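
As a rough illustration of an AbfsDataset entry, the sketch below uses only properties listed above. The container, path, dataset name, and column name are hypothetical, and the top-level datasets key is assumed from the usual spicepod layout. The only constraint taken from the schema is that from must match the ^abfs: pattern and that from and name are required.

```yaml
datasets:
  - from: abfs://example-container/events/   # hypothetical path; must start with "abfs:"
    name: events                             # required; hypothetical dataset name
    description: Event files read from Azure storage
    time_column: created_at                  # hypothetical column name
```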

AbfssDataset object

from string required

Data source path for abfss connector. Format: abfss:

pattern=^abfss:

name string required

params AbfssDataConnectorParams | null

Connection parameters for the abfss data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

DebeziumDataset object

from string required

Data source path for debezium connector. Format: debezium:

pattern=^debezium:

name string required

params DebeziumDataConnectorParams | null

Connection parameters for the debezium data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

DynamodbDataset object

from string required

Data source path for dynamodb connector. Format: dynamodb:

pattern=^dynamodb:

name string required

params DynamodbDataConnectorParams | null

Connection parameters for the dynamodb data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

FileDataset object

from string required

Data source path for file connector. Format: file:

pattern=^file:

name string required

params FileDataConnectorParams | null

Connection parameters for the file data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
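
A FileDataset entry can be sketched the same way. The file path and dataset name below are hypothetical, and the top-level datasets key is assumed from the usual spicepod layout; the schema itself only requires that from begins with file: and that name is present.

```yaml
datasets:
  - from: file:data/orders.csv    # hypothetical local path; must start with "file:"
    name: orders                  # required; hypothetical dataset name
    description: Orders loaded from a local CSV file
```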

GcsDataset object

from string required

Data source path for gcs connector. Format: gcs:

pattern=^gcs:

name string required

params GcsDataConnectorParams | null

Connection parameters for the gcs data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

GitDataset object

from string required

Data source path for git connector. Format: git:

pattern=^git:

name string required

params GitDataConnectorParams | null

Connection parameters for the git data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

GithubDataset object

from string required

Data source path for github connector. Format: github:

pattern=^github:

name string required

params GithubDataConnectorParams | null

Connection parameters for the github data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

GlueDataset object

from string required

Data source path for glue connector. Format: glue:

pattern=^glue:

name string required

params GlueDataConnectorParams | null

Connection parameters for the glue data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

GsDataset object

from string required

Data source path for gs connector. Format: gs:

pattern=^gs:

name string required

params GsDataConnectorParams | null

Connection parameters for the gs data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

HttpDataset object

from string required

Data source path for http connector. Format: http:

pattern=^http:

name string required

params HttpDataConnectorParams | null

Connection parameters for the http data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

HttpsDataset object

from string required

Data source path for https connector. Format: https:

pattern=^https:

name string required

params HttpsDataConnectorParams | null

Connection parameters for the https data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

IcebergDataset object

from string required

Data source path for iceberg connector. Format: iceberg:

pattern=^iceberg:

name string required

params IcebergDataConnectorParams | null

Connection parameters for the iceberg data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

KafkaDataset object

from string required

Data source path for kafka connector. Format: kafka:

pattern=^kafka:

name string required

params KafkaDataConnectorParams | null

Connection parameters for the kafka data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

LocalpodDataset object

from string required

Data source path for localpod connector. Format: localpod:

pattern=^localpod:

name string required

params LocalpodDataConnectorParams | null

Connection parameters for the localpod data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

MemoryDataset object

from string required

Data source path for memory connector. Format: memory:

pattern=^memory:

name string required

params MemoryDataConnectorParams | null

Connection parameters for the memory data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

S3Dataset object

from string required

Data source path for s3 connector. Format: s3:

pattern=^s3:

name string required

params S3DataConnectorParams | null

Connection parameters for the s3 data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
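
For an S3Dataset, a minimal sketch might combine the required from and name properties with the optional time properties listed above. The bucket, prefix, and column names are hypothetical, and the top-level datasets key is assumed from the usual spicepod layout.

```yaml
datasets:
  - from: s3://example-bucket/taxi_trips/   # hypothetical bucket and prefix; must start with "s3:"
    name: taxi_trips                        # required; hypothetical dataset name
    description: Taxi trip records stored in S3
    time_column: pickup_time                # hypothetical column name
    time_partition_column: pickup_date      # hypothetical column name
```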

SinkDataset object

from string required

Data source path for sink connector. Format: sink:

pattern=^sink:

name string required

params SinkDataConnectorParams | null

Connection parameters for the sink data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

SpiceAiDataset object

from string required

Data source path for spice.ai connector. Format: spice.ai:

pattern=^spice\.ai:

name string required

params SpiceAiDataConnectorParams | null

Connection parameters for the spice.ai data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

ArrowAcceleratedDataset object

Dataset with arrow acceleration engine.

from string required

name string required

acceleration object

2 nested properties

engine string

Constant: "arrow"

params ArrowAcceleratorParams | null

Configuration parameters for the arrow acceleration engine.

description string | null

metadata object

columns Column[]

access string | string

params Params | null

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
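
The distinguishing feature of ArrowAcceleratedDataset is the constant acceleration.engine value. A minimal sketch, assuming a hypothetical S3 source and dataset name and the usual top-level datasets key:

```yaml
datasets:
  - from: s3://example-bucket/metrics/   # hypothetical source
    name: metrics                        # required; hypothetical dataset name
    acceleration:
      engine: arrow                      # constant required by ArrowAcceleratedDataset
```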

CayenneAcceleratedDataset object

Dataset with cayenne acceleration engine.

from string required

name string required

acceleration object

2 nested properties

engine string

Constant: "cayenne"

params CayenneAcceleratorParams | null

Configuration parameters for the cayenne acceleration engine.

description string | null

metadata object

columns Column[]

access string | string

params Params | null

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

DuckdbAcceleratedDataset object

Dataset with duckdb acceleration engine.

from string required

name string required

acceleration object

2 nested properties

engine string

Constant: "duckdb"

params DuckdbAcceleratorParams | null

Configuration parameters for the duckdb acceleration engine.

description string | null

metadata object

columns Column[]

access string | string

params Params | null

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
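
A DuckdbAcceleratedDataset differs from the other accelerated variants only in the engine constant. The sketch below assumes a hypothetical local file source, dataset name, and time column, and leaves acceleration.params unset because DuckdbAcceleratorParams is not described in this section.

```yaml
datasets:
  - from: file:data/orders.csv   # hypothetical source
    name: orders                 # required; hypothetical dataset name
    time_column: ordered_at      # hypothetical column name
    acceleration:
      engine: duckdb             # constant required by DuckdbAcceleratedDataset
```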

PostgresAcceleratedDataset object

Dataset with postgres acceleration engine.

from string required

name string required

acceleration object

2 nested properties

engine string

Constant: "postgres"

params PostgresAcceleratorParams | null

Configuration parameters for the postgres acceleration engine.

description string | null

metadata object

columns Column[]

access string | string

params Params | null

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

SqliteAcceleratedDataset object

Dataset with sqlite acceleration engine.

from string required

name string required

acceleration object

2 nested properties

engine string

Constant: "sqlite"

params SqliteAcceleratorParams | null

Configuration parameters for the sqlite acceleration engine.

description string | null

metadata object

columns Column[]

access string | string

params Params | null

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

TursoAcceleratedDataset object

Dataset with turso acceleration engine.

from string required

name string required

acceleration object

2 nested properties

engine string

Constant: "turso"

params TursoAcceleratorParams | null

Configuration parameters for the turso acceleration engine.

description string | null

metadata object

columns Column[]

access string | string

params Params | null

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string

IcebergCatalog object

from string required

Catalog source for iceberg connector. Format: iceberg:<catalog_path>

pattern=^iceberg:

name string required

params IcebergCatalogParams | null

Connection parameters for the iceberg catalog connector.

description string | null

metadata object

access string | string

include string[]

dataset_params Params | null

dependsOn string[]

metrics Metrics | null

mode string | string
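
An IcebergCatalog entry could be sketched as follows. The catalog URL and catalog name are hypothetical, the top-level catalogs key is assumed from the usual spicepod layout, and the glob-style value under include is an assumption: the schema only states that include is an array of strings.

```yaml
catalogs:
  - from: iceberg:https://catalog.example.com/v1/namespaces   # hypothetical <catalog_path>
    name: lakehouse                                           # required; hypothetical catalog name
    description: Tables exposed through an Iceberg catalog
    include:
      - "sales.*"                                             # assumed glob-style filter
```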

SpiceAiCatalog object

from string required

Catalog source for the spice.ai connector. Format: spice.ai:<catalog_path>

pattern=^spice\.ai:

name string required

params SpiceAiCatalogParams | null

Connection parameters for the spice.ai catalog connector.

description string | null

metadata object

access string | string

include string[]

dataset_params Params | null

dependsOn string[]

metrics Metrics | null

mode string | string

UnityCatalogCatalog object

from string required

Catalog source for the unity_catalog connector. Format: unity_catalog:<catalog_path>

pattern=^unity_catalog:

name string required

params UnityCatalogCatalogParams | null

Connection parameters for the unity_catalog catalog connector.

description string | null

metadata object

access string | string

include string[]

dataset_params Params | null

dependsOn string[]

metrics Metrics | null

mode string | string

DatabricksCatalog object

from string required

Catalog source for the databricks connector. Format: databricks:<catalog_path>

pattern=^databricks:

name string required

params DatabricksCatalogParams | null

Connection parameters for the databricks catalog connector.

description string | null

metadata object

access string | string

include string[]

dataset_params Params | null

dependsOn string[]

metrics Metrics | null

mode string | string

OpenaiModel object

from string required

Model source for the openai provider. Format: openai:<model_id>

pattern=^openai:

name string required

params OpenaiModelParams | null

Configuration parameters for the openai model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null
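
For orientation, a model entry that satisfies the `^openai:` pattern could be sketched as follows. The `models` top-level key, the model ID, and the name are assumptions for the example. `params` is omitted because the keys accepted by `OpenaiModelParams` are not listed in this definition.

```yaml
# Hypothetical spicepod.yaml fragment; the model ID and name are placeholders.
models:
  - from: openai:gpt-4o-mini       # must start with "openai:"; the ID is an assumed example
    name: assistant
    description: Chat model served through the openai provider
```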

AzureModel object

from string required

Model source for the azure provider. Format: azure:<model_id>

pattern=^azure:

name string required

params AzureModelParams | null

Configuration parameters for the azure model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

FileModel object

from string required

Model source for the file provider. Format: file:<model_id>

pattern=^file:

name string required

params FileModelParams | null

Configuration parameters for the file model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

DatabricksModel object

from string required

Model source for the databricks provider. Format: databricks:<model_id>

pattern=^databricks:

name string required

params DatabricksModelParams | null

Configuration parameters for the databricks model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null
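
DatabricksCatalog and DatabricksModel share the same `^databricks:` pattern, so the prefix alone does not tell them apart. The sketch below assumes that the enclosing section (`catalogs` or `models`) decides which definition applies. The section keys, paths, and names are hypothetical.

```yaml
# Hypothetical spicepod.yaml fragment; all values after "databricks:" are placeholders.
catalogs:
  - from: databricks:my_catalog            # validated as a DatabricksCatalog
    name: dbx_catalog

models:
  - from: databricks:my-serving-endpoint   # validated as a DatabricksModel
    name: dbx_model
```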

HuggingfaceModel object

from string required

Model source for the huggingface provider. Format: huggingface:<model_id>

pattern=^huggingface:

name string required

params HuggingfaceModelParams | null

Configuration parameters for the huggingface model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

AnthropicModel object

from string required

Model source for the anthropic provider. Format: anthropic:<model_id>

pattern=^anthropic:

name string required

params AnthropicModelParams | null

Configuration parameters for the anthropic model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

XaiModel object

from string required

Model source for the xai provider. Format: xai:<model_id>

pattern=^xai:

name string required

params XaiModelParams | null

Configuration parameters for the xai model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

BedrockModel object

from string required

Model source for the bedrock provider. Format: bedrock:<model_id>

pattern=^bedrock:

name string required

params BedrockModelParams | null

Configuration parameters for the bedrock model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

GoogleModel object

from string required

Model source for the google provider. Format: google:<model_id>

pattern=^google:

name string required

params GoogleModelParams | null

Configuration parameters for the google model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null

GenericDataset object

Generic dataset for custom or unknown connectors.

from string required

name string required

params Params | null

Connection parameters for the data connector.

description string | null

metadata object

columns Column[]

access string | string

has_metadata_table boolean | null

replication Replication | null

time_column string | null

time_format TimeFormat | null

time_partition_column string | null

time_partition_format TimeFormat | null

acceleration Acceleration | null

embeddings ColumnEmbeddingConfig[]

dependsOn string[]

invalid_type_action InvalidTypeAction | null

unsupported_type_action UnsupportedTypeAction | null

ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null

vectors VectorStore | null

check_availability string | string

Controls whether the availability of the federated table is checked periodically.

mode string | string
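
Because GenericDataset places no pattern on `from`, any connector prefix is accepted. The fragment below is a sketch under that reading. The `datasets` key, the `customsource:` prefix, the names, and the idea that `dependsOn` lists other component names are assumptions, not details given by the definition.

```yaml
# Hypothetical spicepod.yaml fragment for a connector with no dedicated definition.
datasets:
  - from: customsource:events      # made-up connector prefix; no pattern is enforced
    name: events
    description: Events from a custom connector
    time_column: event_time        # assumed column name
    dependsOn:
      - reference_data             # assumed to name another component in the same file
```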

GenericCatalog object

Generic catalog for custom or unknown connectors.

from string required

name string required

params Params | null

Connection parameters for the catalog connector.

description string | null

metadata object

access string | string

include string[]

dataset_params Params | null

dependsOn string[]

metrics Metrics | null

mode string | string

GenericModel object

Generic model for custom or unknown model sources.

from string required

name string required

params Params | null

Configuration parameters for the model provider.

description string | null

metadata object

files ModelFile[]

datasets string[]

dependsOn string[]

metrics Metrics | null
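
A generic model entry could be sketched in the same way. The `models` key, the `customprovider:` prefix, and the names are hypothetical. The `datasets` list is assumed here to hold the names of datasets defined elsewhere in the file, which this definition does not state.

```yaml
# Hypothetical spicepod.yaml fragment for a model source with no dedicated definition.
models:
  - from: customprovider:my-model  # made-up provider prefix; no pattern is enforced
    name: my_model
    description: Model loaded from a custom source
    datasets:
      - events                     # assumed to reference a dataset by name
```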
