Type object
File match spicepod.yml spicepod.yaml
Schema URL https://catalog.lintel.tools/schemas/schemastore/spicepod-yaml/latest.json
Source https://raw.githubusercontent.com/spiceai/spiceai/trunk/.schema/spicepod.schema.json

Validate with Lintel

npx @lintel/lintel check

A Spicepod definition is a YAML file that describes a Spicepod.

Properties

name string required

The name of the Spicepod

version string required
Values: "v1" "v2"
kind string required
Values: "Spicepod"
runtime object

Helper struct for deserializing Runtime with custom logic for handling memory_limit/temp_directory deprecation

19 nested properties
results_cache ResultsCache | null
Default: null
caching Caching | null
Default: null
dataset_load_parallelism integer | null
format=uint min=0
tls TlsConfig | null

If set, the runtime will configure all endpoints to use TLS

tracing TracingConfig | null
telemetry object
5 nested properties
enabled boolean
Default: true
user_agent_collection string
Values: "full" "disabled"
properties Record<string, string>

Custom key/value attributes attached to telemetry metrics emitted by spiced. Applied as OpenTelemetry resource attributes on the runtime's MeterProvider, so they appear as dimensions on every metric exported via the Prometheus scrape endpoint, the cluster on-demand OTLP reader, and the otel_exporter push exporter, and as labels on anonymous usage telemetry. Currently does not affect tracing spans or logs. Example: { environment: prod, region: us-west-2, team: data-platform }.

Default:
{}
metric_prefix string | null

Optional prefix prepended to every exported metric name.

Useful for namespacing Spice metrics in shared backends (e.g. Datadog, Grafana Cloud) so they don't collide with metrics from other services. For example, with metric_prefix: "spiceai." the runtime metric query_duration_ms is exported as spiceai.query_duration_ms.

The prefix is applied via an OpenTelemetry View on the runtime's MeterProvider, so it affects every metric reader attached to that provider — the Prometheus scrape endpoint (--metrics), the cluster on-demand OTLP reader, and the otel_exporter push reader. OpenTelemetry 0.31's SDK does not support per-reader name transforms, so this knob is intentionally placed at the telemetry level rather than under any single exporter.

otel_exporter OtelExporterConfig | null

Optional configuration for pushing metrics to an OpenTelemetry collector

params Record<string, string>
task_history object
8 nested properties
enabled boolean
Default: true
captured_output string
Default: "none"
captured_context string
Default: "truncated"
Values: "redacted" "truncated" "full"
retention_period string
Default: "8h"
retention_check_interval string
Default: "15m"
min_sql_duration string | null
captured_plan string | null
min_plan_duration string | null
auth Auth | null
cors object
2 nested properties
enabled boolean
Default: false
allowed_origins string[]
Default:
[
  "*"
]
flight Flight | null
temp_directory string | null

Configures where the runtime will store temporary files needed for operations like spilling to disk for queries & accelerations that are larger than memory.

memory_limit string | null

Specifies the runtime memory limit. When configured, the runtime spills to disk for supported queries that are larger than memory.

shutdown_timeout string | null

Configures how long the runtime waits for connections to be gracefully drained and components to shut down cleanly during runtime termination

ready_state string | string

Controls when the runtime readiness probe reports the runtime as ready.

output_level OutputLevel | null

Configures the log level for the runtime. Can be overridden if flags or environment variables are set.

query Query | null
metrics Metrics | null
scheduler Scheduler | null
management Management | null

Optional management configuration

snapshots Snapshots | null

Optional acceleration snapshot configuration

extensions Record<string, object>

Optional extensions configuration

secrets Secret[]

Optional Spicepod secrets configuration. The default value is:

secrets:
  - from: env
    name: env
metadata object
dependencies string[]
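As a minimal sketch, the three required top-level properties listed above could appear in a spicepod.yaml like this. The name my_app is a hypothetical placeholder; the version and kind values are taken from the listed enums.

```yaml
# Minimal Spicepod: only the required properties.
version: v1      # one of "v1", "v2"
kind: Spicepod   # the only listed value
name: my_app     # hypothetical name
```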

Definitions

SpicepodVersion string
SpicepodKind string
Runtime object

Helper struct for deserializing Runtime with custom logic for handling memory_limit/temp_directory deprecation

results_cache ResultsCache | null
Default: null
caching Caching | null
Default: null
dataset_load_parallelism integer | null
format=uint min=0
tls TlsConfig | null

If set, the runtime will configure all endpoints to use TLS

tracing TracingConfig | null
telemetry object
5 nested properties
enabled boolean
Default: true
user_agent_collection string
Values: "full" "disabled"
properties Record<string, string>

Custom key/value attributes attached to telemetry metrics emitted by spiced. Applied as OpenTelemetry resource attributes on the runtime's MeterProvider, so they appear as dimensions on every metric exported via the Prometheus scrape endpoint, the cluster on-demand OTLP reader, and the otel_exporter push exporter, and as labels on anonymous usage telemetry. Currently does not affect tracing spans or logs. Example: { environment: prod, region: us-west-2, team: data-platform }.

Default:
{}
metric_prefix string | null

Optional prefix prepended to every exported metric name.

Useful for namespacing Spice metrics in shared backends (e.g. Datadog, Grafana Cloud) so they don't collide with metrics from other services. For example, with metric_prefix: "spiceai." the runtime metric query_duration_ms is exported as spiceai.query_duration_ms.

The prefix is applied via an OpenTelemetry View on the runtime's MeterProvider, so it affects every metric reader attached to that provider — the Prometheus scrape endpoint (--metrics), the cluster on-demand OTLP reader, and the otel_exporter push reader. OpenTelemetry 0.31's SDK does not support per-reader name transforms, so this knob is intentionally placed at the telemetry level rather than under any single exporter.

otel_exporter OtelExporterConfig | null

Optional configuration for pushing metrics to an OpenTelemetry collector

params Record<string, string>
task_history object
8 nested properties
enabled boolean
Default: true
captured_output string
Default: "none"
captured_context string
Default: "truncated"
Values: "redacted" "truncated" "full"
retention_period string
Default: "8h"
retention_check_interval string
Default: "15m"
min_sql_duration string | null
captured_plan string | null
min_plan_duration string | null
auth Auth | null
cors object
2 nested properties
enabled boolean
Default: false
allowed_origins string[]
Default:
[
  "*"
]
flight Flight | null
temp_directory string | null

Configures where the runtime will store temporary files needed for operations like spilling to disk for queries & accelerations that are larger than memory.

memory_limit string | null

Specifies the runtime memory limit. When configured, the runtime spills to disk for supported queries that are larger than memory.

shutdown_timeout string | null

Configures how long the runtime waits for connections to be gracefully drained and components to shut down cleanly during runtime termination

ready_state string | string

Controls when the runtime readiness probe reports the runtime as ready.

output_level OutputLevel | null

Configures the log level for the runtime. Can be overridden if flags or environment variables are set.

query Query | null
metrics Metrics | null
scheduler Scheduler | null
ResultsCache object
enabled boolean
Default: true
cache_max_size string | null
item_ttl string | null
caching_policy string | string
cache_key_type string
Values: "plan" "sql"
hashing_algorithm string
Values: "siphash" "ahash" "xxh3" "xxh32" "xxh64" "xxh128" "blake3"
engine string | string
max_stale_while_revalidate string | null

Maximum stale-while-revalidate duration to add to the cache TTL.

eviction_policy string | string
CachingPolicy string | string
CacheKeyType string
HashingAlgorithm string
CacheEngine string | string
Caching object
sql_results SQLResultsCacheConfig | null
search_results CacheConfig | null
embeddings CacheConfig | null
SQLResultsCacheConfig object
enabled boolean
Default: true
max_size string | null
item_ttl string | null
caching_policy string | string
hashing_algorithm string
Values: "siphash" "ahash" "xxh3" "xxh32" "xxh64" "xxh128" "blake3"
cache_key_type string
Values: "plan" "sql"
engine string | string
stale_while_revalidate_ttl string | null

Maximum age for serving stale cached results while revalidating in the background. When set, cached results past their TTL (but within this additional window) will be served immediately while a background refresh is triggered. Format: duration string (e.g., "30s", "5m"). This is a response directive.

encoding string
Values: "none" "zstd"
eviction_policy string | string
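A sketch of how the SQLResultsCacheConfig options might be combined under runtime.caching.sql_results. The item_ttl value and its duration form are assumptions for illustration; the enum values and the "30s" duration format come from the descriptions above.

```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      item_ttl: 1m                     # assumed duration string
      stale_while_revalidate_ttl: 30s  # serve stale results up to 30s past the TTL while refreshing
      cache_key_type: plan             # "plan" or "sql"
      hashing_algorithm: xxh3
      encoding: zstd                   # "none" or "zstd"
```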
Encoding string
CacheConfig object
enabled boolean
Default: true
max_size string | null
item_ttl string | null
caching_policy string | string
hashing_algorithm string
Values: "siphash" "ahash" "xxh3" "xxh32" "xxh64" "xxh128" "blake3"
engine string | string
eviction_policy string | string
TlsConfig object
enabled boolean required

If set, the runtime will configure all endpoints to use TLS

certificate_file string | null

A filesystem path to a file containing the PEM encoded certificate

certificate string | null

A PEM encoded certificate

key_file string | null

A filesystem path to a file containing the PEM encoded private key

key string | null

A PEM encoded private key
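For illustration, a TlsConfig that reads the certificate and key from files might look like the sketch below. Both paths are hypothetical.

```yaml
runtime:
  tls:
    enabled: true
    certificate_file: /etc/spice/tls/server.crt  # hypothetical path to a PEM encoded certificate
    key_file: /etc/spice/tls/server.key          # hypothetical path to a PEM encoded private key
```

The certificate and key properties accept the PEM content inline instead of a file path.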

TracingConfig object
zipkin_enabled boolean required
zipkin_endpoint string | null
TelemetryConfig object
enabled boolean
Default: true
user_agent_collection string
Values: "full" "disabled"
properties Record<string, string>

Custom key/value attributes attached to telemetry metrics emitted by spiced. Applied as OpenTelemetry resource attributes on the runtime's MeterProvider, so they appear as dimensions on every metric exported via the Prometheus scrape endpoint, the cluster on-demand OTLP reader, and the otel_exporter push exporter, and as labels on anonymous usage telemetry. Currently does not affect tracing spans or logs. Example: { environment: prod, region: us-west-2, team: data-platform }.

Default:
{}
metric_prefix string | null

Optional prefix prepended to every exported metric name.

Useful for namespacing Spice metrics in shared backends (e.g. Datadog, Grafana Cloud) so they don't collide with metrics from other services. For example, with metric_prefix: "spiceai." the runtime metric query_duration_ms is exported as spiceai.query_duration_ms.

The prefix is applied via an OpenTelemetry View on the runtime's MeterProvider, so it affects every metric reader attached to that provider — the Prometheus scrape endpoint (--metrics), the cluster on-demand OTLP reader, and the otel_exporter push reader. OpenTelemetry 0.31's SDK does not support per-reader name transforms, so this knob is intentionally placed at the telemetry level rather than under any single exporter.

otel_exporter OtelExporterConfig | null

Optional configuration for pushing metrics to an OpenTelemetry collector
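The telemetry properties described above could be combined roughly as follows. This sketch reuses the attribute and prefix examples from the descriptions; placing the block under runtime follows the Runtime definition.

```yaml
runtime:
  telemetry:
    enabled: true
    user_agent_collection: disabled  # "full" or "disabled"
    properties:
      environment: prod
      region: us-west-2
      team: data-platform
    metric_prefix: "spiceai."        # query_duration_ms is exported as spiceai.query_duration_ms
```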

UserAgentCollection string
OtelExporterConfig object

Configuration for pushing metrics to an OpenTelemetry collector.

The protocol is inferred from the endpoint:

  • HTTP: when the endpoint has an http:// or https:// scheme, or contains /v1/metrics
  • gRPC: when the endpoint is just a hostname and an optional port (defaults to 4317)

Examples

gRPC (hostname only, port defaults to 4317):

otel_exporter:
  enabled: true
  endpoint: "otel-collector"

With metric whitelist:

otel_exporter:
  enabled: true
  endpoint: "otel-collector:4317"
  metrics:
    - requests_total
    - request_duration_seconds

HTTP:

otel_exporter:
  enabled: true
  endpoint: "http://localhost:4318/v1/metrics"
endpoint string required

The endpoint of the OTEL collector.

For gRPC: use a hostname with an optional port (e.g., otel-collector or localhost:4317). For HTTP: use the full URL (e.g., http://localhost:4318/v1/metrics).

enabled boolean

Whether the OTEL exporter is enabled

Default: true
push_interval string

How often to push metrics to the collector (e.g., "30s", "1m", "5m")

Default: "60s"
metrics string[]

Optional whitelist of metric names to export. If not specified or empty, all metrics are exported.

headers Record<string, string>

Optional headers to send with each export request. For HTTP: sent as HTTP headers. For gRPC: sent as metadata entries (keys must be lowercase ASCII, e.g. use authorization not Authorization). Values support secret replacement syntax (e.g., ${secrets:api_key}).

temporality string | string | string

Aggregation temporality preference for the OTEL metrics push exporter.

Controls how counter and histogram values are encoded on the wire:

  • Delta (default): each export contains the change since the previous export. Required by Datadog's OTLP intake and recommended by AWS CloudWatch, New Relic, and most push-based SaaS backends. Aligns with the OpenTelemetry guidance for push exporters.
  • Cumulative: each export carries the running total since process start. Use this for OTel collectors that downstream into Prometheus or other pull-based / cumulative-native backends.
  • LowMemory: counters use cumulative, histograms use delta. Reduces the SDK's in-process state for histogram-heavy workloads.

This setting only affects the OTLP push exporter; the runtime's Prometheus scrape endpoint always exposes cumulative metrics regardless of this value.
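A sketch of a gRPC exporter that also sets push_interval and headers, placed under runtime.telemetry as in the TelemetryConfig definition. The secret name api_key and the "Bearer" scheme are assumptions about the collector's authentication, not part of the schema.

```yaml
runtime:
  telemetry:
    otel_exporter:
      enabled: true
      endpoint: "otel-collector:4317"  # no scheme, so gRPC is inferred
      push_interval: 30s               # default is "60s"
      metrics:                         # optional whitelist
        - requests_total
      headers:
        # Lowercase key, as required for gRPC metadata entries.
        authorization: "Bearer ${secrets:api_key}"
```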

OtelTemporality string | string | string

Aggregation temporality preference for the OTEL metrics push exporter.

Controls how counter and histogram values are encoded on the wire:

  • Delta (default): each export contains the change since the previous export. Required by Datadog's OTLP intake and recommended by AWS CloudWatch, New Relic, and most push-based SaaS backends. Aligns with the OpenTelemetry guidance for push exporters.
  • Cumulative: each export carries the running total since process start. Use this for OTel collectors that downstream into Prometheus or other pull-based / cumulative-native backends.
  • LowMemory: counters use cumulative, histograms use delta. Reduces the SDK's in-process state for histogram-heavy workloads.

This setting only affects the OTLP push exporter; the runtime's Prometheus scrape endpoint always exposes cumulative metrics regardless of this value.

TaskHistory object
enabled boolean
Default: true
captured_output string
Default: "none"
captured_context string
Default: "truncated"
Values: "redacted" "truncated" "full"
retention_period string
Default: "8h"
retention_check_interval string
Default: "15m"
min_sql_duration string | null
captured_plan string | null
min_plan_duration string | null
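As an illustration, a task_history block that redacts captured context and keeps records longer than the default might look like this. The 24h and 30m values are assumptions that follow the duration format of the listed defaults.

```yaml
runtime:
  task_history:
    enabled: true
    captured_context: redacted     # "redacted", "truncated" (default) or "full"
    retention_period: 24h          # default is "8h"
    retention_check_interval: 30m  # default is "15m"
```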
Auth object
api_key ApiKeyAuth | null
ApiKeyAuth object
keys ApiKey[] required
enabled boolean
Default: true
ApiKey object | object

API key for authentication. Keys can be read-only or read-write. The key value is redacted in Debug output to prevent credential leakage.

All comparisons (both ApiKey to ApiKey and ApiKey to &str) use constant-time comparison via the subtle crate to prevent timing attacks.

CorsConfig object
enabled boolean
Default: false
allowed_origins string[]
Default:
[
  "*"
]
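A sketch of enabling CORS for a single origin instead of the default wildcard. The origin shown is hypothetical, and writing it as a full scheme-plus-host value is an assumption.

```yaml
runtime:
  cors:
    enabled: true                  # default is false
    allowed_origins:
      - "https://app.example.com"  # hypothetical origin; the default list is ["*"]
```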
Flight object
max_message_size string | null
do_put_rate_limit_enabled boolean

Whether to enable rate limiting on Flight DoPut (write) requests. Defaults to true. Set to false to disable write rate limiting for bulk ingest workloads.

Default: true
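For a bulk ingest workload, the rate limit described above could be switched off like this (a sketch; max_message_size is left unset):

```yaml
runtime:
  flight:
    do_put_rate_limit_enabled: false  # disable rate limiting on Flight DoPut (write) requests
```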
RuntimeReadyState string | string

Controls when the runtime readiness probe reports the runtime as ready.

OutputLevel string
Query object
memory_limit string | null

Specifies the runtime memory limit. When configured, the runtime spills to disk for supported queries that are larger than memory.

temp_directory string | null

Configures where the runtime will store temporary files needed for operations like spilling to disk for queries & accelerations that are larger than memory.

spill_compression SpillCompression | null

Specifies the compression codec used when spilling data to disk.
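A sketch of the Query settings for spilling to disk. The size format of memory_limit is not stated in this reference, so 4GiB is an assumption, and the directory is a hypothetical path.

```yaml
runtime:
  query:
    memory_limit: 4GiB              # assumed size format
    temp_directory: /var/tmp/spice  # hypothetical path for spill files
```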

SpillCompression string
Metrics object
metrics Metric[] required
Metric object
name string required
enabled boolean
Default: true
Scheduler object
state_location string required

Root URI for shared cluster state.

params Params | null

Optional object store params for the shared cluster state.

partition_management PartitionManagement | null

Partition management configuration

Params Record<string, string | integer | number | boolean>
ParamValue string | integer | number | boolean
PartitionManagement object
interval string
Default: "30s"
max_assignments_per_cycle integer
Default: 100
format=uint min=0
max_partitions_per_executor integer
Default: 1000
format=uint min=0
discovery_timeout string
Default: "60s"
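Put together, a Scheduler block with explicit PartitionManagement settings might look like the sketch below. The state_location URI is hypothetical; the partition_management values are the listed defaults written out.

```yaml
runtime:
  scheduler:
    state_location: s3://my-bucket/spice/cluster-state/  # hypothetical root URI
    partition_management:
      interval: 30s
      max_assignments_per_cycle: 100
      max_partitions_per_executor: 1000
      discovery_timeout: 60s
```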
Management object
api_key string required
enabled boolean
Default: true
params Record<string, string>
Snapshots object

Datasets accelerated using a file-mode acceleration engine (e.g. SQLite or DuckDB) can use this configuration to bootstrap from a DB file on object storage (e.g. S3) if the acceleration file does not exist on startup.

Each dataset needs to opt in to snapshots in addition to this config.

enabled boolean

Global enable/disable for dataset snapshots.

Default: true
location string | null

The object store location pointing to a folder containing the dataset snapshots, e.g. s3://my-bucket/spice/snapshots/

bootstrap_on_failure_behavior string
Values: "warn" "retry" "fallback"
params Params | null

Auth params for accessing the object store location. For S3, this is the same as the S3 dataset connector params with the notable exception that s3_auth is set to iam_role by default.
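A sketch of the top-level snapshots block using the location from the description above. Each dataset still has to opt in separately, which is not shown here.

```yaml
snapshots:
  enabled: true
  location: s3://my-bucket/spice/snapshots/
  bootstrap_on_failure_behavior: warn  # "warn", "retry" or "fallback"
```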

BootstrapOnFailureBehavior string
Extension object
enabled boolean
Default: true
params Record<string, string>
Secret object

The secrets configuration for a Spicepod.

Example:

secrets:
  - from: env
    name: env
  - from: kubernetes:my_secret_name
    name: k8s
from string required
name string required
description string | null
params Params | null
ComponentOrReference Catalog | ComponentReference
Catalog

A catalog definition. The params field is validated based on the catalog connector type specified in 'from'.

AccessMode string | string
ComponentReference object
ref string required
dependsOn string[]
ComponentOrReference2 Dataset | ComponentReference
Dataset

A dataset definition. The params field is validated based on the connector type specified in 'from'.

Column object
name string required
description string | null

Optional semantic details about the column

full_text_search FullTextSearchConfig | null
metadata object
ColumnLevelEmbeddingConfig object

Configuration for whether and how a dataset's column should be embedded. Unlike ColumnEmbeddingConfig, ColumnLevelEmbeddingConfig is a property of Column, not of Dataset.

ColumnEmbeddingConfig will be deprecated in the long term in favour of ColumnLevelEmbeddingConfig.

from string
Default: ""
chunking EmbeddingChunkConfig | null
row_id array | null
vector_size integer | null
format=uint min=0
aggregation EmbeddingAggregation | null

Aggregation strategy for multi-vector embeddings. Only meaningful when the underlying column is list-typed. Defaults to max.

max_elements_per_row integer | null

Maximum number of list elements embedded per row for multi-vector columns. Defaults to 32; hard-capped at 1024.

format=uint min=0
EmbeddingChunkConfig object
enabled boolean
Default: false
target_chunk_size integer
Default: 0
format=uint min=0
overlap_size integer
Default: 0
format=uint min=0
trim_whitespace boolean
Default: false
EmbeddingAggregation string

Aggregation strategy applied when a multi-vector (list-typed) column is queried. Each list element produces its own embedding; at query time the per-element similarities are combined into a single per-row score using this aggregation.

Max is the ColBERT-style MaxSim default — a row scores as high as its best-matching element.

FullTextSearchConfig object
enabled boolean required
row_id array | null
index_store IndexStore | null
index_directory string | null
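As a sketch, a Column entry that turns on full-text search could look like this. The column names body and id are hypothetical, and treating row_id entries as column names is an assumption, since the schema only types it as an array.

```yaml
columns:
  - name: body        # hypothetical column name
    full_text_search:
      enabled: true
      row_id:
        - id          # assumed: name of a column that identifies each row
```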
IndexStore string
Replication object
enabled boolean
Default: false
TimeFormat string
Acceleration object
enabled boolean
Default: true
mode string | string | string | string
refresh_on_startup string | string
engine string | null
refresh_mode RefreshMode | null
refresh_check_interval string | null
refresh_cron string | null
refresh_sql string | null
refresh_data_window string | null
refresh_append_overlap string | null
refresh_retry_enabled boolean
Default: true
refresh_retry_max_attempts integer | null
format=uint min=0
refresh_jitter_enabled boolean
Default: false
refresh_jitter_max string | null
params Params | null

Configuration parameters for the acceleration engine. The available parameters depend on the engine type specified in 'engine' (default: arrow). Available engines: arrow, cayenne, duckdb, postgres, sqlite, turso.

retention_period string | null
retention_sql string | null
retention_check_interval string | null
retention_check_enabled boolean
on_zero_results string | string

Behavior when a query on an accelerated table returns zero results.

ready_state ReadyState | null
Default: null
indexes Record<string, string>
primary_key string | null
on_conflict Record<string, string>
metrics Metrics | null
partition_by PartitionedBySchema[]

Partition expressions used to physically partition accelerated data.

Each item accepts either:

  • a plain expression string, for example "YEAR(created_at)" or "bucket(100, user_id)"; or
  • a single-entry mapping of a partition name to an expression, for example { year: "YEAR(created_at)" }.
snapshots string | string | string | string
snapshots_trigger SnapshotsTrigger | null
snapshots_trigger_threshold string | null
snapshots_compaction string
Values: "disabled" "enabled"
snapshots_reset_expiry_on_load string
Values: "disabled" "enabled"
snapshots_creation_policy string
Values: "always" "on_change"
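The two partition_by forms described above can be mixed in one list. The sketch below shows an Acceleration fragment using the listed duckdb engine; the refresh_check_interval value and its duration form are assumptions.

```yaml
acceleration:
  enabled: true
  engine: duckdb
  refresh_check_interval: 10m    # assumed duration string
  partition_by:
    - "bucket(100, user_id)"     # plain expression string
    - year: "YEAR(created_at)"   # single-entry mapping: partition name to expression
  snapshots_creation_policy: on_change  # "always" or "on_change"
```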
Mode string | string | string | string
RefreshOnStartup string | string
RefreshMode string
ZeroResultsAction string | string

Behavior when a query on an accelerated table returns zero results.

ReadyState string | string | string

Controls when the dataset is marked ready for queries.

IndexType string
OnConflictBehavior string
PartitionedBySchema string | object
SnapshotBehavior string | string | string | string
SnapshotsTrigger string | string
SnapshotsCompaction string
SnapshotsResetExpiryOnLoad string
SnapshotsCreationPolicy string
ColumnEmbeddingConfig object

Configuration for if and how a dataset's column should be embedded.

Prefer ColumnLevelEmbeddingConfig going forward. Support for ColumnEmbeddingConfig will be removed in the future.

column string required
use string
Default: ""
column_pk array | null
chunking EmbeddingChunkConfig | null
vector_size integer | null
format=uint min=0
aggregation EmbeddingAggregation | null

Aggregation strategy for multi-vector embeddings. Only meaningful when the underlying column is list-typed (List<Utf8> / LargeList<Utf8>). Defaults to max (ColBERT-style MaxSim).

max_elements_per_row integer | null

Maximum number of list elements embedded per row for multi-vector columns. Defaults to 32; hard-capped at 1024. Excess elements are dropped with a warning log.

format=uint min=0
InvalidTypeAction string

This is deprecated, use unsupported_type_action instead.

UnsupportedTypeAction string
VectorStore object
enabled boolean
Default: true
engine string | null
partition_by PartitionedBySchema[]

Partition expressions used to organize vector data.

Each item accepts either:

  • a plain expression string, for example "YEAR(created_at)" or "bucket(100, user_id)"; or
  • a single-entry mapping of a partition name to an expression, for example { year: "YEAR(created_at)" }.
params Params | null
CheckAvailability string | string

Controls whether the federated table periodically has its availability checked.

ComponentOrReference3 View | ComponentReference
View object
name string required
description string | null
metadata object
columns Column[]
sql string | null

Inline SQL that describes a view.

sql_ref string | null

Reference to a SQL file that describes a view.

acceleration Acceleration | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

vectors VectorStore | null
dependsOn string[]
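A sketch of a View defined with inline SQL and accelerated locally. The top-level views key is an assumption, since it is not listed in this excerpt; the view name and the orders table are hypothetical.

```yaml
views:                   # assumed top-level key
  - name: recent_orders  # hypothetical view name
    sql: |
      SELECT * FROM orders WHERE created_at > now() - INTERVAL '7 days'
    acceleration:
      enabled: true
```

Use sql_ref instead of sql to point at a SQL file rather than inlining the query.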
ComponentOrReference4 Model | ComponentReference
Model

A model definition. The params field is validated based on the model source type specified in 'from'.

ModelFile object
path string required
name string | null
type ModelFileType | null

Should use [Self::file_type] to access.

params Record<string, string>
ModelFileType string
ComponentOrReference5 Embeddings | ComponentReference
Embeddings object
from string required
name string required
files ModelFile[]
params object
datasets string[]
dependsOn string[]
metrics Metrics | null
ComponentOrReference6 Tool | ComponentReference
Tool object
from string required
name string required
description string | null
params Record<string, string>
env Record<string, string>
dependsOn string[]
metrics Metrics | null
ComponentOrReference7 Worker | ComponentReference
Worker object
name string required
description string | null
params object
load_balance LoadBalanceParams | null
sql string | null
cron string | null
LoadBalanceParams object
routing RouterConfig[]
RouterConfig object | object | object
AbfsDataConnectorParams object
abfs_account string

Azure Storage account name.

abfs_container_name string

Azure Storage container name.

abfs_access_key string

Azure Storage account access key.

abfs_bearer_token string

Bearer token to use in Azure requests.

abfs_client_id string

Azure client ID.

abfs_client_secret string

Azure client secret.

abfs_tenant_id string

Azure tenant ID.

abfs_sas_string string

Azure SAS string.

abfs_endpoint string

Azure Storage endpoint.

abfs_use_emulator string

Use the Azure Storage emulator.

Default: "false"
Values: "true" "false"
abfs_use_fabric_endpoint string

Use the Azure Storage fabric endpoint.

Default: "false"
Values: "true" "false"
allow_http string

Allow insecure HTTP connections.

Default: "false"
Values: "true" "false"
abfs_authority_host string

Sets an alternative authority host.

abfs_max_retries string

The maximum number of retries.

Default: "3"
abfs_retry_timeout string

Retry timeout.

abfs_backoff_initial_duration string

Initial backoff duration.

abfs_backoff_max_duration string

Maximum backoff duration.

abfs_backoff_base string

The base of the exponential to use

abfs_proxy_url string

Proxy URL to use when connecting

abfs_proxy_ca_certificate string

CA certificate for the proxy.

abfs_proxy_excludes string

Set list of hosts to exclude from proxy connections

abfs_msi_endpoint string

Sets the endpoint for acquiring managed identity tokens.

abfs_federated_token_file string

Sets a file path for acquiring Azure federated identity token in Kubernetes

abfs_use_cli string

Set if the Azure CLI should be used for acquiring access tokens.

Values: "true" "false"
abfs_skip_signature string

Skip fetching credentials and skip signing requests. Used for interacting with public containers.

Values: "true" "false"
abfs_disable_tagging string

Ignore any tags provided to put_opts

Values: "true" "false"
client_timeout string

The timeout setting for Azure client.

abfs_versioning string

Enables Azure blob versioning support when set to 'enabled'. Defaults to 'disabled'.

Default: "disabled"
file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
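To illustrate, a params fragment for an ABFS dataset reading semicolon-delimited CSV might look like the sketch below. The account, container, and secret names are hypothetical; using the ${secrets:...} replacement syntax here and csv as the file_format value are assumptions. All values are written as strings, matching the schema.

```yaml
params:
  abfs_account: myaccount                    # hypothetical storage account
  abfs_container_name: mycontainer           # hypothetical container
  abfs_sas_string: "${secrets:azure_sas}"    # hypothetical secret name
  abfs_max_retries: "5"                      # default is "3"
  file_format: csv                           # assumed value
  csv_has_header: "true"
  csv_delimiter: ";"
```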
AbfssDataConnectorParams object
abfs_account string

Azure Storage account name.

abfs_container_name string

Azure Storage container name.

abfs_access_key string

Azure Storage account access key.

abfs_bearer_token string

Bearer token to use in Azure requests.

abfs_client_id string

Azure client ID.

abfs_client_secret string

Azure client secret.

abfs_tenant_id string

Azure tenant ID.

abfs_sas_string string

Azure SAS string.

abfs_endpoint string

Azure Storage endpoint.

abfs_use_emulator string

Use the Azure Storage emulator.

Default: "false"
Values: "true" "false"
abfs_use_fabric_endpoint string

Use the Azure Storage fabric endpoint.

Default: "false"
Values: "true" "false"
allow_http string

Allow insecure HTTP connections.

Default: "false"
Values: "true" "false"
abfs_authority_host string

Sets an alternative authority host.

abfs_max_retries string

The maximum number of retries.

Default: "3"
abfs_retry_timeout string

Retry timeout.

abfs_backoff_initial_duration string

Initial backoff duration.

abfs_backoff_max_duration string

Maximum backoff duration.

abfs_backoff_base string

The base of the exponential to use

abfs_proxy_url string

Proxy URL to use when connecting

abfs_proxy_ca_certificate string

CA certificate for the proxy.

abfs_proxy_excludes string

Set list of hosts to exclude from proxy connections

abfs_msi_endpoint string

Sets the endpoint for acquiring managed identity tokens.

abfs_federated_token_file string

Sets a file path for acquiring Azure federated identity token in Kubernetes

abfs_use_cli string

Set if the Azure CLI should be used for acquiring access tokens.

Values: "true" "false"
abfs_skip_signature string

Skip fetching credentials and skip signing requests. Used for interacting with public containers.

Values: "true" "false"
abfs_disable_tagging string

Ignore any tags provided to put_opts

Values: "true" "false"
client_timeout string

The timeout setting for the Azure client.

abfs_versioning string

Enables Azure blob versioning support when set to 'enabled'. Defaults to 'disabled'.

Default: "disabled"
file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
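
As an illustration, the AbfssDataConnectorParams options above might appear in a dataset definition along these lines. This is a minimal sketch, not a verified configuration: the `datasets`, `from`, `name`, and `params` keys come from the wider Spicepod format rather than this section, and the container path, account name, dataset name, and `${secrets:...}` reference syntax are assumptions. Every parameter value is quoted because the schema types each one as a string.

```yaml
datasets:
  # Hypothetical container and path; the abfss:// scheme is assumed from the connector name.
  - from: abfss://example-container/events/
    name: events
    params:
      abfs_account: "exampleaccount"
      # Assumed secret reference; substitute however your deployment supplies secrets.
      abfs_access_key: "${secrets:ABFS_ACCESS_KEY}"
      file_format: "csv"
      csv_has_header: "true"
      # Raise the retry count above the documented default of "3".
      abfs_max_retries: "5"
```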
DebeziumDataConnectorParams object
debezium_transport string required

The message broker transport to use. The default is kafka.

Default: "kafka"
debezium_message_format string required

The message format to use. The default is json.

Default: "json"
kafka_bootstrap_servers string required

A list of host/port pairs for establishing the initial Kafka cluster connection.

kafka_security_protocol string

Security protocol for Kafka connections. Default: 'sasl_ssl'. Options: 'plaintext', 'ssl', 'sasl_plaintext', 'sasl_ssl'.

Default: "sasl_ssl"
kafka_sasl_mechanism string

SASL authentication mechanism. Default: 'SCRAM-SHA-512'. Options: 'PLAIN', 'SCRAM-SHA-256', 'SCRAM-SHA-512'.

Default: "SCRAM-SHA-512"
kafka_sasl_username string

SASL username.

kafka_sasl_password string

SASL password.

kafka_ssl_ca_location string

Path to the SSL/TLS CA certificate file for server verification.

kafka_enable_ssl_certificate_verification string

Enable SSL/TLS certificate verification. Default: 'true'.

Default: "true"
kafka_ssl_endpoint_identification_algorithm string

SSL/TLS endpoint identification algorithm. Default: 'https'. Options: 'none', 'https'.

Default: "https"
kafka_consumer_group_id string

Kafka consumer group id to use for this dataset. If not set, a unique id will be generated.

batch_max_size string

Maximum number of change events to batch together before processing

Default: "10000"
batch_max_duration string

Maximum time to wait for a batch to fill before processing

Default: "1s"
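
The three required Debezium parameters, plus optional batching, could be combined as in the sketch below. The topic name, broker address, and dataset name are hypothetical, and the `acceleration` block is an assumption drawn from the wider Spicepod format (change-data-capture sources are typically paired with an accelerated dataset); it is not described in this section.

```yaml
datasets:
  # Hypothetical Debezium topic name.
  - from: debezium:cdc.public.customer_addresses
    name: customer_addresses
    params:
      debezium_transport: "kafka"
      debezium_message_format: "json"
      kafka_bootstrap_servers: "localhost:19092"
      # Local broker without TLS or SASL, overriding the "sasl_ssl" default.
      kafka_security_protocol: "plaintext"
      batch_max_size: "5000"
      batch_max_duration: "500ms"
    # Assumed keys from the wider Spicepod format, not from this section.
    acceleration:
      enabled: true
      refresh_mode: changes
```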
DynamodbDataConnectorParams object
dynamodb_aws_region string required

The AWS region to use for DynamoDB.

dynamodb_aws_access_key_id string

The AWS access key ID to use for DynamoDB.

dynamodb_aws_secret_access_key string

The AWS secret access key to use for DynamoDB.

dynamodb_aws_session_token string

The AWS session token to use for DynamoDB.

dynamodb_aws_auth string

Authentication method. Use 'iam_role' for IAM role-based authentication or 'key' for explicit access key credentials

Default: "iam_role"
dynamodb_aws_iam_role_source string

IAM role credential source (only used when dynamodb_aws_auth is 'iam_role'). 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"
unnest_depth string

Maximum nesting depth for unnesting embedded documents into a flattened structure. Higher values expand deeper nested fields.

schema_infer_max_records string

Number of documents to use to infer the schema. Defaults to 10.

Default: "10"
scan_segments string

Number of segments. 'auto' by default.

Default: "auto"
scan_interval string

Interval in milliseconds between polling for new records in a DynamoDB stream.

Default: "0s"
time_format string

Go-style time format used for parsing/formatting timestamps

Default: "2006-01-02T15:04:05.000Z07:00"
ready_lag string

When using Streams, a table is reported as Ready once it reaches this lag.

Default: "2s"
endpoint_url string

Custom endpoint URL for DynamoDB-compatible services (e.g., DynamoDB Local, ScyllaDB Alternator).

lag_exceeds_shard_retention_behavior string

Behavior when stream lag exceeds shard retention (24h). 'error' marks dataset as Error, 'ready_before_load' marks Ready then re-bootstraps, 'ready_after_load' re-bootstraps then marks Ready

Default: "error"
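
A DynamoDB dataset that uses explicit key credentials might be declared roughly as follows. Treat this as a sketch under assumptions: the `dynamodb:users` source string, the dataset name, and the `${secrets:...}` references are hypothetical, and only `dynamodb_aws_region` is required by the schema.

```yaml
datasets:
  # Hypothetical table name.
  - from: dynamodb:users
    name: users
    params:
      dynamodb_aws_region: "us-west-2"
      # 'key' selects explicit credentials instead of the "iam_role" default.
      dynamodb_aws_auth: "key"
      dynamodb_aws_access_key_id: "${secrets:AWS_ACCESS_KEY_ID}"
      dynamodb_aws_secret_access_key: "${secrets:AWS_SECRET_ACCESS_KEY}"
      # Sample more documents than the default of "10" when inferring the schema.
      schema_infer_max_records: "100"
      unnest_depth: "2"
```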
FileDataConnectorParams object
file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
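
For a local file, the format options above could be set as in this sketch. The file path and dataset name are hypothetical, and the `file:` source prefix is assumed from the connector name.

```yaml
datasets:
  # Hypothetical path to a gzip-compressed, semicolon-delimited CSV file.
  - from: file:data/customers.csv.gz
    name: customers
    params:
      file_format: "csv"
      csv_has_header: "true"
      csv_delimiter: ";"
      file_compression_type: "GZIP"
      # Prefer this over the deprecated csv_schema_infer_max_records.
      schema_infer_max_records: "1000"
```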
GcsDataConnectorParams object
gcs_service_account_path string

Path to a GCS service account JSON key file.

gcs_service_account_key string

GCS service account JSON key as a string.

gcs_application_default_credentials string

Use Google Application Default Credentials for authentication. If GOOGLE_APPLICATION_CREDENTIALS env var is set, uses that path.

Default: "false"
Values: "true" "false"
allow_http string

Allow insecure HTTP connections.

Default: "false"
Values: "true" "false"
gcs_max_retries string

The maximum number of retries.

Default: "3"
gcs_retry_timeout string

Retry timeout.

gcs_backoff_initial_duration string

Initial backoff duration.

gcs_backoff_max_duration string

Maximum backoff duration.

gcs_backoff_base string

The base of the exponential used to compute backoff delays.

gcs_skip_signature string

Skip signing requests. Used for public buckets.

Values: "true" "false"
client_timeout string

The timeout setting for the GCS client.

file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
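
The sketch below shows one way the GCS authentication and JSON options might be combined. The bucket, key file path, and dataset name are hypothetical, and the `gcs://` scheme is assumed from the connector name. Because `file_format` is set explicitly to `json`, the effective `json_format` default is 'json', as noted in the `json_format` description.

```yaml
datasets:
  # Hypothetical bucket and prefix.
  - from: gcs://example-bucket/exports/
    name: exports
    params:
      gcs_service_account_path: "/etc/secrets/gcs-key.json"
      file_format: "json"
      # Extract rows from {"response": {"items": [...]}}.
      json_pointer: "/response/items"
      flatten_json: "true"
```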
GitDataConnectorParams object
include string

Include only files matching the glob pattern. Multiple patterns can be separated by comma or semicolon.

Examples: "*.rs", "**/*.yaml;src/**/*.json"
fetch_content string

Whether to fetch file content. Set to 'true' to include file content in the 'content' column.

Default: "false"
Values: "true" "false"
cache_path string

Custom path for the local Git repository cache. If not specified, uses system temp directory.

max_files string

Maximum number of files to materialize from a Git repository. Default: 5000. Hard limit: 50000.

Default: "5000"
max_file_bytes string

Maximum size (bytes) for an individual file when fetching content. Files larger than this value are skipped. Default: 524288. Maximum: 5242880 (5 MiB).

git_username string

Username for HTTP(S) basic authentication.

git_password string

Password or personal access token for HTTP(S) basic authentication.

git_token string

Personal access token used for HTTP(S) authentication. Equivalent to providing a username of 'x-access-token' with the token as the password.

git_ssh_key string

Absolute path to an SSH private key used to authenticate to the remote repository.

git_ssh_passphrase string

Passphrase for the SSH private key identified by 'git_ssh_key'.

git_ssh_use_agent string

When 'true', attempt to authenticate via the running ssh-agent when no explicit 'git_ssh_key' is provided. Defaults to 'true'.

Default: "true"
Values: "true" "false"
enable_lfs string

Whether to fetch git-lfs objects after clone/fetch. Requires the 'git-lfs' CLI to be available on PATH.

Default: "false"
Values: "true" "false"
max_concurrent_requests string

Maximum number of concurrent Git network operations (clone/fetch) across datasets that share the same repository URL.

Default: "4"
git_max_retries string

Maximum number of retries when the connector encounters a transient error cloning or fetching from the remote.

Default: "3"
backoff_method string

Backoff strategy for retries on transient errors.

Default: "exponential"
Values: "exponential" "fibonacci"
disable_on_permanent_error string

When true, a permanent error (authentication failure, access denied) will disable the connector to prevent a thundering herd of failed requests.

Default: "true"
Values: "true" "false"
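
A Git dataset using the unprefixed parameter names might look like the following sketch. The `from` value is purely hypothetical (this section does not describe the source string format for the Git connector), as are the dataset name and the secret reference.

```yaml
datasets:
  # Hypothetical source string; consult the connector documentation for the exact form.
  - from: git:https://example.com/org/repo.git
    name: repo_files
    params:
      # Multiple glob patterns separated by a semicolon.
      include: "**/*.yaml;src/**/*.json"
      fetch_content: "true"
      max_files: "2000"
      # 1 MiB per file, below the documented 5242880-byte maximum.
      max_file_bytes: "1048576"
      git_token: "${secrets:GIT_TOKEN}"
      backoff_method: "fibonacci"
```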
git_include string

DEPRECATED: use the unprefixed 'include' instead.

git_fetch_content string

DEPRECATED: use the unprefixed 'fetch_content' instead.

git_cache_path string

DEPRECATED: use the unprefixed 'cache_path' instead.

git_max_files string

DEPRECATED: use the unprefixed 'max_files' instead.

git_max_file_bytes string

DEPRECATED: use the unprefixed 'max_file_bytes' instead.

git_enable_lfs string

DEPRECATED: use the unprefixed 'enable_lfs' instead.

GithubDataConnectorParams object
github_token string

A GitHub token.

github_client_id string

The GitHub App Client ID.

github_private_key string

The GitHub App private key.

github_installation_id string

The GitHub App installation ID.

github_query_mode string

Specify what search mode (REST, GraphQL, Search API) to use when retrieving results.

Default: "auto"
github_endpoint string

The GitHub API endpoint.

Default: "https://api.github.com"
github_include_comments string

Specifies the types of comments to fetch: 'all', 'review', 'discussion', or 'none'.

Default: "none"
github_max_comments_fetched string

Maximum number of comments to fetch per discussion or review thread.

Default: "100"
github_include_commits string

Whether to fetch commit information (created_at, updated_at) for files. Set to 'true' to enable.

Default: "false"
github_workflow_logs string

Whether to download and include workflow run logs. Set to 'enabled' to download logs for each workflow run. Defaults to 'disabled'.

Default: "disabled"
include string

Include only files matching the pattern.

Examples: "*.json", "**/*.yaml;src/**/*.json"
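
As a rough illustration, token-based access with review comments enabled could be configured as below. The `from` value, repository, dataset name, and secret reference are assumptions and are not defined in this section.

```yaml
datasets:
  # Hypothetical owner/repository; the source string format is assumed.
  - from: github:github.com/example-org/example-repo/pulls
    name: pulls
    params:
      github_token: "${secrets:GITHUB_TOKEN}"
      # One of 'all', 'review', 'discussion', or 'none' (the default).
      github_include_comments: "review"
      github_max_comments_fetched: "50"
```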
GlueDataConnectorParams object
glue_catalog_id string
glue_region string
glue_endpoint string
glue_url_style string

Controls S3 URL addressing style. Supported values: 'vhost' and 'path'. When not set, auto-detected from the endpoint.

Values: "vhost" "path"
glue_key string
glue_secret string
glue_session_token string
glue_auth string

Configures the authentication method for S3. Supported methods are: public (i.e. no auth), iam_role, key.

glue_iam_role_source string

IAM role credential source (used when auth is 'iam_role' or unset, i.e. default IAM-based auth). 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"
glue_versioning string

Enables S3 object versioning support when set to 'enabled'. Defaults to 'enabled'.

Default: "enabled"
client_timeout string

The timeout setting for the S3 client.

allow_http string

Allow HTTP protocol for S3 endpoint.

file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
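
A Glue-backed dataset with explicit key authentication might be sketched as follows. The `glue:database.table` source string, the names, and the secret references are assumptions; this section lists only the parameter names.

```yaml
datasets:
  # Hypothetical database and table; the source string format is assumed.
  - from: glue:example_db.example_table
    name: example_table
    params:
      glue_region: "us-east-1"
      glue_auth: "key"
      glue_key: "${secrets:AWS_ACCESS_KEY_ID}"
      glue_secret: "${secrets:AWS_SECRET_ACCESS_KEY}"
      # Force path-style S3 URLs instead of relying on auto-detection.
      glue_url_style: "path"
```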
GsDataConnectorParams object
gcs_service_account_path string

Path to a GCS service account JSON key file.

gcs_service_account_key string

GCS service account JSON key as a string.

gcs_application_default_credentials string

Use Google Application Default Credentials for authentication. If GOOGLE_APPLICATION_CREDENTIALS env var is set, uses that path.

Default: "false"
Values: "true" "false"
allow_http string

Allow insecure HTTP connections.

Default: "false"
Values: "true" "false"
gcs_max_retries string

The maximum number of retries.

Default: "3"
gcs_retry_timeout string

Retry timeout.

gcs_backoff_initial_duration string

Initial backoff duration.

gcs_backoff_max_duration string

Maximum backoff duration.

gcs_backoff_base string

The base of the exponential used to compute backoff delays.

gcs_skip_signature string

Skip signing requests. Used for public buckets.

Values: "true" "false"
client_timeout string

The timeout setting for the GCS client.

file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
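
GsDataConnectorParams lists the same options as GcsDataConnectorParams. The sketch below reads from a public bucket without credentials; the bucket, prefix, and dataset name are hypothetical, and the `gs://` scheme is assumed from the connector name.

```yaml
datasets:
  # Hypothetical public bucket.
  - from: gs://example-public-bucket/data/
    name: public_data
    params:
      # Skip request signing, as described for public buckets.
      gcs_skip_signature: "true"
      file_format: "csv"
      csv_has_header: "true"
      hive_partitioning_enabled: "true"
```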
HttpDataConnectorParams object
http_username string
http_password string
http_port string

The port to connect to.

client_timeout string

The timeout setting for HTTP(S) client requests (in seconds). Default: 30

connect_timeout string

The timeout for establishing HTTP(S) connections (in seconds). Default: 10

pool_max_idle_per_host string

Maximum number of idle connections to keep alive per host. Default: 10

pool_idle_timeout string

Timeout for idle connections in the pool (in seconds). Default: 90

http_headers string

Custom HTTP headers to include in requests. Format: 'Header1: Value1, Header2: Value2'. Headers are applied to all requests.

max_retries string

Maximum number of retries for HTTP requests. Default: 3

retry_backoff_method string

Retry backoff method: 'fibonacci' (default), 'linear', or 'exponential'.

retry_max_duration string

Maximum total duration for all retries (e.g., '30s', '5m'). If not set, retries will continue up to max_retries.

retry_jitter string

Randomization factor for retry delays (0.0 to 1.0). Default: 0.3 (30% randomization). Set to 0 for no jitter.

allowed_request_paths string

Comma-separated list of request_path values that users are allowed to query. Required to enable request_path filters.

request_query_filters string

Set to 'enabled' or 'disabled' to control whether request_query filters can be pushed down to HTTP requests.

Values: "enabled" "disabled"
max_request_query_length string

Maximum length (in characters) for request_query filter values. Default: 1024.

request_body_filters string

Set to 'enabled' or 'disabled' to control whether request_body filters can be pushed down as HTTP request bodies.

Values: "enabled" "disabled"
max_request_body_bytes string

Maximum size (in bytes) for request_body filter values. Default: 16384 (16 KiB).

health_probe string

Custom health probe path for endpoint validation (e.g., '/health', '/api/status'). The endpoint must return a 2xx status code to pass validation. If not set, a random path is used and any status (including 404) is accepted.

pagination string

Pagination mode. 'auto' (default): auto-detects Link headers. 'enabled': explicitly enable with config. 'disabled': no pagination.

Values: "auto" "enabled" "disabled"
pagination_next_pointer string

JSON pointer (RFC 6901) to the next page URL or cursor in the response body (e.g., '/next', '/pagination/cursor', '/links/next').

pagination_link_header string

Whether to follow HTTP Link headers with rel="next" for pagination. Default: 'enabled' (auto-detected). Set to 'disabled' to ignore Link headers.

Values: "enabled" "disabled"
pagination_token_param string

When set, the value from 'pagination_next_pointer' is treated as a cursor/token and passed as this query parameter name in subsequent requests. When not set, the value is treated as a full URL.

pagination_data_pointer string

JSON pointer (RFC 6901) to the data array in each page's response (e.g., '/data', '/results', '/items'). When set, only the array at this path is returned as data rows.

pagination_max_pages string

Maximum number of pages to fetch for pagination. Default: 100.

pagination_data_map_to_array string

When 'enabled', if the data at pagination_data_pointer (or the top-level response) is a JSON object/map, extract its values as rows instead of treating it as a single row. Default: 'disabled'.

Values: "enabled" "disabled"
pagination_query_params string

Query parameter template for client-driven pagination. Supports {offset}, {limit}, and {page} variables. Example: 'offset={offset}&limit={limit}'. Requires pagination_page_size.

pagination_page_size string

Number of items per page for query-parameter pagination. Must be a positive integer. Used to expand {limit} in pagination_query_params and to detect the last page: a page that returns fewer results than the page size ends pagination.
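
The pagination parameters support both client-driven (offset/limit) and server-driven (cursor) styles. The sketch below shows one dataset of each kind. The endpoint URLs, dataset names, and response shapes are hypothetical, and the `datasets`, `from`, `name`, and `params` keys come from the wider Spicepod format.

```yaml
datasets:
  # Client-driven pagination: the connector fills in {offset} and {limit}.
  - from: https://api.example.com/v1/items
    name: items
    params:
      pagination: "enabled"
      pagination_query_params: "offset={offset}&limit={limit}"
      pagination_page_size: "100"
      # Rows are taken from the array at /results in each page.
      pagination_data_pointer: "/results"
      pagination_max_pages: "50"

  # Server-driven pagination: the response carries a cursor for the next page.
  - from: https://api.example.com/v1/events
    name: events
    params:
      pagination: "enabled"
      pagination_next_pointer: "/pagination/cursor"
      # Send the cursor back as ?cursor=<value> rather than treating it as a full URL.
      pagination_token_param: "cursor"
      pagination_data_pointer: "/data"
```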

auth_token_url string

OAuth2 token endpoint URL. When set together with http_auth_refresh_token, the connector exchanges the refresh token for short-lived access tokens (RFC 6749 §6) and attaches 'Authorization: Bearer <access_token>' to all data requests. Applies to JSON API endpoints only.

http_auth_refresh_token string

OAuth2 refresh token exchanged against auth_token_url to obtain access tokens. Required when auth_token_url is set.

http_auth_client_id string

OAuth2 client_id presented to the token endpoint. Required for confidential clients; optional for public clients. Paired with http_auth_client_secret.

http_auth_client_secret string

OAuth2 client_secret presented to the token endpoint. Required when the client is confidential; must be set together with http_auth_client_id.

auth_scopes string

Space-separated OAuth2 scopes to request when refreshing. Omit to inherit the scopes bound to the refresh token. Optional.

auth_client_auth string

How client credentials are sent to the token endpoint: 'basic' (HTTP Basic header, default per RFC 6749 §2.3.1) or 'body' (client_id/client_secret in the form body). Case-insensitive.
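
Taken together, the OAuth2 parameters might be configured as in this sketch for a confidential client that sends its credentials in the form body. The URLs, scope names, dataset name, and `${secrets:...}` references are hypothetical.

```yaml
datasets:
  - from: https://api.example.com/v1/reports
    name: reports
    params:
      auth_token_url: "https://auth.example.com/oauth2/token"
      http_auth_refresh_token: "${secrets:API_REFRESH_TOKEN}"
      # Client ID and secret must be set together for a confidential client.
      http_auth_client_id: "${secrets:API_CLIENT_ID}"
      http_auth_client_secret: "${secrets:API_CLIENT_SECRET}"
      # Space-separated; omit to inherit the scopes bound to the refresh token.
      auth_scopes: "reports.read reports.export"
      # 'body' instead of the default 'basic' (HTTP Basic header).
      auth_client_auth: "body"
```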

file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

json | jsonl | ndjson | ldjson | array | object | soda | socrata | auto. When 'file_format' is explicitly 'json', effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
HttpsDataConnectorParams object
http_username string
http_password string
http_port string

The port to connect to.

client_timeout string

The timeout setting for HTTP(S) client requests (in seconds). Default: 30

connect_timeout string

The timeout for establishing HTTP(S) connections (in seconds). Default: 10

pool_max_idle_per_host string

Maximum number of idle connections to keep alive per host. Default: 10

pool_idle_timeout string

Timeout for idle connections in the pool (in seconds). Default: 90

http_headers string

Custom HTTP headers to include in requests. Format: 'Header1: Value1, Header2: Value2'. Headers are applied to all requests.

max_retries string

Maximum number of retries for HTTP requests. Default: 3

retry_backoff_method string

Retry backoff method: 'fibonacci' (default), 'linear', or 'exponential'.

retry_max_duration string

Maximum total duration for all retries (e.g., '30s', '5m'). If not set, retries will continue up to max_retries.

retry_jitter string

Randomization factor for retry delays (0.0 to 1.0). Default: 0.3 (30% randomization). Set to 0 for no jitter.

allowed_request_paths string

Comma-separated list of request_path values that users are allowed to query. Required to enable request_path filters.

request_query_filters string

Set to 'enabled' or 'disabled' to control whether request_query filters can be pushed down to HTTP requests.

Values: "enabled" "disabled"
max_request_query_length string

Maximum length (in characters) for request_query filter values. Default: 1024.

request_body_filters string

Set to 'enabled' or 'disabled' to control whether request_body filters can be pushed down as HTTP request bodies.

Values: "enabled" "disabled"
max_request_body_bytes string

Maximum size (in bytes) for request_body filter values. Default: 16384 (16KiB).

health_probe string

Custom health probe path for endpoint validation (e.g., '/health', '/api/status'). The endpoint must return a 2xx status code to pass validation. If not set, a random path is used and any status (including 404) is accepted.

pagination string

Pagination mode. 'auto' (default): auto-detects Link headers. 'enabled': explicitly enable with config. 'disabled': no pagination.

Values: "auto" "enabled" "disabled"
pagination_next_pointer string

JSON pointer (RFC 6901) to the next page URL or cursor in the response body (e.g., '/next', '/pagination/cursor', '/links/next').

pagination_link_header string

Whether to follow HTTP Link headers with rel="next" for pagination. Default: 'enabled' (auto-detected). Set to 'disabled' to ignore Link headers.

Values: "enabled" "disabled"
pagination_token_param string

When set, the value from 'pagination_next_pointer' is treated as a cursor/token and passed as this query parameter name in subsequent requests. When not set, the value is treated as a full URL.
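
The two behaviors described above (cursor passed as a query parameter versus a full next-page URL) can be sketched as follows. The function next_request_url and the example URLs are hypothetical; how the runtime merges the parameter into an existing query string is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_request_url(base_url, next_value, token_param=None):
    """Build the URL of the next page from the value found at pagination_next_pointer."""
    if token_param is None:
        # No pagination_token_param: the value is already a full URL.
        return next_value
    parts = urlsplit(base_url)
    # Drop any previous cursor, then append the new one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != token_param]
    query.append((token_param, next_value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(next_request_url("https://api.example.com/items?limit=10", "abc", "cursor"))
```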

pagination_data_pointer string

JSON pointer (RFC 6901) to the data array in each page's response (e.g., '/data', '/results', '/items'). When set, only the array at this path is returned as data rows.

pagination_max_pages string

Maximum number of pages to fetch for pagination. Default: 100.

pagination_data_map_to_array string

When 'enabled', if the data at pagination_data_pointer (or the top-level response) is a JSON object/map, extract its values as rows instead of treating it as a single row. Default: 'disabled'.

Values: "enabled" "disabled"
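
The effect of pagination_data_map_to_array can be pictured with a small sketch. The function rows_from_page is hypothetical and only mirrors the behavior described above; it is not the connector's code.

```python
def rows_from_page(data, map_to_array="disabled"):
    """Turn one page of decoded JSON into a list of rows (illustrative only)."""
    if isinstance(data, list):
        return data  # already an array of rows
    if isinstance(data, dict) and map_to_array == "enabled":
        return list(data.values())  # each map value becomes its own row
    return [data]  # otherwise the whole value is a single row


page = {"a1": {"id": 1}, "b2": {"id": 2}}
print(rows_from_page(page))
print(rows_from_page(page, map_to_array="enabled"))
```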
pagination_query_params string

Query parameter template for client-driven pagination. Supports {offset}, {limit}, and {page} variables. Example: 'offset={offset}&limit={limit}'. Requires pagination_page_size.

pagination_page_size string

Number of items per page for query-parameter pagination. Must be a positive integer. Used to expand {limit} in pagination_query_params and to detect the last page (a page with fewer results than pagination_page_size ends pagination).
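
A minimal sketch of client-driven pagination with pagination_query_params and pagination_page_size follows. The helper names are hypothetical, and two details are assumptions: pages are counted from a zero-based index internally, and {page} expands to a 1-based page number.

```python
def expand_pagination_query(template, page_index, page_size):
    """Fill {offset}, {limit}, and {page} for the page at zero-based page_index."""
    if page_size <= 0:
        raise ValueError("pagination_page_size must be a positive integer")
    values = {
        "offset": page_index * page_size,
        "limit": page_size,
        "page": page_index + 1,  # assumption: {page} starts at 1
    }
    query = template
    for name, value in values.items():
        query = query.replace("{" + name + "}", str(value))
    return query


def is_last_page(row_count, page_size):
    """A page with fewer rows than page_size signals the end."""
    return row_count < page_size


print(expand_pagination_query("offset={offset}&limit={limit}", 2, 50))
```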

auth_token_url string

OAuth2 token endpoint URL. When set together with http_auth_refresh_token, the connector exchanges the refresh token for short-lived access tokens (RFC 6749 §6) and attaches 'Authorization: Bearer <access_token>' to all data requests. Applies to JSON API endpoints only.

http_auth_refresh_token string

OAuth2 refresh token exchanged against auth_token_url to obtain access tokens. Required when auth_token_url is set.

http_auth_client_id string

OAuth2 client_id presented to the token endpoint. Required for confidential clients; optional for public clients. Paired with http_auth_client_secret.

http_auth_client_secret string

OAuth2 client_secret presented to the token endpoint. Required when the client is confidential; must be set together with http_auth_client_id.

auth_scopes string

Space-separated OAuth2 scopes to request when refreshing. Omit to inherit the scopes bound to the refresh token. Optional.

auth_client_auth string

How client credentials are sent to the token endpoint: 'basic' (HTTP Basic header, default per RFC 6749 §2.3.1) or 'body' (client_id/client_secret in the form body). Case-insensitive.
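
The OAuth2 parameters above combine into a single refresh request. The sketch below builds the headers and form body for that request under both auth_client_auth modes. The function build_refresh_request and the sample values are hypothetical; this shows the RFC 6749 request shape, not the connector's actual code.

```python
import base64
from urllib.parse import quote_plus, urlencode


def build_refresh_request(refresh_token, client_id=None, client_secret=None,
                          scopes=None, client_auth="basic"):
    """Return (headers, body) for a refresh_token grant (RFC 6749 section 6)."""
    form = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    if scopes:
        form["scope"] = scopes  # space-separated, as in auth_scopes
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    mode = client_auth.lower()  # the option is documented as case-insensitive
    if client_id is not None:
        if mode == "basic":
            # RFC 6749 section 2.3.1: form-encode each part, then Base64 the pair.
            pair = f"{quote_plus(client_id)}:{quote_plus(client_secret or '')}"
            token = base64.b64encode(pair.encode("utf-8")).decode("ascii")
            headers["Authorization"] = f"Basic {token}"
        elif mode == "body":
            form["client_id"] = client_id
            if client_secret is not None:
                form["client_secret"] = client_secret
        else:
            raise ValueError(f"unknown auth_client_auth: {client_auth}")
    return headers, urlencode(form)


headers, body = build_refresh_request("rt-123", "my-client", "s3cr3t", scopes="read write")
print(headers)
print(body)
```

The body would be sent as a POST to auth_token_url; the access token in the response is then used for the Bearer header on data requests.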

file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
schema_source_path string

Specify a path to use for schema inference.

json_format string

The JSON layout of the source data: json, jsonl, ndjson, ldjson, array, object, soda, socrata, or auto. When 'file_format' is explicitly 'json', the effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.
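
For readers unfamiliar with RFC 6901, the following sketch resolves a JSON Pointer against decoded JSON, including the leading-'/' normalization mentioned above. The function resolve_json_pointer is a hypothetical helper written for illustration; it is not taken from the runtime.

```python
def resolve_json_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against decoded JSON data."""
    if pointer and not pointer.startswith("/"):
        pointer = "/" + pointer  # mirror the documented normalization
    current = document
    if pointer == "":
        return current  # the empty pointer refers to the whole document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: '~1' becomes '/', then '~0' becomes '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


doc = {"response": {"items": [{"id": 1}, {"id": 2}]}}
print(resolve_json_pointer(doc, "/response/items"))
```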

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
IcebergDataConnectorParams object
metadata_path string

The path including scheme to the metadata file for the Hadoop table. Must specify a path to a .json file. For example, s3a://my-bucket/warehouse/namespace/table/metadata/v1.metadata.json

iceberg_token string

Bearer token value to use for Authorization header.

iceberg_oauth2_credential string

Credential to use for OAuth2 client credential flow when initializing the catalog. Separated by a colon as <client_id>:<client_secret>.

iceberg_oauth2_token_url string

The URL to use for OAuth2 token endpoint.

iceberg_oauth2_scope string

The scope to use for OAuth2 token endpoint (default: catalog).

Default: "catalog"
iceberg_oauth2_server_url string

URL of the OAuth2 server tokens endpoint.

iceberg_sigv4_enabled string

Enable SigV4 authentication for the catalog (for connecting to AWS Glue).

iceberg_signing_region string

The region to use when signing the request for SigV4. Defaults to the region in the catalog URL if available.

iceberg_signing_name string

The name to use when signing the request for SigV4.

Default: "glue"
iceberg_warehouse string

Name of the Iceberg warehouse.

iceberg_s3_endpoint string

Configure an alternative endpoint for the S3 service. This can be any S3-compatible object storage service, e.g., MinIO or Cloudflare R2.

iceberg_s3_access_key_id string

The AWS access key ID to use for S3 storage.

iceberg_s3_secret_access_key string

The AWS secret access key to use for S3 storage.

iceberg_s3_session_token string

Configure the static session token used for S3 storage.

iceberg_s3_iam_role_source string

IAM role credential source. 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"
iceberg_s3_region string

The AWS S3 region to use.

iceberg_s3_role_session_name string

An optional identifier for the assumed role session for auditing purposes.

iceberg_s3_role_arn string

The Amazon Resource Name (ARN) of the role to assume. If provided instead of s3_access_key_id and s3_secret_access_key, temporary credentials will be fetched by assuming this role.

iceberg_s3_connect_timeout string

Configure socket connection timeout, in seconds (default: 60).

iceberg_gcs_project_id string

The Google Cloud project ID for GCS storage.

iceberg_gcs_credentials string

Base64-encoded Google Cloud service account credentials JSON for GCS storage.

iceberg_gcs_token string

OAuth2 token to use for GCS authentication.

iceberg_gcs_service_path string

Custom endpoint URL for GCS (for emulators or custom endpoints).

iceberg_gcs_no_auth string

Set to 'true' to allow anonymous access to GCS (for public buckets).

KafkaDataConnectorParams object
kafka_bootstrap_servers string required

A list of host/port pairs for establishing the initial Kafka cluster connection.

kafka_security_protocol string

Security protocol for Kafka connections. Default: 'sasl_ssl'. Options: 'plaintext', 'ssl', 'sasl_plaintext', 'sasl_ssl'.

Default: "sasl_ssl"
kafka_sasl_mechanism string

SASL authentication mechanism. Default: 'SCRAM-SHA-512'. Options: 'PLAIN', 'SCRAM-SHA-256', 'SCRAM-SHA-512'.

Default: "SCRAM-SHA-512"
kafka_sasl_username string

SASL username.

kafka_sasl_password string

SASL password.

kafka_ssl_ca_location string

Path to the SSL/TLS CA certificate file for server verification.

kafka_enable_ssl_certificate_verification string

Enable SSL/TLS certificate verification. Default: 'true'.

Default: "true"
Values: "true" "false"
kafka_ssl_endpoint_identification_algorithm string

SSL/TLS endpoint identification algorithm. Default: 'https'. Options: 'none', 'https'.

Default: "https"
Values: "none" "https"
schema_infer_max_records string

Number of Kafka messages to sample for schema inference. Default: '1'. Increase if your data has optional fields or varying structure.

Default: "1"
flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
kafka_consumer_group_id string

Kafka consumer group id to use for this dataset. If not set, a unique id will be generated.

batch_max_size string

Maximum number of change events to batch together before processing.

Default: "10000"
batch_max_duration string

Maximum time to wait for a batch to fill before processing.

Default: "1s"
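
The interplay of batch_max_size and batch_max_duration can be sketched with a small simulation over timestamped events. The function batch_events is hypothetical, and it only checks the duration limit when a new event arrives; a real consumer would presumably also flush on a timer.

```python
def batch_events(events, max_size=10000, max_duration=1.0):
    """Group (timestamp_seconds, payload) pairs into batches (illustrative only)."""
    batches, current, started = [], [], None
    for timestamp, payload in events:
        # Flush if the open batch has been waiting for max_duration or longer.
        if current and timestamp - started >= max_duration:
            batches.append(current)
            current = []
        if not current:
            started = timestamp
        current.append(payload)
        # Flush as soon as the batch is full.
        if len(current) >= max_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


events = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (1.5, "d"), (1.6, "e")]
print(batch_events(events, max_size=2, max_duration=1.0))
```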
LocalpodDataConnectorParams object
MemoryDataConnectorParams object
S3DataConnectorParams object
s3_region string
s3_endpoint string
s3_url_style string

Controls S3 URL addressing style. Supported values: 'vhost' and 'path'. When not set, auto-detected from the endpoint.

Values: "vhost" "path"
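
The two addressing styles differ only in where the bucket name appears in the URL. The sketch below shows the difference using the standard AWS regional endpoint form; the function s3_object_url and the sample bucket are hypothetical, and custom s3_endpoint values would change the host.

```python
def s3_object_url(bucket, key, region, url_style="vhost"):
    """Build an S3 object URL in virtual-hosted or path style (illustrative only)."""
    host = f"s3.{region}.amazonaws.com"
    if url_style == "vhost":
        return f"https://{bucket}.{host}/{key}"  # bucket in the host name
    if url_style == "path":
        return f"https://{host}/{bucket}/{key}"  # bucket in the path
    raise ValueError(f"unknown s3_url_style: {url_style}")


print(s3_object_url("my-bucket", "data/file.parquet", "us-west-2", "vhost"))
print(s3_object_url("my-bucket", "data/file.parquet", "us-west-2", "path"))
```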
s3_key string
s3_secret string
s3_session_token string
s3_auth string

Configures the authentication method for S3. Supported methods are: public (i.e. no auth), iam_role, key.

s3_iam_role_source string

IAM role credential source (used when auth is 'iam_role' or unset, i.e. default IAM-based auth). 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"
s3_versioning string

Enables S3 object versioning support when set to 'enabled'. Defaults to 'enabled'.

Default: "enabled"
client_timeout string

The timeout setting for S3 client.

allow_http string

Allow HTTP protocol for S3 endpoint.

file_format string
file_extension string
schema_infer_max_records string

Set a limit in terms of records to scan to infer the schema.

tsv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
tsv_quote string

The quote character in a row.

tsv_escape string

The escape character in a row.

tsv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_has_header string

Set true to indicate that the first line is a header.

Values: "true" "false"
csv_quote string

The quote character in a row.

csv_escape string

The escape character in a row.

csv_schema_infer_max_records string

DEPRECATED: use 'schema_infer_max_records' instead

Set a limit in terms of records to scan to infer the schema.

csv_delimiter string

The character separating values within a row.

file_compression_type string

The type of compression used on the file. Supported types are: GZIP, BZIP2, XZ, ZSTD, UNCOMPRESSED

Values: "GZIP" "BZIP2" "XZ" "ZSTD" "UNCOMPRESSED"
hive_partitioning_enabled string

Enable partitioning using hive-style partitioning from the folder structure. Defaults to false.

Values: "true" "false"
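
Hive-style partitioning encodes column values as key=value folder names. As a minimal sketch of that convention (the function hive_partition_values and the sample path are hypothetical, and the runtime's own parsing may differ):

```python
def hive_partition_values(path):
    """Extract key=value folder segments from an object path (illustrative only)."""
    partitions = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


print(hive_partition_values("s3://bucket/data/year=2024/month=01/part-0.parquet"))
```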
schema_source_path string

Specify a path to use for schema inference.

json_format string

The JSON layout of the source data: json, jsonl, ndjson, ldjson, array, object, soda, socrata, or auto. When 'file_format' is explicitly 'json', the effective default is 'json' (no SODA auto-detection).

Default: "auto"
Values: "json" "jsonl" "ndjson" "ldjson" "array" "object" "soda" "socrata" "auto"
json_pointer string

An RFC 6901 JSON Pointer to extract data from within a JSON value. E.g. '/data' for {"data": [...]} or '/response/items' for nested objects. A leading '/' is added automatically if missing.

json_path string

Alias for 'json_pointer'. An RFC 6901 JSON Pointer to extract data from within a JSON value.

flatten_json string

Set true to flatten nested structs in JSON as separate columns.

Values: "true" "false"
soda_metadata string

Set to 'enabled' to include Socrata internal metadata columns (sid, id, position, etc.) in the schema for SODA format responses. Defaults to disabled.

Default: "disabled"
Values: "enabled" "disabled"
refresh_skip string

Control skipping refreshes for single-file S3 datasets when cached ETag/Version metadata matches. Set to 'enabled' (default) or 'disabled'.

Default: "enabled"
Values: "enabled" "disabled"
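
The refresh_skip decision can be pictured as a comparison of cached and current object metadata. The function should_skip_refresh and the (etag, version_id) tuple shape are assumptions made for this sketch; they are not the runtime's data structures.

```python
def should_skip_refresh(cached, current, refresh_skip="enabled"):
    """Decide whether a refresh can be skipped (illustrative only).

    cached and current are (etag, version_id) tuples, or None when unknown.
    """
    if refresh_skip != "enabled":
        return False  # skipping is turned off
    if cached is None or current is None:
        return False  # nothing to compare against, so refresh
    return cached == current  # unchanged metadata means unchanged object


print(should_skip_refresh(("abc123", "v7"), ("abc123", "v7")))
print(should_skip_refresh(("abc123", "v7"), ("def456", "v8")))
```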
SinkDataConnectorParams object
SpiceAiDataConnectorParams object
spiceai_api_key string
spiceai_token string
spiceai_endpoint string
ArrowAcceleratorParams object
file_watcher string
hash_index string

Enable hash index for fast primary key lookups. Set to 'enabled' to enable (requires primary_key). Default: disabled.

arrow_sort_columns string

Comma-separated list of columns to sort data by during inserts (e.g., 'timestamp,user_id').

CayenneAcceleratorParams object
cayenne_s3_region string

AWS region for S3 Express One Zone storage. If not specified, derived from cayenne_s3_zone_ids.

cayenne_s3_endpoint string

Custom S3 endpoint URL for S3 Express One Zone.

cayenne_s3_key string

AWS access key ID for S3 authentication.

cayenne_s3_secret string

AWS secret access key for S3 authentication.

cayenne_s3_session_token string

AWS session token for temporary credentials (optional).

cayenne_s3_auth string

Authentication method for S3 Express One Zone. Options: 'iam_role' (default, uses environment credentials), 'key' (uses explicit cayenne_s3_key/cayenne_s3_secret).

Default: "iam_role"
Values: "iam_role" "key"
cayenne_s3_client_timeout string

Timeout for S3 client operations (e.g., '30s', '5m'). Default: 120s.

Default: "120s"
cayenne_s3_allow_http string

Allow HTTP (non-TLS) connections to S3. Default: false.

Default: "false"
cayenne_s3_unsigned_payload string

Use unsigned payload for S3 Express One Zone requests. Only applies when S3 Express mode is enabled (via cayenne_s3_zone_ids or directory bucket path). Skips SHA-256 computation for request body, improving upload performance. S3 Express One Zone uses session-based auth, making payload signing unnecessary. Default: true.

Default: "true"
cayenne_s3_zone_ids string

Comma-separated list of Availability Zone IDs for S3 Express One Zone storage (e.g., 'usw2-az1' or 'usw2-az1,usw2-az2'). When specified without 'cayenne_file_path', auto-generates bucket name from app and dataset name, and creates the bucket if needed. For multi-zone redundancy, specify multiple zones. Data is written to all zones with ACID guarantees - writes succeed only if all zones succeed. Reads are served from the primary (first) zone with fallback to replicas.

cayenne_file_path string

Path for storing Cayenne data files (Vortex files). Can be a local path or an S3 Express One Zone path. For S3 Express One Zone, use format: 's3://{bucket-name}--{zone-id}--x-s3/{prefix}/'. When S3 Express One Zone is specified, data files are stored exclusively in S3 while metadata (SQLite) remains on local disk.
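
The S3 Express One Zone path format above can be assembled and recognized with a few string operations. Both helpers below are hypothetical, and the check for the '--x-s3' bucket suffix is an assumption about how a directory bucket path might be detected.

```python
def s3_express_path(bucket_name, zone_id, prefix):
    """Build 's3://{bucket-name}--{zone-id}--x-s3/{prefix}/'."""
    return f"s3://{bucket_name}--{zone_id}--x-s3/{prefix.strip('/')}/"


def is_s3_express_path(path):
    """Return True when the bucket part of the path ends in '--x-s3'."""
    if not path.startswith("s3://"):
        return False  # local paths are stored on disk instead
    bucket = path[len("s3://"):].split("/", 1)[0]
    return bucket.endswith("--x-s3")


print(s3_express_path("mydata", "usw2-az1", "cayenne"))
```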

cayenne_metadata_dir string

Path for storing Cayenne metadata (SQLite catalog). If not specified, defaults to '{cayenne_file_path}/metadata'.

cayenne_metastore string

Metastore backend for the Cayenne catalog. Options: 'sqlite' (default), 'turso' (requires the 'turso' feature to be enabled at build time).

Default: "sqlite"
Values: "sqlite" "turso"
file_watcher string
cayenne_unsupported_type_action string

How to handle data types that Cayenne (which uses the Vortex format internally) does not support natively, such as Time32, Time64, Duration, and Interval. Options: 'string' (default; converts the schema type to Utf8 and requires the data source to provide string data), 'error' (fail on unsupported types), 'warn' (include in schema; may fail on insert), 'ignore' (skip unsupported fields).

Default: "string"
Values: "string" "error" "ignore" "warn"
cayenne_footer_cache_mb string

Size of the in-memory Vortex footer cache in MB. Larger values improve query performance for repeated scans. Default: 128 MB

Default: "128"
cayenne_segment_cache_mb string

Size of the in-memory Vortex segment cache in MB. Set > 0 to cache decompressed data segments. Default: 256 MB

Default: "256"
cayenne_target_file_size_mb string

Target size for Vortex data files in MB. Default: 256 MB. Adjust as needed for S3 Express or remote upload scenarios.

Default: "256"
cayenne_sort_columns string

Comma-separated list of columns to sort data by during inserts (e.g., 'timestamp,user_id').

cayenne_compression_strategy string

Compression strategy to use for Vortex files. Options: 'btrblocks' (default), 'zstd'

Default: "btrblocks"
Values: "btrblocks" "zstd"
cayenne_upload_concurrency string

Maximum number of concurrent file uploads when writing multiple Vortex files. Default: 4.

Default: "4"
DuckdbAcceleratorParams object
file_watcher string
duckdb_file string
duckdb_data_dir string
duckdb_memory_limit string
duckdb_preserve_insertion_order string
duckdb_index_scan_percentage string
duckdb_index_scan_max_count string
partition_mode string
duckdb_partitioned_write_flush_threshold_rows string
connection_pool_size string

The maximum number of client connections created in the duckdb connection pool.

on_refresh_recompute_statistics string
on_refresh_sort_columns string
partitioned_write_buffer string
optimizer_duckdb_aggregate_pushdown string
PostgresAcceleratorParams object
pg_host string
pg_port string
pg_db string
pg_user string
pg_pass string
pg_sslmode string
pg_sslrootcert string
pg_connection_pool_min string

The minimum number of connections to keep open in the pool, lazily created when requested.

Default: "5"
file_watcher string
connection_pool_size string

The maximum number of connections created in the connection pool.

Default: "10"
SqliteAcceleratorParams object
sqlite_file string
busy_timeout string
file_watcher string
TursoAcceleratorParams object
turso_turso_file string

Path to the Turso database file. If not specified, defaults to {spice_data_dir}/{dataset_name}.turso

turso_internal_timestamp_format string

Internal timestamp storage format: 'rfc3339' (default, preserves precision/timezone) or 'integer_millis' (performance, millisecond precision only)

Default: "rfc3339"
Values: "rfc3339" "integer_millis"
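
The trade-off between the two formats shows up when the same timestamp is encoded both ways. The sketch below uses only Python's standard library and illustrates the representations themselves, not how Turso stores them; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_rfc3339(value):
    """Text form: keeps sub-millisecond digits and the UTC offset."""
    return value.isoformat()


def to_integer_millis(value):
    """Integer form: milliseconds since the Unix epoch, offset and extra digits dropped."""
    return (value - EPOCH) // timedelta(milliseconds=1)


sample = datetime(2024, 1, 2, 3, 4, 5, 123456, tzinfo=timezone(timedelta(hours=2)))
print(to_rfc3339(sample))         # 2024-01-02T03:04:05.123456+02:00
print(to_integer_millis(sample))  # 1704157445123
```

The integer form is compact and cheap to compare, but the '+02:00' offset and the trailing '456' microseconds cannot be recovered from it.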
IcebergCatalogParams object
iceberg_token string

Bearer token value to use for Authorization header.

iceberg_oauth2_credential string

Credential to use for OAuth2 client credential flow when initializing the catalog. Separated by a colon as <client_id>:<client_secret>.

iceberg_oauth2_token_url string

The URL to use for OAuth2 token endpoint.

iceberg_oauth2_scope string

The scope to use for OAuth2 token endpoint (default: catalog).

Default: "catalog"
iceberg_oauth2_server_url string

URL of the OAuth2 server tokens endpoint.

iceberg_sigv4_enabled string

Enable SigV4 authentication for the catalog (for connecting to AWS Glue).

iceberg_signing_region string

The region to use when signing the request for SigV4. Defaults to the region in the catalog URL if available.

iceberg_signing_name string

The name to use when signing the request for SigV4.

Default: "glue"
iceberg_warehouse string

Name of the Iceberg warehouse.

iceberg_s3_endpoint string

Configure an alternative endpoint for the S3 service. This can be any S3-compatible object storage service, e.g., MinIO or Cloudflare R2.

iceberg_s3_access_key_id string

The AWS access key ID to use for S3 storage.

iceberg_s3_secret_access_key string

The AWS secret access key to use for S3 storage.

iceberg_s3_session_token string

Configure the static session token used for S3 storage.

iceberg_s3_iam_role_source string

IAM role credential source. 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"
iceberg_s3_region string

The AWS S3 region to use.

iceberg_s3_role_session_name string

An optional identifier for the assumed role session for auditing purposes.

iceberg_s3_role_arn string

The Amazon Resource Name (ARN) of the role to assume. If provided instead of s3_access_key_id and s3_secret_access_key, temporary credentials will be fetched by assuming this role.

iceberg_s3_connect_timeout string

Configure socket connection timeout, in seconds (default: 60).

iceberg_gcs_project_id string

The Google Cloud project ID for GCS storage.

iceberg_gcs_credentials string

Base64-encoded Google Cloud service account credentials JSON for GCS storage.

iceberg_gcs_token string

OAuth2 token to use for GCS authentication.

iceberg_gcs_service_path string

Custom endpoint URL for GCS (for emulators or custom endpoints).

iceberg_gcs_no_auth string

Set to 'true' to allow anonymous access to GCS (for public buckets).

SpiceAiCatalogParams object
spiceai_api_key string
spiceai_token string
spiceai_endpoint string
spiceai_flight_endpoint string
spiceai_http_endpoint string
UnityCatalogCatalogParams object
unity_catalog_token string

The personal access token used to authenticate against the Unity Catalog API.

unity_catalog_aws_region string

The AWS region to use for S3 storage.

unity_catalog_aws_access_key_id string

The AWS access key ID to use for S3 storage.

unity_catalog_aws_secret_access_key string

The AWS secret access key to use for S3 storage.

unity_catalog_aws_endpoint string

The AWS endpoint to use for S3 storage.

unity_catalog_azure_storage_account_name string

The storage account to use for Azure storage.

unity_catalog_azure_storage_account_key string

The storage account key to use for Azure storage.

unity_catalog_azure_storage_client_id string

The service principal client id for accessing the storage account.

unity_catalog_azure_storage_client_secret string

The service principal client secret for accessing the storage account.

unity_catalog_azure_storage_sas_key string

The shared access signature key for accessing the storage account.

unity_catalog_azure_storage_endpoint string

The endpoint for the Azure Blob storage account.

unity_catalog_google_service_account string

Filesystem path to the Google service account JSON key file.

DatabricksCatalogParams object
databricks_endpoint string required

The endpoint of the Databricks instance.

databricks_token string

The personal access token used to authenticate against the Databricks API.

mode string

The execution mode for querying against Databricks.

Default: "spark_connect"
client_timeout string

The timeout setting for object store client.

databricks_cluster_id string

The ID of the compute cluster in Databricks to use for the query. Only valid when mode is spark_connect.

databricks_use_ssl string

Use a TLS connection to connect to the Databricks Spark Connect endpoint.

Default: "true"
databricks_sql_warehouse_id string

The SQL Warehouse ID to use when 'mode' is set to 'sql_warehouse'.

max_concurrent_requests string

Maximum number of concurrent HTTP requests to the SQL Warehouse API.

Default: "8"
http_max_retries string

Maximum number of HTTP-level retries for transient failures (429, 5xx).

Default: "3"
backoff_method string

Backoff strategy for transient HTTP retries.

Default: "fibonacci"
Values: "fibonacci" "exponential"
statement_max_retries string

Maximum number of poll retries when waiting for async statement completion.

Default: "14"
disable_on_permanent_error string

When true, non-retryable errors (401, 403, 404) permanently disable the connector to prevent a thundering herd of failed requests.

Default: "true"
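
The status codes named under http_max_retries and disable_on_permanent_error fall into two groups, which a small classifier makes explicit. The function classify_http_status and the 'other' fallback label are assumptions for illustration.

```python
def classify_http_status(status):
    """Map an HTTP status code to the handling described above (illustrative only)."""
    if status == 429 or 500 <= status <= 599:
        return "retry"  # transient: counts against http_max_retries
    if status in (401, 403, 404):
        return "permanent"  # non-retryable: may disable the connector
    return "other"  # not covered by the descriptions above


for code in (200, 401, 429, 503):
    print(code, classify_http_status(code))
```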
databricks_client_id string

The client ID of the Databricks service principal.

databricks_client_secret string

The client secret of the Databricks service principal.

databricks_aws_region string

The AWS region to use for S3 storage.

databricks_aws_access_key_id string

The AWS access key ID to use for S3 storage.

databricks_aws_secret_access_key string

The AWS secret access key to use for S3 storage.

databricks_aws_endpoint string

The AWS endpoint to use for S3 storage.

databricks_azure_storage_account_name string

The storage account to use for Azure storage.

databricks_azure_storage_account_key string

The storage account key to use for Azure storage.

databricks_azure_storage_client_id string

The service principal client id for accessing the storage account.

databricks_azure_storage_client_secret string

The service principal client secret for accessing the storage account.

databricks_azure_storage_sas_key string

The shared access signature key for accessing the storage account.

databricks_azure_storage_endpoint string

The endpoint for the Azure Blob storage account.

databricks_google_service_account string

Filesystem path to the Google service account JSON key file.

OpenaiModelParams object
endpoint string

The OpenAI API base endpoint. Can be overridden to use a compatible provider (e.g., NVIDIA NIM).

Default: "https://api.openai.com/v1"
openai_api_key string

The OpenAI API key.

openai_org_id string

The OpenAI organization ID.

openai_project_id string

The OpenAI project ID.

openai_usage_tier string

The current usage tier for the OpenAI account associated with the API key: 'free', 'tier1', 'tier2', 'tier3', 'tier4', or 'tier5'.

Default: "tier1"
Values: "free" "tier1" "tier2" "tier3" "tier4" "tier5"
responses_api string

Whether to enable use of this model via the Responses API. Disabled by default.

Default: "disabled"
openai_responses_tools string

The OpenAI Responses tools to use when calling the model from the Responses API.

Default: ""
tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

frequency_penalty string
logit_bias string
logprobs string
top_logprobs string
max_completion_tokens string
reasoning_effort string
store string
metadata string
n string
presence_penalty string
response_format string
seed string
stop string
stream string
stream_options string
temperature string
top_p string
tool_choice string
parallel_tool_calls string
user string
openai_frequency_penalty string

DEPRECATED: Use 'frequency_penalty' without prefix

openai_logit_bias string

DEPRECATED: Use 'logit_bias' without prefix

openai_logprobs string

DEPRECATED: Use 'logprobs' without prefix

openai_top_logprobs string

DEPRECATED: Use 'top_logprobs' without prefix

openai_max_completion_tokens string

DEPRECATED: Use 'max_completion_tokens' without prefix

openai_reasoning_effort string

DEPRECATED: Use 'reasoning_effort' without prefix

openai_store string

DEPRECATED: Use 'store' without prefix

openai_metadata string

DEPRECATED: Use 'metadata' without prefix

openai_n string

DEPRECATED: Use 'n' without prefix

openai_presence_penalty string

DEPRECATED: Use 'presence_penalty' without prefix

openai_response_format string

DEPRECATED: Use 'response_format' without prefix

openai_seed string

DEPRECATED: Use 'seed' without prefix

openai_stop string

DEPRECATED: Use 'stop' without prefix

openai_stream string

DEPRECATED: Use 'stream' without prefix

openai_stream_options string

DEPRECATED: Use 'stream_options' without prefix

openai_temperature string

DEPRECATED: Use 'temperature' without prefix

openai_top_p string

DEPRECATED: Use 'top_p' without prefix

openai_tools string

DEPRECATED: Use 'tools' without prefix

openai_tool_choice string

DEPRECATED: Use 'tool_choice' without prefix

openai_parallel_tool_calls string

DEPRECATED: Use 'parallel_tool_calls' without prefix

openai_user string

DEPRECATED: Use 'user' without prefix
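
Migrating a params map away from the deprecated openai_-prefixed overrides amounts to a rename. The sketch below covers only the overrides listed as deprecated above and leaves keys such as openai_api_key untouched. The function migrate_openai_params is hypothetical, and letting an existing unprefixed key win over its deprecated twin is an assumption.

```python
DEPRECATED_OVERRIDES = {
    "frequency_penalty", "logit_bias", "logprobs", "top_logprobs",
    "max_completion_tokens", "reasoning_effort", "store", "metadata", "n",
    "presence_penalty", "response_format", "seed", "stop", "stream",
    "stream_options", "temperature", "top_p", "tools", "tool_choice",
    "parallel_tool_calls", "user",
}
PREFIX = "openai_"


def is_deprecated(name):
    return name.startswith(PREFIX) and name[len(PREFIX):] in DEPRECATED_OVERRIDES


def migrate_openai_params(params):
    """Rename deprecated 'openai_<param>' keys to '<param>' (illustrative only)."""
    # Keep every key that is not a deprecated override as-is.
    migrated = {k: v for k, v in params.items() if not is_deprecated(k)}
    # Add renamed overrides without clobbering an unprefixed key already set.
    for name, value in params.items():
        if is_deprecated(name):
            migrated.setdefault(name[len(PREFIX):], value)
    return migrated


print(migrate_openai_params({"openai_api_key": "k", "openai_temperature": "0.2"}))
```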

AzureModelParams object
endpoint string

The Azure OpenAI resource endpoint, e.g., https://resource-name.openai.azure.com.

azure_api_version string

The API version used for the Azure OpenAI service.

azure_deployment_name string

The name of the model deployment.

azure_api_key string

The Azure OpenAI API key from the models deployment page.

azure_entra_token string

The Azure Entra token for authentication.

azure_openai_responses_tools string

Comma-separated list of OpenAI-hosted tools exposed via the Responses API for this model.

Default: ""
responses_api string

Whether to enable use of this model via the Responses API. Disabled by default.

Default: "disabled"
tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

azure_frequency_penalty string
azure_logit_bias string
azure_logprobs string
azure_top_logprobs string
azure_max_completion_tokens string
azure_reasoning_effort string
azure_store string
azure_metadata string
azure_n string
azure_presence_penalty string
azure_response_format string
azure_seed string
azure_stop string
azure_stream string
azure_stream_options string
azure_temperature string
azure_top_p string
azure_tools string
azure_tool_choice string
azure_parallel_tool_calls string
azure_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.
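As a sketch of the migration the deprecation notes above describe, the fragment below swaps `openai_`-prefixed overrides for their `azure_`-prefixed equivalents. It assumes these keys sit in a model's `params` map. The values are illustrative, and are quoted because the schema types every override as a string.

```yaml
# Hypothetical params fragment; values are illustrative only.
params:
  # Deprecated spelling, slated for removal in a future release:
  # openai_temperature: "0.2"
  # openai_max_completion_tokens: "512"

  # Preferred spelling, using this model type's prefix:
  azure_temperature: "0.2"
  azure_max_completion_tokens: "512"
```

The same renaming applies to the other model types listed on this page, each with its own prefix (`file_`, `databricks_`, `huggingface_`, and so on).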

FileModelParams object
chat_template string

Customizes the transformation of OpenAI chat messages into a character stream for the model.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.
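A minimal sketch of the two throttling properties above, assuming they are set in the model's `params` map. The numbers are arbitrary placeholders, not recommended values. They are written as quoted strings because the schema declares both properties as `string`.

```yaml
# Hypothetical params fragment; limits are placeholders.
params:
  max_concurrency: "4"              # at most 4 in-flight requests for this model
  requests_per_minute_limit: "60"   # at most 60 requests per minute for this model
```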

file_frequency_penalty string
file_logit_bias string
file_logprobs string
file_top_logprobs string
file_max_completion_tokens string
file_reasoning_effort string
file_store string
file_metadata string
file_n string
file_presence_penalty string
file_response_format string
file_seed string
file_stop string
file_stream string
file_stream_options string
file_temperature string
file_top_p string
file_tools string
file_tool_choice string
file_parallel_tool_calls string
file_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

DatabricksModelParams object
databricks_endpoint string

The Databricks workspace endpoint, e.g., dbc-a12cd3e4-56f7.cloud.databricks.com.

databricks_token string

The Databricks API token to authenticate with the Databricks Models API.

databricks_client_id string

The Databricks Service Principal Client ID. Can't be used with databricks_token.

databricks_client_secret string

The Databricks Service Principal Client Secret. Can't be used with databricks_token.
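The descriptions above state that the service principal credentials can't be combined with `databricks_token`. The sketch below shows the two alternatives as separate YAML documents. It assumes the keys live in a model's `params` map and that secrets are injected with a `${ secrets:NAME }` reference. The secret names are hypothetical.

```yaml
# Option A: authenticate with an API token.
params:
  databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
  databricks_token: ${ secrets:DATABRICKS_TOKEN }
---
# Option B: authenticate as a service principal.
# databricks_token is omitted because it can't be used with these two keys.
params:
  databricks_endpoint: dbc-a12cd3e4-56f7.cloud.databricks.com
  databricks_client_id: ${ secrets:DATABRICKS_CLIENT_ID }
  databricks_client_secret: ${ secrets:DATABRICKS_CLIENT_SECRET }
```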

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

databricks_frequency_penalty string
databricks_logit_bias string
databricks_logprobs string
databricks_top_logprobs string
databricks_max_completion_tokens string
databricks_reasoning_effort string
databricks_store string
databricks_metadata string
databricks_n string
databricks_presence_penalty string
databricks_response_format string
databricks_seed string
databricks_stop string
databricks_stream string
databricks_stream_options string
databricks_temperature string
databricks_top_p string
databricks_tools string
databricks_tool_choice string
databricks_parallel_tool_calls string
databricks_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

HuggingfaceModelParams object
model_type string

The architecture to load the model as. Supported values: mistral, gemma, mixtral, llama, phi2, phi3, qwen2, gemma2, starcoder2, phi3.5moe, deepseekv2, deepseekv3

chat_template string

Customizes the transformation of OpenAI chat messages into a character stream for the model.

huggingface_token string

The Hugging Face access token.
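A short sketch of these properties in use, assuming they are set in a model's `params` map. The `model_type` value is one of the supported architectures listed above. The secret reference syntax and the secret name are assumptions.

```yaml
# Hypothetical params fragment for a Hugging Face model.
params:
  model_type: llama                        # one of the supported architectures
  huggingface_token: ${ secrets:HF_TOKEN } # only needed for gated or private models (assumption)
```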

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

huggingface_frequency_penalty string
huggingface_logit_bias string
huggingface_logprobs string
huggingface_top_logprobs string
huggingface_max_completion_tokens string
huggingface_reasoning_effort string
huggingface_store string
huggingface_metadata string
huggingface_n string
huggingface_presence_penalty string
huggingface_response_format string
huggingface_seed string
huggingface_stop string
huggingface_stream string
huggingface_stream_options string
huggingface_temperature string
huggingface_top_p string
huggingface_tools string
huggingface_tool_choice string
huggingface_parallel_tool_calls string
huggingface_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

AnthropicModelParams object
endpoint string

The Anthropic API base endpoint.

anthropic_api_key string

The Anthropic API key.

anthropic_auth_token string

The Anthropic Auth Token.

anthropic_usage_tier string

Anthropic usage tier (1-4). Used for rate limit defaults.
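For illustration, a params fragment that sets an API key and a usage tier might look like the sketch below. It assumes the keys belong in a model's `params` map. The secret name is hypothetical, and the tier is quoted because the schema types it as a string.

```yaml
# Hypothetical params fragment for an Anthropic model.
params:
  anthropic_api_key: ${ secrets:ANTHROPIC_API_KEY }
  anthropic_usage_tier: "2"   # 1-4; used for rate limit defaults
```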

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

anthropic_frequency_penalty string
anthropic_logit_bias string
anthropic_logprobs string
anthropic_top_logprobs string
anthropic_max_completion_tokens string
anthropic_reasoning_effort string
anthropic_store string
anthropic_metadata string
anthropic_n string
anthropic_presence_penalty string
anthropic_response_format string
anthropic_seed string
anthropic_stop string
anthropic_stream string
anthropic_stream_options string
anthropic_temperature string
anthropic_top_p string
anthropic_tools string
anthropic_tool_choice string
anthropic_parallel_tool_calls string
anthropic_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

XaiModelParams object
xai_api_key string

The xAI API key.

xai_usage_tier string

xAI usage tier (0-4). Used for rate limit defaults.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

xai_frequency_penalty string
xai_logit_bias string
xai_logprobs string
xai_top_logprobs string
xai_max_completion_tokens string
xai_reasoning_effort string
xai_store string
xai_metadata string
xai_n string
xai_presence_penalty string
xai_response_format string
xai_seed string
xai_stop string
xai_stream string
xai_stream_options string
xai_temperature string
xai_top_p string
xai_tools string
xai_tool_choice string
xai_parallel_tool_calls string
xai_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

BedrockModelParams object
aws_access_key_id string

The AWS access key ID to use for Bedrock models.

aws_secret_access_key string

The AWS secret access key to use for Bedrock models.

aws_session_token string

The AWS session token to use for Bedrock models.

aws_region string

The AWS region to use for Bedrock models.

aws_iam_role_source string

IAM role credential source. 'auto' uses the default AWS credential chain, 'metadata' uses only instance/container metadata (IMDS, ECS, EKS/IRSA), 'env' uses only environment variables.

Values: "auto" "metadata" "env"
bedrock_guardrail_identifier string

Identifier for the guardrail. Pattern: (([a-z0-9]+) | (arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}:guardrail/[a-z0-9]+)). Length: 0-2048.

bedrock_guardrail_version string

Guardrail version. Pattern: (([1-9][0-9]{0,7})|(DRAFT))

bedrock_trace string

Trace behavior for the guardrail. Valid values: enabled, disabled, enabled_full

Values: "enabled" "disabled" "enabled_full"
tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

bedrock_frequency_penalty string
bedrock_logit_bias string
bedrock_logprobs string
bedrock_top_logprobs string
bedrock_max_completion_tokens string
bedrock_reasoning_effort string
bedrock_store string
bedrock_metadata string
bedrock_n string
bedrock_presence_penalty string
bedrock_response_format string
bedrock_seed string
bedrock_stop string
bedrock_stream string
bedrock_stream_options string
bedrock_temperature string
bedrock_top_p string
bedrock_tools string
bedrock_tool_choice string
bedrock_parallel_tool_calls string
bedrock_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.
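The guardrail constraints listed for BedrockModelParams (the identifier and version patterns, and the allowed bedrock_trace values) can be checked before a Spicepod is loaded. The following is a minimal Python sketch, not how the runtime itself validates. The function name check_guardrail_params and the sample values are hypothetical. Matching the whole string with fullmatch, and writing the alternation without surrounding whitespace, are assumptions of this sketch.

```python
import re

# Patterns taken from the bedrock_guardrail_identifier and
# bedrock_guardrail_version descriptions. Anchoring via fullmatch is an
# assumption of this sketch, not something the schema text states.
GUARDRAIL_ID = re.compile(
    r"(([a-z0-9]+)|(arn:aws(-[^:]+)?:bedrock:[a-z0-9-]{1,20}:[0-9]{12}:guardrail/[a-z0-9]+))"
)
GUARDRAIL_VERSION = re.compile(r"(([1-9][0-9]{0,7})|(DRAFT))")
TRACE_VALUES = {"enabled", "disabled", "enabled_full"}


def check_guardrail_params(params: dict) -> list:
    """Return a list of problems found in Bedrock guardrail parameters."""
    problems = []

    ident = params.get("bedrock_guardrail_identifier")
    if ident is not None and (len(ident) > 2048 or not GUARDRAIL_ID.fullmatch(ident)):
        problems.append("bedrock_guardrail_identifier does not match the pattern")

    version = params.get("bedrock_guardrail_version")
    if version is not None and not GUARDRAIL_VERSION.fullmatch(version):
        problems.append("bedrock_guardrail_version does not match the pattern")

    trace = params.get("bedrock_trace")
    if trace is not None and trace not in TRACE_VALUES:
        problems.append("bedrock_trace is not one of the listed values")

    return problems


# Hypothetical parameter values, for illustration only.
print(check_guardrail_params({
    "bedrock_guardrail_identifier": "abc123",
    "bedrock_guardrail_version": "DRAFT",
    "bedrock_trace": "enabled",
}))
```

Parameters that are absent are skipped, since none of them is marked required in the listing.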

GoogleModelParams object
google_api_key string

The Google Generative AI API key.

tools string

Which tools should be made available to the model. Set to 'auto' to use all available tools.

system_prompt string

An additional system prompt used for all chat completions to this model.

parameterized_prompt string
max_concurrency string

Maximum number of concurrent requests for this model. Overrides provider defaults.

requests_per_minute_limit string

Maximum requests per minute for this model. Overrides provider defaults.

google_frequency_penalty string
google_logit_bias string
google_logprobs string
google_top_logprobs string
google_max_completion_tokens string
google_reasoning_effort string
google_store string
google_metadata string
google_n string
google_presence_penalty string
google_response_format string
google_seed string
google_stop string
google_stream string
google_stream_options string
google_temperature string
google_top_p string
google_tools string
google_tool_choice string
google_parallel_tool_calls string
google_user string
openai_frequency_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logit_bias string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_logprobs string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_max_completion_tokens string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_reasoning_effort string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_store string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_metadata string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_n string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_presence_penalty string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_response_format string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_seed string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stop string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_stream_options string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_temperature string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_top_p string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tools string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_tool_choice string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_parallel_tool_calls string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.

openai_user string

DEPRECATED: The openai_<param> language model overrides parameter is deprecated and will be removed in a future release. Please use <model_prefix>_<param> parameter name instead.
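The deprecation notices above ask for openai_<param> overrides to be renamed to <model_prefix>_<param>. A rough Python sketch of that rename for a params mapping follows. The helper name migrate_overrides is hypothetical, and letting an already-present prefixed key win over the deprecated alias is an assumption of this sketch, not documented runtime behavior.

```python
# Override names that appear with both the openai_ prefix and a
# provider prefix (bedrock_, google_) in the listings above.
OVERRIDES = (
    "frequency_penalty", "logit_bias", "logprobs", "top_logprobs",
    "max_completion_tokens", "reasoning_effort", "store", "metadata", "n",
    "presence_penalty", "response_format", "seed", "stop", "stream",
    "stream_options", "temperature", "top_p", "tools", "tool_choice",
    "parallel_tool_calls", "user",
)


def migrate_overrides(params: dict, model_prefix: str) -> dict:
    """Rename deprecated openai_<param> keys to <model_prefix>_<param>."""
    migrated = dict(params)
    for name in OVERRIDES:
        old_key = f"openai_{name}"
        new_key = f"{model_prefix}_{name}"
        if old_key == new_key or old_key not in migrated:
            continue
        value = migrated.pop(old_key)
        # Assumption: an explicitly set prefixed key takes precedence
        # over the deprecated alias.
        migrated.setdefault(new_key, value)
    return migrated


# Hypothetical params for a Google model; values are strings, as in the schema.
print(migrate_overrides(
    {"google_api_key": "example-key", "openai_temperature": "0.2", "tools": "auto"},
    "google",
))
```

Keys outside the override list, such as tools or system_prompt, are left untouched.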

AbfsDataset object
from string required

Data source path for abfs connector. Format: abfs:

pattern=^abfs:
name string required
params AbfsDataConnectorParams | null

Connection parameters for the abfs data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
AbfssDataset object
from string required

Data source path for abfss connector. Format: abfss:

pattern=^abfss:
name string required
params AbfssDataConnectorParams | null

Connection parameters for the abfss data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
DebeziumDataset object
from string required

Data source path for debezium connector. Format: debezium:

pattern=^debezium:
name string required
params

Connection parameters for the debezium data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
DynamodbDataset object
from string required

Data source path for dynamodb connector. Format: dynamodb:

pattern=^dynamodb:
name string required
params

Connection parameters for the dynamodb data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
FileDataset object
from string required

Data source path for file connector. Format: file:

pattern=^file:
name string required
params FileDataConnectorParams | null

Connection parameters for the file data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
GcsDataset object
from string required

Data source path for gcs connector. Format: gcs:

pattern=^gcs:
name string required
params GcsDataConnectorParams | null

Connection parameters for the gcs data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
GitDataset object
from string required

Data source path for git connector. Format: git:

pattern=^git:
name string required
params GitDataConnectorParams | null

Connection parameters for the git data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
GithubDataset object
from string required

Data source path for github connector. Format: github:

pattern=^github:
name string required
params

Connection parameters for the github data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
GlueDataset object
from string required

Data source path for glue connector. Format: glue:

pattern=^glue:
name string required
params GlueDataConnectorParams | null

Connection parameters for the glue data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
GsDataset object
from string required

Data source path for gs connector. Format: gs:

pattern=^gs:
name string required
params GsDataConnectorParams | null

Connection parameters for the gs data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
HttpDataset object
from string required

Data source path for http connector. Format: http:

pattern=^http:
name string required
params HttpDataConnectorParams | null

Connection parameters for the http data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
HttpsDataset object
from string required

Data source path for https connector. Format: https:

pattern=^https:
name string required
params HttpsDataConnectorParams | null

Connection parameters for the https data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
IcebergDataset object
from string required

Data source path for iceberg connector. Format: iceberg:

pattern=^iceberg:
name string required
params

Connection parameters for the iceberg data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
KafkaDataset object
from string required

Data source path for kafka connector. Format: kafka:

pattern=^kafka:
name string required
params KafkaDataConnectorParams | null

Connection parameters for the kafka data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
LocalpodDataset object
from string required

Data source path for localpod connector. Format: localpod:

pattern=^localpod:
name string required
params

Connection parameters for the localpod data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
MemoryDataset object
from string required

Data source path for memory connector. Format: memory:

pattern=^memory:
name string required
params

Connection parameters for the memory data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
S3Dataset object
from string required

Data source path for s3 connector. Format: s3:

pattern=^s3:
name string required
params S3DataConnectorParams | null

Connection parameters for the s3 data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
SinkDataset object
from string required

Data source path for sink connector. Format: sink:

pattern=^sink:
name string required
params SinkDataConnectorParams | null

Connection parameters for the sink data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
SpiceAiDataset object
from string required

Data source path for spice.ai connector. Format: spice.ai:

pattern=^spice\.ai:
name string required
params

Connection parameters for the spice.ai data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
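Each connector dataset variant above is distinguished by the pattern on its from value. The Python sketch below shows how those prefixes select a variant. It covers only the variants listed in this part of the schema, the function name matching_variants is hypothetical, and the sample paths are made up for illustration.

```python
import re

# (variant name, `from` pattern) pairs as listed for the connector datasets.
DATASET_PATTERNS = [
    ("AbfsDataset", r"^abfs:"),
    ("AbfssDataset", r"^abfss:"),
    ("DebeziumDataset", r"^debezium:"),
    ("DynamodbDataset", r"^dynamodb:"),
    ("FileDataset", r"^file:"),
    ("GcsDataset", r"^gcs:"),
    ("GitDataset", r"^git:"),
    ("GithubDataset", r"^github:"),
    ("GlueDataset", r"^glue:"),
    ("GsDataset", r"^gs:"),
    ("HttpDataset", r"^http:"),
    ("HttpsDataset", r"^https:"),
    ("IcebergDataset", r"^iceberg:"),
    ("KafkaDataset", r"^kafka:"),
    ("LocalpodDataset", r"^localpod:"),
    ("MemoryDataset", r"^memory:"),
    ("S3Dataset", r"^s3:"),
    ("SinkDataset", r"^sink:"),
    ("SpiceAiDataset", r"^spice\.ai:"),
]


def matching_variants(source: str) -> list:
    """Return the variant names whose `from` pattern matches the source path."""
    return [name for name, pattern in DATASET_PATTERNS if re.search(pattern, source)]


# Hypothetical source paths.
for source in ("s3://example-bucket/data/", "abfss://container/path", "https://example.com/data.csv"):
    print(source, matching_variants(source))
```

Because every pattern ends with the colon, similar prefixes do not collide: a path starting with abfss: does not match ^abfs:, and https: does not match ^http:.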
ArrowAcceleratedDataset object

Dataset with arrow acceleration engine.

from string required
name string required
acceleration object
2 nested properties
engine string
Constant: "arrow"
params ArrowAcceleratorParams | null

Configuration parameters for the arrow acceleration engine.

description string | null
metadata object
columns Column[]
access string | string
params Params | null
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
CayenneAcceleratedDataset object

Dataset with cayenne acceleration engine.

from string required
name string required
acceleration object
2 nested properties
engine string
Constant: "cayenne"
params CayenneAcceleratorParams | null

Configuration parameters for the cayenne acceleration engine.

description string | null
metadata object
columns Column[]
access string | string
params Params | null
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
DuckdbAcceleratedDataset object

Dataset with duckdb acceleration engine.

from string required
name string required
acceleration object
2 nested properties
engine string
Constant: "duckdb"
params DuckdbAcceleratorParams | null

Configuration parameters for the duckdb acceleration engine.

description string | null
metadata object
columns Column[]
access string | string
params Params | null
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
PostgresAcceleratedDataset object

Dataset with postgres acceleration engine.

from string required
name string required
acceleration object
2 nested properties
engine string
Constant: "postgres"
params

Configuration parameters for the postgres acceleration engine.

description string | null
metadata object
columns Column[]
access string | string
params Params | null
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
SqliteAcceleratedDataset object

Dataset with sqlite acceleration engine.

from string required
name string required
acceleration object
2 nested properties
engine string
Constant: "sqlite"
params SqliteAcceleratorParams | null

Configuration parameters for the sqlite acceleration engine.

description string | null
metadata object
columns Column[]
access string | string
params Params | null
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
TursoAcceleratedDataset object

Dataset with turso acceleration engine.

from string required
name string required
acceleration object
2 nested properties
engine string
Constant: "turso"
params TursoAcceleratorParams | null

Configuration parameters for the turso acceleration engine.

description string | null
metadata object
columns Column[]
access string | string
params Params | null
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
IcebergCatalog object
from string required

Catalog source for iceberg connector. Format: iceberg:<catalog_path>

pattern=^iceberg:
name string required
params IcebergCatalogParams | null

Connection parameters for the iceberg catalog connector.

description string | null
metadata object
access string | string
include string[]
dataset_params Params | null
dependsOn string[]
metrics Metrics | null
mode string | string
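
A hedged sketch of a catalog entry matching this shape. It assumes catalogs are listed under a top-level `catalogs` key, which this part of the reference does not show. The catalog path is left as the placeholder from the format string above, and the name and `include` entry are hypothetical.

```yaml
# Sketch only: an iceberg catalog entry.
# The `catalogs` key, the name, and the include entry are assumptions for illustration.
catalogs:
  - from: "iceberg:<catalog_path>"   # placeholder; the value must match ^iceberg:
    name: lakehouse                  # hypothetical catalog name
    description: Tables exposed through an iceberg catalog
    include:
      - "sales.*"                    # hypothetical entry; the schema only says string[]
```

The spice.ai, unity_catalog, and databricks catalogs follow the same layout with their own `from` prefix.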
SpiceAiCatalog object
from string required

Catalog source for spice.ai connector. Format: spice.ai:<catalog_path>

pattern=^spice\.ai:
name string required
params SpiceAiCatalogParams | null

Connection parameters for the spice.ai catalog connector.

description string | null
metadata object
access string | string
include string[]
dataset_params Params | null
dependsOn string[]
metrics Metrics | null
mode string | string
UnityCatalogCatalog object
from string required

Catalog source for unity_catalog connector. Format: unity_catalog:<catalog_path>

pattern=^unity_catalog:
name string required
params UnityCatalogCatalogParams | null

Connection parameters for the unity_catalog catalog connector.

description string | null
metadata object
access string | string
include string[]
dataset_params Params | null
dependsOn string[]
metrics Metrics | null
mode string | string
DatabricksCatalog object
from string required

Catalog source for databricks connector. Format: databricks:<catalog_path>

pattern=^databricks:
name string required
params DatabricksCatalogParams | null

Connection parameters for the databricks catalog connector.

description string | null
metadata object
access string | string
include string[]
dataset_params Params | null
dependsOn string[]
metrics Metrics | null
mode string | string
OpenaiModel object
from string required

Model source for openai provider. Format: openai:<model_id>

pattern=^openai:
name string required
params OpenaiModelParams | null

Configuration parameters for the openai model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
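
A minimal sketch of a model entry matching this shape. It assumes models are listed under a top-level `models` key, which this part of the reference does not show. The model id is left as the placeholder from the format string above, and the model and dataset names are hypothetical.

```yaml
# Sketch only: an openai model entry.
# The `models` key and all names are assumptions for illustration.
models:
  - from: "openai:<model_id>"   # placeholder; the value must match ^openai:
    name: assistant             # hypothetical model name
    description: Chat model served through the openai provider
    datasets:
      - orders                  # hypothetical dataset name; the schema only says string[]
```

The other providers (azure, file, databricks, huggingface, anthropic, xai, bedrock, google) list the same fields and differ in the `from` prefix and the `params` type.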
AzureModel object
from string required

Model source for azure provider. Format: azure:<model_id>

pattern=^azure:
name string required
params AzureModelParams | null

Configuration parameters for the azure model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
FileModel object
from string required

Model source for file provider. Format: file:<model_id>

pattern=^file:
name string required
params FileModelParams | null

Configuration parameters for the file model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
DatabricksModel object
from string required

Model source for databricks provider. Format: databricks:<model_id>

pattern=^databricks:
name string required
params DatabricksModelParams | null

Configuration parameters for the databricks model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
HuggingfaceModel object
from string required

Model source for huggingface provider. Format: huggingface:<model_id>

pattern=^huggingface:
name string required
params HuggingfaceModelParams | null

Configuration parameters for the huggingface model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
AnthropicModel object
from string required

Model source for anthropic provider. Format: anthropic:<model_id>

pattern=^anthropic:
name string required
params AnthropicModelParams | null

Configuration parameters for the anthropic model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
XaiModel object
from string required

Model source for xai provider. Format: xai:<model_id>

pattern=^xai:
name string required
params XaiModelParams | null

Configuration parameters for the xai model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
BedrockModel object
from string required

Model source for bedrock provider. Format: bedrock:<model_id>

pattern=^bedrock:
name string required
params BedrockModelParams | null

Configuration parameters for the bedrock model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
GoogleModel object
from string required

Model source for google provider. Format: google:<model_id>

pattern=^google:
name string required
params GoogleModelParams | null

Configuration parameters for the google model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null
GenericDataset object

Generic dataset for custom or unknown connectors.

from string required
name string required
params Params | null

Connection parameters for the data connector.

description string | null
metadata object
columns Column[]
access string | string
has_metadata_table boolean | null
replication Replication | null
time_column string | null
time_format TimeFormat | null
time_partition_column string | null
time_partition_format TimeFormat | null
acceleration Acceleration | null
dependsOn string[]
invalid_type_action InvalidTypeAction | null
unsupported_type_action UnsupportedTypeAction | null
ready_state string | string | string

Controls when the dataset is marked ready for queries.

metrics Metrics | null
vectors VectorStore | null
check_availability string | string

Controls whether the federated table periodically has its availability checked.

mode string | string
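
As a rough illustration of the generic variant, the sketch below uses a connector prefix that none of the typed variants cover. The top-level `datasets` key, the connector prefix, the dataset name, and the `params` key are all hypothetical. Here `acceleration` is nullable, so the sketch omits it.

```yaml
# Sketch only: a dataset for a custom or unknown connector.
# Every name and value below is an assumption for illustration.
datasets:
  - from: "my_connector:events"   # hypothetical prefix; no pattern is listed for this variant
    name: raw_events              # hypothetical dataset name
    description: Events read through a custom connector
    params:
      example_key: example_value  # hypothetical parameter
```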
GenericCatalog object

Generic catalog for custom or unknown connectors.

from string required
name string required
params Params | null

Connection parameters for the catalog connector.

description string | null
metadata object
access string | string
include string[]
dataset_params Params | null
dependsOn string[]
metrics Metrics | null
mode string | string
GenericModel object

Generic model for custom or unknown model sources.

from string required
name string required
params Params | null

Configuration parameters for the model provider.

description string | null
metadata object
files ModelFile[]
datasets string[]
dependsOn string[]
metrics Metrics | null