Semantic Data Fabric (SDF) file
1.3 Schema URL
Properties
A workspace definition
An environment definition
A table definition
A classifier definition
A function definition
A config definition
A credential definition
A config definition
Definitions
A workspace is a collection of one or more catalogs, schemas, tables, and resources, called workspace members, that are managed together.
The SDF edition; should always be 1.3 (1.2 is deprecated)
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
Whether table names are treated as case-sensitive in this workspace. Applies globally to the entire project and is non-overridable. Defaults to false.
Note: this setting is only effective when set on the root workspace file; if set on included workspaces it is ignored
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
Dependencies of the workspace to other workspaces or to cloud database providers
The integrations for this environment
Defaults for this workspace
Workspace defined by this set of files
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as
Configuration for dbt integration
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://. Note that an [IncludePath] specifies a catalog and schema scope for unqualified names (in effect for both creating and querying tables). See [IncludePath::default_catalog] and [IncludePath::default_schema].
A filepath
Type of included artifacts: model | test | stats | metadata | resource
Index method for this include path: scan | table | schema-table | catalog-schema-table
Defaults for files on this path
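Putting the workspace and include-path fields above together, a minimal workspace file might look like the following sketch. The names, paths, and exact key spellings are illustrative (inferred from the field descriptions above), not taken from any real project:

```yml
workspace:
  edition: "1.3"
  name: my_workspace            # must be set for deployment
  description: Example workspace
  includes:
    - path: models/             # directory of .sql files
      type: model
      index: schema-table
    - path: metadata/           # directory of .sdf.yml files
      type: metadata
  excludes:
    - path: models/scratch/     # skipped when resolving includes
```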
The default environment (can only be set on the level of the workspace)
The dialect of this environment. If not set, defaults to trino
Case normalization policy for names
The preprocessor for this environment. If not set, defaults to local
Defines a default catalog. If not set, defaults to the (catalog/workspace) name in an outer scope
Defines a default schema. If not set, defaults to the schema name in an outer scope; if that is not set, defaults to 'pub'
Defines the default materialization. If not set, defaults to the materialization in an outer scope; if that is not set, defaults to base-table
Defines table creation flags; defaults to none if not set
The default utils library, if set overrides sdf_utils
The default test library, if set overrides sdf_test
The default materialization library, if set overrides sdf_materialization
The named lint rule set; uses defaults (from sdftarget/
The default index for these tables
The default index for these tables
The default index for these tables
The default severity for this table's tests and checks
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
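The default-related fields above can be combined into a defaults block. A minimal sketch follows; the dialect, catalog, and key spellings are assumptions based on the descriptions above, not a definitive reference:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  defaults:
    environment: dbg          # default environment (root workspace only)
    dialect: snowflake        # falls back to trino if unset
    catalog: analytics        # default catalog for unqualified names
    schema: pub               # default schema ('pub' if unset anywhere)
    materialization: view     # falls back to base-table if unset
    severity: warning         # default severity for tests and checks
```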
Supported dialects in YAML files
Note: this [Dialect] type is only meant for serializing to and from .sdf.yml files. For internal use, you should almost always use the semantic dialect type ([types::Dialect]) instead.
Note specifically that the lack of a .to_string() method (i.e. [!Display]) on this type is intentional -- you must first convert this type to a [types::Dialect], by using either [Dialect::to_semantic_dialect] or [types::Dialect::from], before you can convert it to a string.
Compress table data using these methods
A filepath
Type of excluded artifacts
The relative path from this workspace to the referenced workspace; for a Git repo, the path from the root of the depot to the workspace
The chosen workspace environment (none means default)
The chosen workspace target (none means default)
The Git repo
The Git revision (choose only one of the fields: rev, branch, tag)
The Git branch (choose only one of the fields: rev, branch, tag)
The Git tag (choose only one of the fields: rev, branch, tag)
Which models, reports, tests, checks etc. to include from the dependency
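The dependency fields above (a local path, or Git coordinates with exactly one of rev/branch/tag) might be combined as in this sketch. The repository URL, names, and paths are placeholders:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  dependencies:
    - name: shared_defs
      path: ../shared_defs    # sibling workspace in the same repo
      environment: prod       # chosen workspace environment (omit for default)
    - name: remote_lib
      git: https://github.com/example/remote_lib.git
      branch: main            # choose only one of rev / branch / tag
```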
Table providers manage tables in catalogs (OLD_PROVIDERS)
Credential identifier for this provider
The cluster identifier for the Redshift server
The size of the batch when querying the provider
A list of (possibly remote) sources to read, matched in order, so write specific patterns before more general ones
A list of (possibly remote) targets to build, matched in order, so write specific patterns before more general ones; source patterns are excluded
A list of remote buckets to target
The remote output location of the integration
Sources define the tables that are possibly remote
A source that can be read. Sources must be three-part names with globs, e.g. *.*.* matches all catalogs, schemas, and tables in scope
Time travel qualifier expression (e.g. `AT (TIMESTAMP => {{ SOME_TIMESTAMP }})``)
Whether to preload the source
Renames sources when searching in the remote; the i-th ${i} matches the i-th * of the name (e.g. "${1}.${2}.${3}" reproduces the matched catalog.schema.table)
Targets define the tables that are possibly remote and that will be built
A pattern must be a three-part name with globs, e.g. *.*.* matches all catalogs, schemas, and tables in scope
A list of patterns. A pattern must be a three-part name with globs, e.g. *.*.* matches all catalogs, schemas, and tables in scope
Whether to preload the target
Renames targets; the i-th ${i} matches the i-th * of the name (e.g. "${1}.${2}.${3}" reproduces the matched catalog.schema.table)
The uri of the bucket
The region of the bucket
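As a sketch of the ordered source/target pattern matching described above, an integration entry could look like the following. The provider name, patterns, and the rename key spelling are invented for illustration:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  integrations:
    - provider: snowflake
      sources:
        - pattern: raw.finance.*        # specific pattern first
        - pattern: raw.*.*              # more general pattern after
      targets:
        - pattern: analytics.*.*
          rename-as: "dev_${1}.${2}.${3}"  # ${i} matches the i-th * of the name
      buckets:
        - uri: s3://my-bucket/sdf-output/
          region: us-east-1
```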
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://
A filepath
The last-modified time of the file
Whether the dbt integration is enabled
The dbt project directory
The dbt target directory for the project
The directory where dbt profiles are stored
The dbt profile to use
The dbt target for the profile
Automatically run parse in between commands (default: true)
Disable introspection (default: false)
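The dbt integration fields above can be sketched as a single block. The key spellings here are assumptions derived from the field descriptions, and the paths are placeholders:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  dbt:
    enabled: true
    project-dir: ./dbt_project          # dbt project directory
    target-dir: ./dbt_project/target    # dbt target directory
    profile-dir: ~/.dbt                 # where dbt profiles are stored
    profile: default
    target: dev
    auto-parse: true                    # run parse in between commands
    disable-introspection: false
```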
Environments provide a way to override the fields of a workspace, i.e. if an environment has set field X, then the workspace field X will be overridden by the environment field X.
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
Defaults for this workspace
Dependencies of the workspace to other workspaces or to cloud database providers
The integrations for this environment
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as
Workspace defined by this set of files
Experimental: this project uses Jinja
Configuration for dbt integration
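Because an environment overrides workspace fields one-for-one, an environment can be expressed as a second YAML document that redefines only what differs. This sketch assumes environments are declared in a separate document in the same file; all names are illustrative:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  defaults:
    dialect: snowflake
    catalog: dev_catalog
---
environment:
  name: prod
  defaults:
    catalog: prod_catalog   # overrides the workspace default; other fields inherit
```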
A table is either defined by given data (also called an external table) or defined via a query.
The name of the table (syntax: [[catalog.]schema].table)
Note: this field is typed as a [QualifiedName] for serialization. In almost all cases you should use the [Self::fqn()] method to get the fully qualified [TableName] instead of accessing this field directly.
A description of this table
The dialect of this table, defaults to trino
Case normalization policy for names specified in this table
The table-type of this table (new version)
The warehouse where this table is computed
Specify what kind of table or view this is
Whether the table exists in the remote DB (used for is_incremental macro)
Specify the table location; defaults to none if not set
Defines the table creation options, defaults to none if not set
Options governing incremental table evaluation (only for incremental tables)
Options governing snapshot table evaluation (only for snapshot tables)
All tables that this table depends on (syntax: catalog.schema.table)
All tables that depend on this table (syntax: catalog.schema.table)
The columns of the schema: name, type, metadata
The partitioning format of the table
The default severity for this table's tests and checks
The schedule of the table [expressed as cron]
The first date of the table [expressed by prefixes of RFC 3339]
An array of classifier references
Array of reclassify instructions for changing the attached classifier labels
Lineage, a tagged array of column references
Data is at this location
Store table in this format [only for external tables]
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
If this table is part of a cyclic dependency then cut the cycle here
Table is defined by these .sql and/or .sdf files
This table is either backed by a CREATE TABLE DDL statement or by a table definition in YAML that is the table's complete schema
Metadata for this table
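A table definition combining several of the fields above might look like this sketch. Table, column, and classifier names are invented for illustration, and the key spellings are inferred from the descriptions above:

```yml
table:
  name: analytics.pub.orders        # syntax: [[catalog.]schema].table
  description: One row per customer order
  dialect: snowflake
  materialization: table
  schedule: "0 6 * * *"             # expressed as cron
  columns:
    - name: order_id
      datatype: varchar
      description: Primary key of the orders table
    - name: email
      datatype: varchar
      classifiers:
        - PII.email                 # classifier reference
```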
Incremental strategy; may be one of append, merge, or delete+insert
Expression used for identifying records in Merge and Delete+Insert strategies; may be a column name or an expression combining multiple columns. If left unspecified, the Merge and Delete+Insert strategies behave the same as Append
List of column names to be updated as part of Merge strategy; Only one of merge_update_columns or merge_exclude_columns may be specified
List of column names to exclude from updating as part of Merge strategy; Only one of merge_update_columns or merge_exclude_columns may be specified
Method for reacting to schema changes in the source of the incremental table. Possible values are fail, append, and sync. If left unspecified, the default behavior is to ignore the change and possibly error out if the schema change is incompatible. fail causes a failure whenever any deviation in the source schema is detected; append adds new columns but does not delete columns removed from the source; sync adds new columns and deletes columns removed from the source.
Warehouse to use in the incremental mode
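The incremental options above can be sketched as follows. The key spellings (strategy, unique-key, and so on) are assumptions based on the field descriptions, not a definitive reference:

```yml
table:
  name: analytics.pub.events
  materialization: incremental-table
  incremental-options:
    strategy: merge                  # append | merge | delete+insert
    unique-key: event_id             # identifies records for merge/delete+insert
    merge-update-columns:            # mutually exclusive with merge-exclude-columns
      - status
      - updated_at
    on-schema-change: append         # fail | append | sync
```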
Snapshot strategy; may be one of timestamp (default) or check
Expression used for identifying records that will be updated according to the snapshot strategy; may be a column name or an expression combining multiple columns
Name of the timestamp column used to identify the last update time. This option is only required for the timestamp snapshot strategy
Specification of which columns to check for change (may be a list of column names or all). This option is only required for the check snapshot strategy
Warehouse to use in the snapshot mode
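Analogously, a snapshot table using the timestamp strategy described above might be sketched like this; the key spellings and names are illustrative assumptions:

```yml
table:
  name: analytics.pub.dim_customer
  materialization: snapshot-table
  snapshot-options:
    strategy: timestamp         # timestamp (default) | check
    unique-key: customer_id
    updated-at: last_modified   # required for the timestamp strategy
```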
A description of this column
The type of this column
An array of classifier references
Lineage, a tagged array of column references
Forward Lineage, the columns that this column is used to compute
Array of reclassify instructions for changing the attached classifier labels
An array of representative literals of this column [experimental!]
The output column is computed by copying these upstream columns
The output column is computed by transforming these upstream columns
These upstream columns are indirectly used to produce the output (e.g. in WHERE or GROUP BY)
These functions were used to produce the output column
Target classifier
Expected source classifier
The constraint macro: must have the form lib.macro(args,..), where lib is any of the libs in scope, std is available by default
The severity of this constraint
A partition is a table column used to describe which partition a row belongs to
A description of the partition column
The format of the partition column [use strftime format for date/time]. See the [guide](https://docs.sdf.com/guide/schedules)
Store table data in these formats
A classifier defines the labels that can be attached to columns or a table.
The name of the classifier type
A description of this classifier type
Named classifier labels
Does the classifier propagate from scope to scope, or is it a one-scope marker?
Classifier defined by this set of .sdf files
A classifier element is a scoped classifier label (e.g. the element PII belongs to the classifier scope data)
The name of the label, use "*" to allow arbitrary strings as labels
A description of this classifier element
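A classifier with its scoped labels, as described above, might be written like this sketch (the PII scope and its labels are invented for illustration):

```yml
classifier:
  name: PII
  description: Personally identifiable information
  labels:
    - name: email
      description: Email addresses
    - name: phone
      description: Phone numbers
    - name: "*"    # allow arbitrary strings as labels
```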
A function block defines the signature for user-defined functions
The function category
The dialect that provides this function
A description of this function
Arbitrary number of arguments of a common type out of a list of valid types
The arguments of this function
The arguments of this function
The results of this function (can be a tuple)
The constraints on generic type bounds
The generic type bounds
example - Example use of the function (tuple with input/output)
cross-link - link to existing documentation, for example: https://trino.io/docs/current/functions/datetime.html#date_trunc
Array of reclassify instructions for changing the attached classifier labels
Function defined by this set of .sdf files
Function can be called without parentheses, as if it were a constant (e.g. current_date)
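A function signature combining the fields above might be sketched as follows. The function name and the key spellings (section, parameters, returns) are assumptions made for illustration:

```yml
function:
  name: my_udf
  section: scalar
  dialect: snowflake
  description: Example scalar function
  parameters:
    - name: x
      datatype: double
      description: The input value
  returns:
    datatype: double
```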
Arbitrary number of arguments of a common type out of a list of valid types
A function parameter
The name of the parameter
A description of this parameter
The datatype of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function parameter
The datatype of this parameter
A description of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function's volatility, which defines the function's eligibility for certain optimizations
The sql string corresponding to the input of this example
The output corresponding to running the input string
Indicates how a function's evaluation is implemented.
The name attribute of the implementing UDF. None indicates the UDF is named the same as the function.
The name attribute of the implementing UDF. None indicates the UDF is named the same as the function.
A configuration with section name and properties
The name of the configuration section
A description of this configuration section
The name of the credential (default = 'default')
A description of this credential
Credential defined by this set of .sdf files
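Given the fields above, a minimal credential block might look like this sketch (the description is illustrative):

```yml
credential:
  name: default            # default = 'default'
  description: Default credential used by providers in this workspace
```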