Semantic Data Fabric (SDF) file
1.1 Schema URL
Properties
A workspace definition
A profile definition
A table definition
A classifier definition
A function definition
A plugin definition
Definitions
A workspace is a collection of one or more catalogs, schemas, tables, and resources, called workspace members, that are managed together.
The SDF edition; should always be 1 (for now)
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
The default output object store location, e.g. 's3://bucket/key/' where key is optional
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
An array of paths to other workspaces, i.e. .sql and .yml files
Defines a default catalog (If not set, defaults to the workspace name)
Defines a default schema (If not set, defaults to 'pub')
Defines the default profile (if not set, defaults to 'dbg')
The workspace is defined by this set of files
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as {{ dt }}
Experimental: This project has jinja, sql_vars, and sql_macros
Experimental: This is a dbt project.
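Taken together, the workspace fields above can be sketched as a minimal workspace block. This is an illustrative sketch, not a verbatim schema: the exact field spellings (e.g. `edition`, `default-catalog`, `vars`) and all values are assumptions drawn from the descriptions above.

```yaml
workspace:
  edition: 1                       # the SDF edition; currently always 1
  name: my_workspace               # must be set for deployment
  description: Example analytics workspace
  repository: https://github.com/example/my-workspace
  default-catalog: my_workspace    # defaults to the workspace name
  default-schema: pub              # defaults to 'pub'
  default-profile: dbg             # defaults to 'dbg'
  includes:
    - path: models/                # directories of .sql and .sdf.yml files
  excludes:
    - path: scratch/               # skipped when resolving includes
  vars:
    dt: dt                         # used in SQL as @dt and in Jinja as {{ dt }}
```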
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://. Note that an [IncludePath] specifies a catalog and schema scope for unqualified names (in effect for both creating and querying tables). See [IncludePath::default_catalog] and [IncludePath::default_schema].
A filepath
Type of included artifacts: model | test | stats | metadata | resource
Defines a default catalog for unqualified names. If not set, defaults to the [Workspace] catalog.
Defines a default schema for unqualified names. If not set, defaults to the [Workspace] schema.
The dialect of the included files. If not set, defaults to the [Workspace] dialect.
The compute platform for building the included files. If not set, defaults to the [Workspace] compute platform.
Index method for this include path: scan | table | schema-table | catalog-schema-table
Synchronization scheme for this include path: always | on-pull | on-push | never
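An include path with its scoping fields might look like the sketch below; the field names are assumptions based on the descriptions above:

```yaml
includes:
  - path: models/staging/
    type: model                  # model | test | stats | metadata | resource
    default-catalog: analytics   # scope for unqualified names
    default-schema: staging
    dialect: presto
    index: schema-table          # scan | table | schema-table | catalog-schema-table
```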
Supported dialects
Supported compute platforms
A filepath
Type of excluded artifacts
The relative path from this workspace to the referenced workspace; for a Git repo, the path from the root of the depot to the workspace
The chosen workspace profile (none means default)
The Git repo
The Git revision (choose only one of the fields: rev, branch, tag)
The Git branch (choose only one of the fields: rev, branch, tag)
The Git tag (choose only one of the fields: rev, branch, tag)
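A Git-backed workspace dependency could be written roughly as follows; the key names (`dependencies`, `git`, `branch`, `profile`) are assumptions, and only one of rev, branch, or tag should be given:

```yaml
dependencies:
  - path: shared/workspace       # from the root of the depot to the workspace
    git: https://github.com/example/shared-models
    branch: main                 # choose only one of: rev, branch, tag
    profile: dbg                 # omit to use the default profile
```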
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://
A filepath
Last modified time of the file
Supported dialects
Profiles provide a way to override the fields of a workspace: if a profile sets field X, the profile's value of X overrides the workspace's.
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
The default output object store location, e.g. 's3://bucket/key/' where key is optional
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
An array of paths to other workspaces, i.e. .sql and .yml files
Defines a default catalog (If not set, defaults to the workspace name)
Defines a default schema (If not set, defaults to 'pub')
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as {{ dt }}
Defines the default profile (if not set, defaults to 'dbg')
The workspace is defined by this set of files
Experimental: This project has jinja
Experimental: This is a dbt project.
The default severity for this table's tests and checks
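Since a profile simply overrides workspace fields, a production profile that swaps the default schema and a SQL variable binding might look like this sketch (field names assumed):

```yaml
profile:
  name: prd
  default-schema: prod           # overrides the workspace's 'pub'
  vars:
    dt: run_date                 # overrides the workspace's dt binding
```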
A table is either defined by given data (also called an external table) or defined via a query.
The dialect of this table, defaults to presto
The compute platform for evaluating the query populating this table, defaults to local
The table-type of this table
All table dependencies (syntax: catalog.schema.table)
The columns of the schema: name, type, metadata
The partitioning format of the table
The schedule of the table [expressed as cron]
The first date of the table [expressed by prefixes of RFC 3339]
An array of classifier references
Array of reclassify instructions for changing the attached classifier labels
Lineage, a tagged array of column references
Data is at this location
Store table in this format [only for external tables]
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
Table is defined by these .sql and/or .sdf files
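An external table combining the fields above could be sketched like this; the field spellings and values are illustrative assumptions, not the literal schema:

```yaml
table:
  name: analytics.raw.events
  dialect: presto                # defaults to presto
  location: s3://bucket/events/  # data is at this location
  file-format: csv               # only for external tables
  with-header: true              # CSV data has a header
  delimiter: ","
  columns:
    - name: event_id
      type: varchar
  partitioned-by:
    - name: dt
      format: "%Y-%m-%d"         # strftime format for date/time
  schedule: "0 3 * * *"          # expressed as cron
```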
The name of the column
A description of this column
The type of this column
An array of classifier references
Lineage, a tagged array of column references
Array of reclassify instructions for changing the attached classifier labels
An array of representative literals of this column [experimental!]
The output column is computed by copying these upstream columns
The output column is computed by transforming these upstream columns
These upstream columns are indirectly used to produce the output (e.g. in WHERE or GROUP BY)
These functions were used to produce the output column
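The three lineage kinds above (copied columns, transformed columns, indirectly used columns) might be recorded per output column roughly as follows; the key names here are hypothetical:

```yaml
columns:
  - name: total_amount
    type: "decimal(18, 2)"
    lineage:
      copy: []                                # no column is copied verbatim
      transform: [orders.amount, orders.tax]  # transformed into the output
      scan: [orders.status]                   # used indirectly, e.g. in WHERE
```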
Target classifier
Expected source classifier
The type of the constraint
The name of the constraint
A partition is a table column used to describe which partition a row belongs to
The name of the partition column
A description of the partition column
The format of the partition column [use strftime format for date/time]. See the [guide](https://docs.sdf.com/guide/schedules)
Store table data in these formats
Compress table data using these methods
A classifier defines the labels that can be attached to columns or a table.
The name of the classifier type
A description of this classifier type
Named classifier labels
Does the classifier propagate from scope to scope, or is it a one-scope marker?
The classifier is defined by this set of .sdf files
A classifier element is a scoped classifier label (e.g. the element PII belongs to the classifier scope data)
The name of the label, use "*" to allow arbitrary strings as labels
A description of this classifier element
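A classifier with named labels can then be sketched as below (field names assumed from the descriptions above):

```yaml
classifier:
  name: data
  description: Data sensitivity labels
  labels:
    - name: PII                  # the element PII in the scope 'data'
    - name: "*"                  # allow arbitrary strings as labels
  propagate: true                # propagate from scope to scope
```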
A function block defines the signature of a user-defined function
The name of the function [syntax: [[catalog.]schema].function]
The generic type bounds
The dialect that provides this function
A description of this function
An arbitrary number of arguments of a common type out of a list of valid types
The arguments of this function
The arguments of this function
The results of this function (can be a tuple)
The generic type bounds
example - Example use of the function (tuple with input/output)
cross-link - link to existing documentation, for example: https://prestodb.io/docs/current/functions/datetime.html#truncation-function
Array of reclassify instructions for changing the attached classifier labels
The function is defined by this set of .sdf files
The function can be called without parentheses, as if it were a constant, e.g. current_date
An arbitrary number of arguments of a common type out of a list of valid types
A function parameter
The name of the parameter
A description of this parameter
The datatype of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function parameter
The name of the parameter
The datatype of this parameter
A description of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function's volatility, which defines the function's eligibility for certain optimizations
The SQL string corresponding to the input of this example
The output corresponding to running the input string
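Putting the signature fields together, a user-defined function entry might look like the sketch below. The function shown and every field spelling are illustrative assumptions:

```yaml
function:
  name: my_catalog.my_schema.add_tax   # syntax: [[catalog.]schema].function
  dialect: presto
  description: Adds a fixed tax rate to an amount
  parameters:
    - name: amount
      datatype: "decimal(18, 2)"
  results:
    - datatype: "decimal(18, 2)"       # can be a tuple
  volatility: immutable
  example:
    - input: "SELECT add_tax(100.00)"  # the SQL input of the example
      output: "108.00"                 # the output of running the input
```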
A plugin definition
The name of the plugin [e.g.: pyspark]
An array of directories and filenames containing files processed by this plugin
Image URI of the plugin [e.g.: docker.io/sdf/pyspark:latest]
Path to the dockerfile of the plugin [e.g.: dockerfile]
Whether to keep the plugin container alive after execution
A filepath
Type of Plugin Path (Default: queries)
Last modified of the file
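A plugin entry tying these fields together could be sketched as follows (field names assumed):

```yaml
plugin:
  name: pyspark
  image: docker.io/sdf/pyspark:latest  # or point at a dockerfile instead
  includes:
    - path: jobs/
      type: queries                    # the default path type
  keep-alive: false                    # tear down the container after execution
```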
Table providers manage tables in catalogs
A list of sources backed by the table provider; a source can use globs, e.g. catalog.schema.*
The name of the table provider
The Snowflake warehouse (default: the warehouse given at sdf auth)
The cluster identifier for the Redshift server
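A table provider for a Snowflake-backed catalog might then look like this sketch (field names assumed):

```yaml
table-providers:
  - name: snowflake-prod
    sources:
      - analytics.raw.*          # a source can use globs
    warehouse: COMPUTE_WH        # defaults to the warehouse given at sdf auth
```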
A configuration with section name and properties
The name of the configuration section
A description of this configuration section