Semantic Data Fabric (SDF) file
1.3 Schema URL
Properties
A workspace definition
An environment definition
A table definition
A classifier definition
A function definition
A config definition
A credential definition
A config definition
Definitions
A workspace is a collection of one or more catalogs, schemas, tables, and resources, called workspace members, that are managed together.
The SDF edition; should always be 1.3 (1.2 is deprecated)
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
Whether table names are treated as case-sensitive in this workspace. Applies globally to the entire project and is non-overridable. Defaults to false.
Note: this setting is only effective when set on the root workspace file; if set on included workspaces it is ignored
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
Dependencies of the workspace to other workspaces or to cloud database providers
The integrations for this environment
Defaults for this workspace
Workspace defined by this set of files
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as
Configuration for dbt integration
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://. Note that an [IncludePath] specifies a catalog and schema scope for unqualified names (in effect for both creating and querying tables). See [IncludePath::default_catalog] and [IncludePath::default_schema].
A filepath
Type of included artifacts: model | test | stats | metadata | resource
Index method for this include path: scan | table | schema-table | catalog-schema-table
Defaults for files on this path
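Putting the workspace and include-path fields above together, a minimal workspace file might look like the following sketch. The names, paths, and exact key spellings are illustrative (inferred from the field descriptions above), not taken from any real project:

```yml
workspace:
  edition: "1.3"
  name: my_workspace            # must be set for deployment
  description: Example workspace
  includes:
    - path: models/             # directory of .sql files
      type: model
      index: schema-table
    - path: metadata/           # directory of .sdf.yml files
      type: metadata
  excludes:
    - path: models/scratch/     # skipped when resolving includes
```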
The default environment (can only be set on the level of the workspace)
The dialect of this environment. If not set, defaults to trino
Case normalization policy for names
The preprocessor for this environment. If not set, defaults to local
Defines a default catalog. If not set, defaults to the (catalog/workspace) name in an outer scope
Defines a default schema. If not set, defaults to the schema name in an outer scope; if that is not set, defaults to 'pub'
Defines the default materialization. If not set, defaults to the materialization in an outer scope; if that is not set, defaults to base-table
Defines table creation flags; defaults to none if not set
The default utils library, if set overrides sdf_utils
The default test library, if set overrides sdf_test
The default materialization library, if set overrides sdf_materialization
The named lint rule set; uses defaults (from sdftarget/
The default index for these tables
The default index for these tables
The default index for these tables
The default severity for this table's tests and checks
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
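The default-related fields above can be combined into a defaults block. A minimal sketch follows; the dialect, catalog, and key spellings are assumptions based on the descriptions above, not a definitive reference:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  defaults:
    environment: dbg          # default environment (root workspace only)
    dialect: snowflake        # falls back to trino if unset
    catalog: analytics        # default catalog for unqualified names
    schema: pub               # default schema ('pub' if unset anywhere)
    materialization: view     # falls back to base-table if unset
    severity: warning         # default severity for tests and checks
```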
Supported dialects in YAML files
Note: this [Dialect] type is only meant for serializing to and from .sdf.yml files. For internal use, you should almost always use the semantic dialect type ([types::Dialect]) instead.
Note specifically that the lack of a .to_string() method (i.e. [!Display]) on this type is intentional -- you must first convert this type to a [types::Dialect], by using either [Dialect::to_semantic_dialect] or [types::Dialect::from], before you can convert it to a string.
Compress table data using these methods
A filepath
Type of excluded artifacts
The relative path from this workspace to the referenced workspace; for a Git repo, the path from the root of the depot to the workspace
The chosen workspace environment (none means default)
The chosen workspace target (none means default)
The Git repo
The Git revision (choose only one of the fields: rev, branch, tag)
The Git branch (choose only one of the fields: rev, branch, tag)
The Git tag (choose only one of the fields: rev, branch, tag)
Which models, reports, tests, checks etc. to include from the dependency
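The dependency fields above (a local path, or Git coordinates with exactly one of rev/branch/tag) might be combined as in this sketch. The repository URL, names, and paths are placeholders:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  dependencies:
    - name: shared_defs
      path: ../shared_defs    # sibling workspace in the same repo
      environment: prod       # chosen workspace environment (omit for default)
    - name: remote_lib
      git: https://github.com/example/remote_lib.git
      branch: main            # choose only one of rev / branch / tag
```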
Table providers manage tables in catalogs (OLD_PROVIDERS)
Credential identifier for this provider
The cluster identifier for the Redshift server
The size of the batch when querying the provider
A list of (possibly remote) sources to read, matched in order, so write specific patterns before more general ones
A list of (possibly remote) targets to build, matched in order, so write specific patterns before more general ones; source patterns are excluded
A list of remote buckets to target
The remote output location of the integration
Sources define the tables that are possibly remote
A source that can be read. Sources must be three-part names with globs, e.g. *.*.* matches all catalogs, schemas, and tables in scope
Time travel qualifier expression (e.g. `AT (TIMESTAMP => {{ SOME_TIMESTAMP }})``)
Whether to preload the source
Renames sources when searching in the remote; the i-th ${i} matches the i-th * of the name (e.g. "${1}.${2}.${3}" reproduces the matched catalog.schema.table)
Targets define the tables that are possibly remote and that will be built
A pattern must be a three-part name with globs, e.g. *.*.* matches all catalogs, schemas, and tables in scope
A list of patterns. A pattern must be a three-part name with globs, e.g. *.*.* matches all catalogs, schemas, and tables in scope
Whether to preload the target
Renames targets; the i-th ${i} matches the i-th * of the name (e.g. "${1}.${2}.${3}" reproduces the matched catalog.schema.table)
The uri of the bucket
The region of the bucket
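As a sketch of the ordered source/target pattern matching described above, an integration entry could look like the following. The provider name, patterns, and the rename key spelling are invented for illustration:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  integrations:
    - provider: snowflake
      sources:
        - pattern: raw.finance.*        # specific pattern first
        - pattern: raw.*.*              # more general pattern after
      targets:
        - pattern: analytics.*.*
          rename-as: "dev_${1}.${2}.${3}"  # ${i} matches the i-th * of the name
      buckets:
        - uri: s3://my-bucket/sdf-output/
          region: us-east-1
```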
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://
A filepath
The last-modified time of the file
Whether the dbt integration is enabled
The dbt project directory
The dbt target directory for the project
The directory where dbt profiles are stored
The dbt profile to use
The dbt target for the profile
Automatically run parse in between commands (default: true)
Disable introspection (default: false)
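The dbt integration fields above can be sketched as a single block. The key spellings here are assumptions derived from the field descriptions, and the paths are placeholders:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  dbt:
    enabled: true
    project-dir: ./dbt_project          # dbt project directory
    target-dir: ./dbt_project/target    # dbt target directory
    profile-dir: ~/.dbt                 # where dbt profiles are stored
    profile: default
    target: dev
    auto-parse: true                    # run parse in between commands
    disable-introspection: false
```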
Environments provide a way to override the fields of a workspace, i.e. if an environment has set field X, then the workspace field X will be overridden by the environment field X.
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
Defaults for this workspace
Dependencies of the workspace to other workspaces or to cloud database providers
The integrations for this environment
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as
Workspace defined by this set of files
Experimental: this project uses Jinja
Configuration for dbt integration
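Because an environment overrides workspace fields one-for-one, an environment can be expressed as a second YAML document that redefines only what differs. This sketch assumes environments are declared in a separate document in the same file; all names are illustrative:

```yml
workspace:
  edition: "1.3"
  name: my_workspace
  defaults:
    dialect: snowflake
    catalog: dev_catalog
---
environment:
  name: prod
  defaults:
    catalog: prod_catalog   # overrides the workspace default; other fields inherit
```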
A table is either defined by given data (also called an external table) or defined via a query.
The name of the table (syntax: [[catalog.]schema].table)
Note: this field is typed as a [QualifiedName] for serialization. In almost all cases you should use the [Self::fqn()] method to get the fully qualified [TableName] instead of accessing this field directly.
A description of this table
The dialect of this table, defaults to trino
Case normalization policy for names specified in this table
The table-type of this table (new version)
The warehouse where this table is computed
Specify what kind of table or view this is
Whether the table exists in the remote DB (used for is_incremental macro)
Specify the table location; defaults to none if not set
Defines the table creation options, defaults to none if not set
Options governing incremental table evaluation (only for incremental tables)
Options governing snapshot table evaluation (only for snapshot tables)
All tables that this table depends on (syntax: catalog.schema.table)
All tables that depend on this table (syntax: catalog.schema.table)
The columns of the schema: name, type, metadata
The partitioning format of the table
The default severity for this table's tests and checks
The schedule of the table [expressed as cron]
The first date of the table [expressed by prefixes of RFC 3339]
An array of classifier references
Array of reclassify instructions for changing the attached classifier labels
Lineage, a tagged array of column references
Data is at this location
Store table in this format [only for external tables]
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
If this table is part of a cyclic dependency then cut the cycle here
Table is defined by these .sql and/or .sdf files
This table is either backed by a CREATE TABLE DDL statement or by a table definition in YAML that is the table's complete schema
Metadata for this table
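A table definition combining several of the fields above might look like this sketch. Table, column, and classifier names are invented for illustration, and the key spellings are inferred from the descriptions above:

```yml
table:
  name: analytics.pub.orders        # syntax: [[catalog.]schema].table
  description: One row per customer order
  dialect: snowflake
  materialization: table
  schedule: "0 6 * * *"             # expressed as cron
  columns:
    - name: order_id
      datatype: varchar
      description: Primary key of the orders table
    - name: email
      datatype: varchar
      classifiers:
        - PII.email                 # classifier reference
```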
Incremental strategy; may be one of append, merge, or delete+insert
Expression used for identifying records in Merge and Delete+Insert strategies; may be a column name or an expression combining multiple columns. If left unspecified, the Merge and Delete+Insert strategies behave the same as Append
List of column names to be updated as part of Merge strategy; Only one of merge_update_columns or merge_exclude_columns may be specified
List of column names to exclude from updating as part of Merge strategy; Only one of merge_update_columns or merge_exclude_columns may be specified
Method for reacting to schema changes in the source of the incremental table. Possible values are fail, append, and sync. If left unspecified, the default behavior is to ignore the change and possibly error out if the schema change is incompatible. fail causes a failure whenever any deviation in the source schema is detected; append adds new columns but does not delete columns removed from the source; sync adds new columns and deletes columns removed from the source.
Warehouse to use in the incremental mode
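The incremental options above can be sketched as follows. The key spellings (strategy, unique-key, and so on) are assumptions based on the field descriptions, not a definitive reference:

```yml
table:
  name: analytics.pub.events
  materialization: incremental-table
  incremental-options:
    strategy: merge                  # append | merge | delete+insert
    unique-key: event_id             # identifies records for merge/delete+insert
    merge-update-columns:            # mutually exclusive with merge-exclude-columns
      - status
      - updated_at
    on-schema-change: append         # fail | append | sync
```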
Snapshot strategy; may be one of timestamp (default) or check
Expression used for identifying records that will be updated according to the snapshot strategy; may be a column name or an expression combining multiple columns
Name of the timestamp column used to identify the last update time. This option is only required for the timestamp snapshot strategy
Specification of which columns to check for change (may be a list of column names or all). This option is only required for the check snapshot strategy
Warehouse to use in the snapshot mode
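Analogously, a snapshot table using the timestamp strategy described above might be sketched like this; the key spellings and names are illustrative assumptions:

```yml
table:
  name: analytics.pub.dim_customer
  materialization: snapshot-table
  snapshot-options:
    strategy: timestamp         # timestamp (default) | check
    unique-key: customer_id
    updated-at: last_modified   # required for the timestamp strategy
```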
A description of this column
The type of this column
An array of classifier references
Lineage, a tagged array of column references
Forward Lineage, the columns that this column is used to compute
Array of reclassify instructions for changing the attached classifier labels
An array of representative literals of this column [experimental!]
The output column is computed by copying these upstream columns
The output column is computed by transforming these upstream columns
These upstream columns are indirectly used to produce the output (e.g. in WHERE or GROUP BY)
These functions were used to produce the output column
Target classifier
Expected source classifier
The constraint macro: must have the form lib.macro(args,..), where lib is any of the libs in scope, std is available by default
The severity of this constraint
A partition is a table column used to describe which partition a row belongs to
A description of the partition column
The format of the partition column [use strftime format for date/time]. See the [guide](https://docs.sdf.com/guide/schedules)
Store table data in these formats
A classifier defines the labels that can be attached to columns or a table.
The name of the classifier type
A description of this classifier type
Named classifier labels
Does the classifier propagate from scope to scope, or is it a one-scope marker?
Classifier defined by this set of .sdf files
A classifier element is a scoped classifier label (e.g. the element PII belongs to the classifier scope data)
The name of the label, use "*" to allow arbitrary strings as labels
A description of this classifier element
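A classifier with its scoped labels, as described above, might be written like this sketch (the PII scope and its labels are invented for illustration):

```yml
classifier:
  name: PII
  description: Personally identifiable information
  labels:
    - name: email
      description: Email addresses
    - name: phone
      description: Phone numbers
    - name: "*"    # allow arbitrary strings as labels
```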
A function block defines the signature for user-defined functions
The function category
The dialect that provides this function
A description of this function
Arbitrary number of arguments of a common type out of a list of valid types
The arguments of this function
The arguments of this function
The results of this function (can be a tuple)
The constraints on generic type bounds
The generic type bounds
example - Example use of the function (tuple with input/output)
cross-link - link to existing documentation, for example: https://trino.io/docs/current/functions/datetime.html#date_trunc
Array of reclassify instructions for changing the attached classifier labels
Function defined by this set of .sdf files
Function can be called without parentheses, as if it were a constant (e.g. current_date)
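A function signature combining the fields above might be sketched as follows. The function name and the key spellings (section, parameters, returns) are assumptions made for illustration:

```yml
function:
  name: my_udf
  section: scalar
  dialect: snowflake
  description: Example scalar function
  parameters:
    - name: x
      datatype: double
      description: The input value
  returns:
    datatype: double
```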
Arbitrary number of arguments of a common type out of a list of valid types
A function parameter
The name of the parameter
A description of this parameter
The datatype of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function parameter
The datatype of this parameter
A description of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function's volatility, which defines the function's eligibility for certain optimizations
The sql string corresponding to the input of this example
The output corresponding to running the input string
Indicates how a function's evaluation is implemented.
The name attribute of the implementing UDF. None indicates the UDF is named the same as the function.
The name attribute of the implementing UDF. None indicates the UDF is named the same as the function.
A configuration with section name and properties
The name of the configuration section
A description of this configuration section
The name of the credential (default = 'default')
A description of this credential
Credential defined by this set of .sdf files
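Given the fields above, a minimal credential block might look like this sketch (the description is illustrative):

```yml
credential:
  name: default            # default = 'default'
  description: Default credential used by providers in this workspace
```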