Semantic Data Fabric (SDF) file
1.1 Schema URL
Properties
A workspace definition
A profile definition
A table definition
A classifier definition
A function definition
A plugin definition
Definitions
A workspace is a collection of one or more catalogs, schemas, tables, and resources, called workspace members, that are managed together.
The SDF edition; should always be 1 (for now)
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
The default output object store location, e.g. 's3://bucket/key/' where key is optional
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
An array of paths to other workspaces, i.e. .sql and .yml files
Defines a default catalog (If not set, defaults to the workspace name)
Defines a default schema (If not set, defaults to 'pub')
Defines the default profile (if not set, defaults to 'dbg')
The workspace is defined by this set of files
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as {{ dt }}
Experimental: This project has jinja, sql_vars, and sql_macros
Experimental: This is a dbt project.
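Taken together, the workspace fields above can be sketched as a minimal workspace block. This is an illustrative sketch, not a verbatim schema: the exact field spellings (e.g. `edition`, `default-catalog`, `vars`) and all values are assumptions drawn from the descriptions above.

```yaml
workspace:
  edition: 1                       # the SDF edition; currently always 1
  name: my_workspace               # must be set for deployment
  description: Example analytics workspace
  repository: https://github.com/example/my-workspace
  default-catalog: my_workspace    # defaults to the workspace name
  default-schema: pub              # defaults to 'pub'
  default-profile: dbg             # defaults to 'dbg'
  includes:
    - path: models/                # directories of .sql and .sdf.yml files
  excludes:
    - path: scratch/               # skipped when resolving includes
  vars:
    dt: dt                         # used in SQL as @dt and in Jinja as {{ dt }}
```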
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://. Note that an [IncludePath] specifies a catalog and schema scope for unqualified names (in effect for both creating and querying tables). See [IncludePath::default_catalog] and [IncludePath::default_schema].
A filepath
Type of included artifacts: model | test | stats | metadata | resource
Defines a default catalog for unqualified names. If not set, defaults to the [Workspace] catalog.
Defines a default schema for unqualified names. If not set, defaults to the [Workspace] schema.
The dialect of the included files. If not set, defaults to the [Workspace] dialect.
The compute platform for building the included files. If not set, defaults to the [Workspace] compute platform.
Index method for this include path: scan | table | schema-table | catalog-schema-table
Synchronization scheme for this include path: always | on-pull | on-push | never
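An include path with its scoping fields might look like the sketch below; the field names are assumptions based on the descriptions above:

```yaml
includes:
  - path: models/staging/
    type: model                  # model | test | stats | metadata | resource
    default-catalog: analytics   # scope for unqualified names
    default-schema: staging
    dialect: presto
    index: schema-table          # scan | table | schema-table | catalog-schema-table
```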
Supported dialects
Supported compute platforms
A filepath
Type of excluded artifacts
The relative path from this workspace to the referenced workspace; for a Git repo, the path from the root of the depot to the workspace
The chosen workspace profile (none means default)
The Git repo
The Git revision (choose only one of the fields: rev, branch, tag)
The Git branch (choose only one of the fields: rev, branch, tag)
The Git tag (choose only one of the fields: rev, branch, tag)
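A Git-backed workspace dependency could be written roughly as follows; the key names (`dependencies`, `git`, `branch`, `profile`) are assumptions, and only one of rev, branch, or tag should be given:

```yaml
dependencies:
  - path: shared/workspace       # from the root of the depot to the workspace
    git: https://github.com/example/shared-models
    branch: main                 # choose only one of: rev, branch, tag
    profile: dbg                 # omit to use the default profile
```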
All file paths should be either relative to the workspace or absolute for an object store like AWS s3://
A filepath
Last modified time of the file
Supported dialects
Profiles provide a way to override the fields of a workspace: if a profile sets field X, the profile's value of X overrides the workspace's.
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository (defaults to 'none' if no repository is given)
The default output object store location, e.g. 's3://bucket/key/' where key is optional
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
An array of paths to other workspaces, i.e. .sql and .yml files
Defines a default catalog (If not set, defaults to the workspace name)
Defines a default schema (If not set, defaults to 'pub')
A map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as {{ dt }}
Defines the default profile (if not set, defaults to 'dbg')
The workspace is defined by this set of files
Experimental: This project has jinja
Experimental: This is a dbt project.
The default severity for this table's tests and checks
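Since a profile simply overrides workspace fields, a production profile that swaps the default schema and a SQL variable binding might look like this sketch (field names assumed):

```yaml
profile:
  name: prd
  default-schema: prod           # overrides the workspace's 'pub'
  vars:
    dt: run_date                 # overrides the workspace's dt binding
```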
A table is either defined by given data (also called an external table) or defined via a query.
The dialect of this table, defaults to presto
The compute platform for evaluating the query populating this table, defaults to local
The table-type of this table
All table dependencies (syntax: catalog.schema.table)
The columns of the schema: name, type, metadata
The partitioning format of the table
The schedule of the table [expressed as cron]
The first date of the table [expressed by prefixes of RFC 3339]
An array of classifier references
Array of reclassify instructions for changing the attached classifier labels
Lineage, a tagged array of column references
Data is at this location
Store table in this format [only for external tables]
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
Table is defined by these .sql and/or .sdf files
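An external table combining the fields above could be sketched like this; the field spellings and values are illustrative assumptions, not the literal schema:

```yaml
table:
  name: analytics.raw.events
  dialect: presto                # defaults to presto
  location: s3://bucket/events/  # data is at this location
  file-format: csv               # only for external tables
  with-header: true              # CSV data has a header
  delimiter: ","
  columns:
    - name: event_id
      type: varchar
  partitioned-by:
    - name: dt
      format: "%Y-%m-%d"         # strftime format for date/time
  schedule: "0 3 * * *"          # expressed as cron
```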
The name of the column
A description of this column
The type of this column
An array of classifier references
Lineage, a tagged array of column references
Array of reclassify instructions for changing the attached classifier labels
An array of representative literals of this column [experimental!]
The output column is computed by copying these upstream columns
The output column is computed by transforming these upstream columns
These upstream columns are indirectly used to produce the output (e.g. in WHERE or GROUP BY)
These functions were used to produce the output column
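The three lineage kinds above (copied columns, transformed columns, indirectly used columns) might be recorded per output column roughly as follows; the key names here are hypothetical:

```yaml
columns:
  - name: total_amount
    type: "decimal(18, 2)"
    lineage:
      copy: []                                # no column is copied verbatim
      transform: [orders.amount, orders.tax]  # transformed into the output
      scan: [orders.status]                   # used indirectly, e.g. in WHERE
```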
Target classifier
Expected source classifier
The type of the constraint
The name of the constraint
A partition is a table column used to describe which partition a row belongs to
The name of the partition column
A description of the partition column
The format of the partition column [use strftime format for date/time]. See the [guide](https://docs.sdf.com/guide/schedules)
Store table data in these formats
Compress table data using these methods
A classifier defines the labels that can be attached to columns or a table.
The name of the classifier type
A description of this classifier type
Named classifier labels
Does the classifier propagate from scope to scope, or is it a one-scope marker?
The classifier is defined by this set of .sdf files
A classifier element is a scoped classifier label (e.g. the element PII belongs to the classifier scope data)
The name of the label, use "*" to allow arbitrary strings as labels
A description of this classifier element
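A classifier with named labels can then be sketched as below (field names assumed from the descriptions above):

```yaml
classifier:
  name: data
  description: Data sensitivity labels
  labels:
    - name: PII                  # the element PII in the scope 'data'
    - name: "*"                  # allow arbitrary strings as labels
  propagate: true                # propagate from scope to scope
```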
A function block defines the signature of a user-defined function
The name of the function [syntax: [[catalog.]schema].function]
The generic type bounds
The dialect that provides this function
A description of this function
An arbitrary number of arguments of a common type out of a list of valid types
The arguments of this function
The arguments of this function
The results of this function (can be a tuple)
The generic type bounds
example - Example use of the function (tuple with input/output)
cross-link - link to existing documentation, for example: https://prestodb.io/docs/current/functions/datetime.html#truncation-function
Array of reclassify instructions for changing the attached classifier labels
The function is defined by this set of .sdf files
The function can be called without parentheses, as if it were a constant, e.g. current_date
An arbitrary number of arguments of a common type out of a list of valid types
A function parameter
The name of the parameter
A description of this parameter
The datatype of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function parameter
The name of the parameter
The datatype of this parameter
A description of this parameter
An array of classifier references
The required constant value of this parameter
The parameter may appear as an identifier, without quotes
A function's volatility, which defines the function's eligibility for certain optimizations
The SQL string corresponding to the input of this example
The output corresponding to running the input string
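Putting the signature fields together, a user-defined function entry might look like the sketch below. The function shown and every field spelling are illustrative assumptions:

```yaml
function:
  name: my_catalog.my_schema.add_tax   # syntax: [[catalog.]schema].function
  dialect: presto
  description: Adds a fixed tax rate to an amount
  parameters:
    - name: amount
      datatype: "decimal(18, 2)"
  results:
    - datatype: "decimal(18, 2)"       # can be a tuple
  volatility: immutable
  example:
    - input: "SELECT add_tax(100.00)"  # the SQL input of the example
      output: "108.00"                 # the output of running the input
```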
A plugin definition
The name of the plugin [e.g.: pyspark]
An array of directories and filenames containing files processed by this plugin
Image URI of the plugin [e.g.: docker.io/sdf/pyspark:latest]
Path to the dockerfile of the plugin [e.g.: dockerfile]
Whether to keep the plugin container alive after execution
A filepath
Type of Plugin Path (Default: queries)
Last modified of the file
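A plugin entry tying these fields together could be sketched as follows (field names assumed):

```yaml
plugin:
  name: pyspark
  image: docker.io/sdf/pyspark:latest  # or point at a dockerfile instead
  includes:
    - path: jobs/
      type: queries                    # the default path type
  keep-alive: false                    # tear down the container after execution
```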
Table providers manage tables in catalogs
A list of sources backed by the table provider; a source can use globs, e.g. catalog.schema.*
The name of the table provider
The Snowflake warehouse (default: the warehouse given at sdf auth)
The cluster identifier for the Redshift server
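A table provider for a Snowflake-backed catalog might then look like this sketch (field names assumed):

```yaml
table-providers:
  - name: snowflake-prod
    sources:
      - analytics.raw.*          # a source can use globs
    warehouse: COMPUTE_WH        # defaults to the warehouse given at sdf auth
```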
A configuration with section name and properties
The name of the configuration section
A description of this configuration section