Semantic Data Fabric (SDF) file
1.0Schema URL
Properties
A workspace definition
A profile definition
A catalog definition
A schema definition
A table definition
A policy definition
A classifier definition
A function definition
Definitions
A workspace is a collection of one or more catalogs, schemas, tables, and resources, called workspace members, that are managed together.
The SDF edition; should always be 1 (for now)
The name of this workspace (defaults to the workspace directory name if not given). The name must be set for deployment.
A description of this workspace
The URL of the workspace source repository [defaults to 'none' if no repository is given]
The default output object store location, e.g. 's3://bucket/key/' where key is optional
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
An array of paths to other workspaces, i.e. .sql and .yml files
The default dialect of this workspace. If not set, defaults to sdf dialect
An array of paths to directories and files, which will be copied to the SDF service on deployment
Defines a default catalog [If not set, defaults to the directory of the workspace]
Defines a default schema [If not set, defaults to 'pub']
Defines the default profile [if not set, defaults to 'dbg']
[Derived] Workspace is defined by this file
An array of named values for setting SQL variables from your environment, e.g. -dt: dt, used in SQL as @dt
All file paths should be either relative to the workspace or absolute for an object store such as AWS s3://
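The workspace properties above name several fallback values. The following is a minimal sketch of how those defaults could be resolved, not SDF's own implementation. The class and field names are hypothetical, and treating "the directory of the workspace" as the directory's base name is an assumption; only the default values ('pub', 'dbg', sdf, directory name) come from the descriptions above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class WorkspaceSettings:
    # All field names are hypothetical; only the fallback values come from the text.
    directory: str
    name: Optional[str] = None
    default_catalog: Optional[str] = None
    default_schema: Optional[str] = None
    default_profile: Optional[str] = None
    dialect: Optional[str] = None

    def resolved(self) -> dict:
        """Fill every unset property with its documented default."""
        # Assumption: the "directory of the workspace" means its base name.
        dir_name = PurePosixPath(self.directory).name
        return {
            "name": self.name or dir_name,
            "default_catalog": self.default_catalog or dir_name,
            "default_schema": self.default_schema or "pub",
            "default_profile": self.default_profile or "dbg",
            "dialect": self.dialect or "sdf",
        }


settings = WorkspaceSettings(directory="/home/me/sales", default_schema="staging")
print(settings.resolved())
```

Here only the schema is set explicitly, so the name and catalog fall back to the directory name and the profile and dialect fall back to 'dbg' and 'sdf'.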
A filepath
Last modified of the file
The relative path from this workspace to the referenced workspace; for a Git repo, the path from the root of the depot to the workspace
The Git repo
The Git revision
The Git branch
SQL queries can be parameterized via variables of type varchar. A variable definition binds the variable to the provided value.
The name of the variable
The value of this variable (using sql/yaml literals)
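To illustrate how a variable definition binds a name to a value, here is a rough sketch of textual substitution of @name references with varchar literals. It is an illustration only: the function name bind_variables is hypothetical, and SDF may bind variables quite differently (for example in the query planner rather than by rewriting text).

```python
import re


def bind_variables(sql: str, variables: dict) -> str:
    """Replace each @name in sql with its value as a quoted varchar literal.

    Simplification: this also rewrites @name inside string literals and
    comments, which a real SQL parser would leave alone.
    """

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unbound SQL variable: @{name}")
        # Escape single quotes by doubling them, as in standard SQL literals.
        escaped = variables[name].replace("'", "''")
        return f"'{escaped}'"

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", substitute, sql)


query = "SELECT * FROM events WHERE day = @dt"
print(bind_variables(query, {"dt": "2024-01-15"}))
# SELECT * FROM events WHERE day = '2024-01-15'
```

Raising on an unbound variable, rather than leaving @name in place, makes a missing definition visible before the query runs.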
Profiles provide a way to alter the inclusion/exclusion of paths, resources and default dialect.
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
The default output object store location, e.g. 's3://bucket/dir/' where dir is optional
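The include and exclude arrays described above can be read as a simple filter over the files in a workspace. The sketch below assumes that an entry covers a file when it names the file itself or one of its parent directories, and that only .sql and .sdf.yml files are picked up. The function name resolve_members and these matching rules are assumptions, not SDF's documented behavior.

```python
def resolve_members(files, includes, excludes):
    """Return the files covered by an include entry and by no exclude entry."""

    def covered(path, entries):
        # An entry covers a path if it is the path itself or a parent directory.
        for entry in entries:
            prefix = entry.rstrip("/")
            if path == prefix or path.startswith(prefix + "/"):
                return True
        return False

    return [
        f
        for f in files
        if covered(f, includes)
        and not covered(f, excludes)
        and (f.endswith(".sql") or f.endswith(".sdf.yml"))
    ]


files = ["models/a.sql", "models/tmp/b.sql", "models/c.sdf.yml", "docs/readme.md"]
print(resolve_members(files, includes=["models"], excludes=["models/tmp"]))
# ['models/a.sql', 'models/c.sdf.yml']
```

A profile could then be modelled as a second pair of include and exclude lists applied with the same function.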
Catalogs are named collections of schemas in an SQL-environment, and are defined via their includes/excludes and dialect
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
The default dialect, defaults to sdf if not set
[Derived] Catalog is defined by this file
Schemas are named collections of tables in an SQL-environment, and are defined via their includes/excludes and dialect
An array of directories and filenames containing .sql and .sdf.yml files
An array of directories and filenames to be skipped when resolving includes
The dialect, defaults to sdf if not set
[Derived] Schema is defined by this file
A table is either defined by given data (also called an external table) or defined via a query.
An array of sql file names [Typically inferred]
The dialect of this table, defaults to sdf
The columns of the schema: name, type, metadata
The partitioning format of the table
The schedule of the table [expressed as cron]
The first date of the table [expressed by prefixes of RFC 3339]
An array of classifier references
Lineage, a tagged array of column references
The materialization scheme of this table
Store data under this catalog.schema.table instead of the original name
Data is at this location
Store table in this format [only for external tables]
CSV data has a header [only for external tables]
CSV data is separated by this delimiter [only for external tables]
Json or CSV data is compressed with this method [only for external tables]
[Derived] Table is defined by this file
The name of the column
A description of this column
The type of this column
The type of this column
An array of classifier references
Lineage, a tagged array of column references
The output column is computed by copying these upstream columns
The output column is computed by transforming these upstream columns
These upstream columns are indirectly used to produce the output (e.g. in WHERE or GROUP BY)
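The three lineage categories above (copied, transformed, indirectly used) can be pictured as three tagged lists per output column. The sketch below is a hand-written illustration for one query; the class and field names are hypothetical and the lineage is written out manually, not computed.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnLineage:
    # Field names are hypothetical; the three categories come from the text.
    copy: list = field(default_factory=list)       # copied unchanged
    transform: list = field(default_factory=list)  # inputs to a computation
    indirect: list = field(default_factory=list)   # used in WHERE, GROUP BY, ...

    def upstream(self) -> set:
        """All upstream columns this output column depends on in any way."""
        return set(self.copy) | set(self.transform) | set(self.indirect)


# Lineage for:
#   SELECT id, price * qty AS total FROM orders WHERE region = 'EU'
lineage = {
    "id": ColumnLineage(copy=["orders.id"], indirect=["orders.region"]),
    "total": ColumnLineage(
        transform=["orders.price", "orders.qty"],
        indirect=["orders.region"],
    ),
}

print(sorted(lineage["total"].upstream()))
# ['orders.price', 'orders.qty', 'orders.region']
```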
A partition is a table column that describes which partition a row belongs to
The name of the partition column
A description of the partition column
The format of the partition column [use strftime format for date/time]. See the guide: https://docs.sdf.com/guide/schedules
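Since the partition format uses strftime notation for dates and times, the following shows how two common format strings render a timestamp. The format strings and the variable names are illustrative choices, not values required by SDF.

```python
from datetime import datetime

# A hypothetical row timestamp.
row_time = datetime(2024, 1, 15, 9, 30)

# Daily and hourly partition values derived from strftime formats.
daily = row_time.strftime("%Y-%m-%d")
hourly = row_time.strftime("%Y-%m-%dT%H")

print(daily)   # 2024-01-15
print(hourly)  # 2024-01-15T09
```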
Store table data in these formats
Compress table data using these methods
A policy is a deny or allow expression that controls access to a table.
The name of the policy
A description of this policy
Deny data access based on this policy. Each policy may have only one deny or allow element.
Allow data access based on this policy. Each policy may have only one deny or allow element.
[Derived] Policy is defined by this file
Deny access when these classifiers are present
except when these conditions hold
Allow access when these classifiers are present
except when these conditions hold
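The deny and allow elements above share one shape: a set of classifiers plus a set of exceptions. A minimal sketch of that shape follows, under several stated assumptions: all names are hypothetical, a rule fires when any of its classifiers is present, exceptions are modelled as plain condition names, and "only one deny or allow element" is read as exactly one of the two per policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    classifiers: set
    # Assumption: exceptions are modelled as a set of condition names.
    unless: set = field(default_factory=set)

    def applies(self, present: set, active_conditions: set) -> bool:
        """True if a listed classifier is present and no exception holds."""
        matched = bool(self.classifiers & present)
        excepted = bool(self.unless & active_conditions)
        return matched and not excepted


@dataclass
class Policy:
    name: str
    deny: Optional[Rule] = None
    allow: Optional[Rule] = None

    def __post_init__(self):
        # Interpretation of the text: exactly one of deny or allow is set.
        if (self.deny is None) == (self.allow is None):
            raise ValueError("a policy needs exactly one deny or allow element")


no_pii = Policy("no_pii", deny=Rule({"PII"}, unless={"is_auditor"}))
print(no_pii.deny.applies({"PII"}, set()))           # True
print(no_pii.deny.applies({"PII"}, {"is_auditor"}))  # False
```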
A classifier defines the labels that can be attached to columns or a table.
The name of the classifier type
A description of this classifier type
Named classifier elements
Named subsets of classifier elements
[Derived] Classifier is defined by this file
A classifier element is a scoped classifier label (e.g. the element PII belongs to the classifier scope data)
The name of the element
A description of this classifier element
A classifier subset is a scoped classifier label; it is defined by multiple elements or subsets
The name of the classifier subset
A description of the classifier subset
The subset is defined as the union of all its values, elements, or subsets
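Because a subset may list both elements and other subsets, resolving it to plain elements is a recursive union. The sketch below shows one way to do that; the function name expand and the dictionary layout are hypothetical, and the cycle check is an added safeguard that the text does not mention.

```python
def expand(name, elements, subsets, path=frozenset()):
    """Return the set of elements that an element or subset name stands for."""
    if name in elements:
        return {name}
    if name not in subsets:
        raise KeyError(f"unknown classifier label: {name}")
    if name in path:
        raise ValueError(f"cyclic subset definition: {name}")
    result = set()
    for member in subsets[name]:
        # Union the expansion of every member, tracking the current path only,
        # so that two siblings may share a nested subset without a false cycle.
        result |= expand(member, elements, subsets, path | {name})
    return result


elements = {"name", "email", "ssn"}
subsets = {
    "contact": ["name", "email"],
    "sensitive": ["contact", "ssn"],
}

print(sorted(expand("sensitive", elements, subsets)))
# ['email', 'name', 'ssn']
```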
A function block defines the signature for user-defined functions.
The name of the function [syntax: [[catalog.]schema].function]
The arguments of this function
The dialect that provides this function
Override an existing function
A description of this function
An arbitrary number of arguments of a common type out of a list of valid types
The function kind
The results of this function (can be a tuple)
The generic type bounds
The volatility of the function
[Derived] Function is defined by this file
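The function name syntax [[catalog.]schema].function means the catalog and schema parts are optional. One plausible way to complete a partial name is to fill the missing parts from defaults, sketched below. The function qualify is hypothetical, and whether SDF fills function names from the workspace's default catalog and schema is an assumption here. Quoted identifiers that contain dots are not handled.

```python
def qualify(name, default_catalog, default_schema):
    """Expand function, schema.function, or catalog.schema.function to three parts."""
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"expected [[catalog.]schema].function, got {name!r}")
    defaults = [default_catalog, default_schema]
    # Prepend only as many defaults as there are missing leading parts.
    return tuple(defaults[: 3 - len(parts)] + parts)


print(qualify("my_udf", "sales", "pub"))               # ('sales', 'pub', 'my_udf')
print(qualify("util.my_udf", "sales", "pub"))          # ('sales', 'util', 'my_udf')
print(qualify("shared.util.my_udf", "sales", "pub"))   # ('shared', 'util', 'my_udf')
```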
An arbitrary number of arguments of a common type out of a list of valid types
A function parameter
The name of the parameter
A description of this parameter
The datatype of this parameter
The nullability of this parameter
An array of classifier references
The required constant value of this parameter
A function's volatility, which defines the function's eligibility for certain optimizations