Open Data Contract Standard (ODCS)
Open Data Contract Standard contract file, from the Bitol project at The Linux Foundation
| Type | object |
|---|---|
| File match | `*.odcs.yaml`, `*.odcs.yml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/open-data-contract-standard-odcs/latest.json |
| Source | https://raw.githubusercontent.com/bitol-io/open-data-contract-standard/main/schema/odcs-json-schema-latest.json |
Validate with Lintel: `npx @lintel/lintel check`
An open data contract specification to establish agreement between data producers and consumers.
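As a sketch of the overall shape, a minimal contract file might look like the following. Field names follow the ODCS v3 schema; the id, names, and values are illustrative:

```yaml
apiVersion: v3.1.0    # version of the standard used to build the contract
kind: DataContract    # the only valid kind
id: 53581432-6c55-4ba2-a65f-72344a91553a   # unique identifier, e.g. a UUID
status: active        # current status of the dataset
name: transactions    # name of the data contract
version: 1.0.0        # current version of the data contract
domain: payments      # logical data domain
tags: ['finance']
```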
Properties
Current version of the data contract.
The kind of file this is. Valid value is DataContract.
Version of the standard used to build the data contract. Default value is v3.1.0.
A unique identifier used to reduce the risk of dataset name collisions, such as a UUID.
Current status of the dataset.
Name of the data contract.
Indicates the property the data is primarily associated with. Value is case insensitive.
A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, finance, sensitive, employee_record.
List of servers where the datasets reside.
The name of the data product.
High level description of the dataset.
Intended usage of the dataset.
Purpose of the dataset.
Limitations of the dataset.
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
A list of key/value pairs for custom properties.
Name of the logical data domain.
A list of elements within the schema to be cataloged.
Top level for support channels.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Subscription price per unit of measure in priceUnit.
Currency of the subscription price in price.priceAmount.
The unit of measure for calculating cost. Examples: megabyte, gigabyte.
A list of roles that will provide user access to the dataset.
DEPRECATED SINCE 3.1. WILL BE REMOVED IN ODCS 4.0. Element (using the element path notation) to do the checks on.
A list of key/value pairs for SLA-specific properties. There is no limit on the type of properties (more details to come).
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
A list of key/value pairs for custom properties.
Timestamp in UTC of when the data contract was created.
Definitions
Shorthand notation using name fields (table_name.column_name)
Fully qualified notation using id fields (section/id/properties/id), optionally prefixed with external file reference
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Data source details of where data is physically stored.
Identifier of the server.
Type of the server.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Description of the server.
Environment of the server.
List of roles that have access to the server.
A list of key/value pairs for custom properties.
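Combining the server fields above, a servers entry might be sketched as follows, assuming a Postgres data source (host, names, and values are illustrative):

```yaml
servers:
  - server: payments-prod     # identifier of the server
    type: postgres            # type of the server
    environment: production
    description: Primary production database
    host: db.example.com
    port: 5432
    database: payments
    schema: public
```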
Compatibility wrapper for relationship definitions.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Name of the element.
The physical element data type in the data source.
Description of the element.
The business name of the element.
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, finance, sensitive, employee_record.
A list of key/value pairs for custom properties.
The logical element data type.
Physical name.
Granular level of the data in the object.
A list of properties for the object.
A list of relationships to other properties. Each relationship must have 'from', 'to' and optionally 'type' field.
Data quality rules with all the relevant information for rule setup and execution.
Boolean value specifying whether the element is a primary key. Default is false.
If the element is a primary key, the position of the primary key element. Starts from 1. For example, if account_id and name are the primary key columns, account_id has primaryKeyPosition 1 and name has primaryKeyPosition 2. Defaults to -1.
The logical element data type.
Additional optional metadata to describe the logical type.
The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT.
Physical name.
Indicates if the element may contain Null values; possible values are true and false. Default is false.
Indicates if the element contains unique values; possible values are true and false. Default is false.
Indicates if the element is partitioned; possible values are true and false.
If the element is used for partitioning, the position of the partition element. Starts from 1. For example, if country and year are the partition columns, country has partitionKeyPosition 1 and year has partitionKeyPosition 2. Defaults to -1.
Can be anything, from confidential, restricted, and public to more advanced categorizations. Some companies, like PayPal, use a data classification indicating the class of data in the element; expected values are 1, 2, 3, 4, or 5.
The element name within the dataset that contains the encrypted element value. For example, unencrypted element email_address might have an encryptedName of email_address_encrypt.
List of objects in the data source used in the transformation.
Logic used in the element transformation.
Describes the transform logic in very simple terms.
List of sample element values.
Boolean indicator; true if the element is considered a critical data element (CDE), false otherwise.
A list of relationships to other properties. When defined at property level, the 'from' field is implicit and should not be specified.
Data quality rules with all the relevant information for rule setup and execution.
Boolean value specifying whether the element is a primary key. Default is false.
If the element is a primary key, the position of the primary key element. Starts from 1. For example, if account_id and name are the primary key columns, account_id has primaryKeyPosition 1 and name has primaryKeyPosition 2. Defaults to -1.
The logical element data type.
Additional optional metadata to describe the logical type.
The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT.
Physical name.
Indicates if the element may contain Null values; possible values are true and false. Default is false.
Indicates if the element contains unique values; possible values are true and false. Default is false.
Indicates if the element is partitioned; possible values are true and false.
If the element is used for partitioning, the position of the partition element. Starts from 1. For example, if country and year are the partition columns, country has partitionKeyPosition 1 and year has partitionKeyPosition 2. Defaults to -1.
Can be anything, from confidential, restricted, and public to more advanced categorizations. Some companies, like PayPal, use a data classification indicating the class of data in the element; expected values are 1, 2, 3, 4, or 5.
The element name within the dataset that contains the encrypted element value. For example, unencrypted element email_address might have an encryptedName of email_address_encrypt.
List of objects in the data source used in the transformation.
Logic used in the element transformation.
Describes the transform logic in very simple terms.
List of sample element values.
Boolean indicator; true if the element is considered a critical data element (CDE), false otherwise.
A list of relationships to other properties. When defined at property level, the 'from' field is implicit and should not be specified.
Data quality rules with all the relevant information for rule setup and execution.
Boolean value specifying whether the element is a primary key. Default is false.
If the element is a primary key, the position of the primary key element. Starts from 1. For example, if account_id and name are the primary key columns, account_id has primaryKeyPosition 1 and name has primaryKeyPosition 2. Defaults to -1.
The logical element data type.
Additional optional metadata to describe the logical type.
The physical element data type in the data source. For example, VARCHAR(2), DOUBLE, INT.
Physical name.
Indicates if the element may contain Null values; possible values are true and false. Default is false.
Indicates if the element contains unique values; possible values are true and false. Default is false.
Indicates if the element is partitioned; possible values are true and false.
If the element is used for partitioning, the position of the partition element. Starts from 1. For example, if country and year are the partition columns, country has partitionKeyPosition 1 and year has partitionKeyPosition 2. Defaults to -1.
Can be anything, from confidential, restricted, and public to more advanced categorizations. Some companies, like PayPal, use a data classification indicating the class of data in the element; expected values are 1, 2, 3, 4, or 5.
The element name within the dataset that contains the encrypted element value. For example, unencrypted element email_address might have an encryptedName of email_address_encrypt.
List of objects in the data source used in the transformation.
Logic used in the element transformation.
Describes the transform logic in very simple terms.
List of sample element values.
Boolean indicator; true if the element is considered a critical data element (CDE), false otherwise.
A list of relationships to other properties. When defined at property level, the 'from' field is implicit and should not be specified.
Data quality rules with all the relevant information for rule setup and execution.
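Putting the property fields above together, a schema entry might be sketched as follows (table and column names are illustrative; field names follow the ODCS v3 schema):

```yaml
schema:
  - name: transactions           # logical object name
    physicalName: txn_tbl        # physical name in the data source
    logicalType: object
    properties:
      - name: account_id
        logicalType: string
        physicalType: VARCHAR(26)
        required: true           # may not contain null values
        unique: true
        primaryKey: true
        primaryKeyPosition: 1
        classification: restricted
        criticalDataElement: true
      - name: country
        logicalType: string
        physicalType: VARCHAR(2)
        partitioned: true
        partitionKeyPosition: 1
```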
A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, finance, sensitive, employee_record.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
Consequences of the rule failure.
Additional properties required for rule execution.
Describe the quality check to be completed.
The key performance indicator (KPI) or dimension for data quality.
Name of the data quality check.
Rule execution schedule details.
The name or type of scheduler used to start the data quality check.
The severity of the quality rule.
A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, finance, sensitive, employee_record.
The type of quality check. 'text' is human-readable text that describes the quality of the data. 'library' is a set of maintained predefined quality attributes such as row count or unique. 'sql' is an individual SQL query that returns a value that can be compared. 'custom' is quality attributes that are vendor-specific, such as Soda or Great Expectations.
Unit the rule is using, popular values are rows or percent, but any value is allowed.
Data quality rules with all the relevant information for rule setup and execution.
Common comparison operators for data quality checks.
Define a data quality check based on the predefined metrics as per ODCS.
Deprecated; use metric instead.
Additional arguments for the metric, if needed.
Query string that adheres to the dialect of the provided server.
Name of the engine which executes the data quality checks.
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
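The quality fields above might combine as in the sketch below. The library rule name and the mustBe threshold syntax are assumptions based on common ODCS examples, not requirements of this page:

```yaml
quality:
  - name: No duplicate account ids
    description: account_id must be unique across the table
    dimension: uniqueness        # data quality KPI/dimension
    type: library                # predefined quality attribute
    rule: duplicateCount
    mustBe: 0
    unit: rows
    severity: error
    scheduler: cron
    schedule: '0 20 * * *'
  - name: No null references
    type: sql                    # individual SQL query returning a comparable value
    query: SELECT COUNT(*) FROM transactions WHERE account_id IS NULL
    mustBe: 0
```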
Top level for support channels.
Channel name or identifier.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Access URL using normal URL scheme (https, mailto, etc.).
Description of the channel, free text.
Name of the tool, value can be email, slack, teams, discord, ticket, googlechat, or other.
Scope can be: interactive, announcements, issues, notifications.
Some tools use an invitation URL for requesting access or subscribing. Follows the URL scheme.
A list of key/value pairs for custom properties.
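A support block using the channel fields above might be sketched as follows (channel names and URLs are illustrative):

```yaml
support:
  - channel: '#data-help'        # channel name or identifier
    tool: slack
    scope: interactive
    url: https://example.slack.com/archives/C0000000
  - channel: data-outages
    tool: email
    scope: announcements
    url: mailto:data-outages@example.com
```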
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Subscription price per unit of measure in priceUnit.
Currency of the subscription price in price.priceAmount.
The unit of measure for calculating cost. Examples: megabyte, gigabyte.
Team member information.
The user's username or email.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
The user's name.
The user's description.
The user's job role; examples might be owner or data steward. There is no limit on the role.
The date when the user joined the team.
The date when the user ceased to be part of the team.
The username of the user who replaced the previous user.
A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, finance, sensitive, employee_record.
Custom properties block.
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
Team information.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Team name.
Team description.
List of members.
A list of tags that may be assigned to the elements (object or property); the tags keyword may appear at any level. Tags may be used to better categorize an element. For example, finance, sensitive, employee_record.
Custom properties block.
List of links to sources that provide more details on the dataset; examples would be a link to an external definition, a training video, a git repo, data catalog, or another tool. Authoritative definitions follow the same structure in the standard.
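Assuming the team layout described above (name, description, and a list of members), a sketch with illustrative names and dates:

```yaml
team:
  name: payments-data
  description: Owns the payments datasets
  members:
    - username: jdoe@example.com   # username or email
      name: Jane Doe
      role: owner
      dateIn: 2022-10-01           # date the user joined the team
    - username: rsmith@example.com
      role: data steward
      dateIn: 2023-02-14
```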
Name of the IAM role that provides access to the dataset.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Description of the IAM role and its permissions.
The type of access provided by the IAM role.
The name(s) of the first-level approver(s) of the role.
The name(s) of the second-level approver(s) of the role.
A list of key/value pairs for custom properties.
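A roles entry using the IAM fields above might look like this sketch (role and approver names are illustrative):

```yaml
roles:
  - role: payments-reader          # IAM role providing access to the dataset
    access: read                   # type of access provided
    firstLevelApprovers: Data Owner
    secondLevelApprovers: Compliance Office
```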
Specific property in the SLA; check the periodic table. May require units (more details to come).
Agreement value. The label will change based on the property itself.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
d, day, days for days; y, yr, years for years, etc. Units use the ISO standard.
Element(s) to check on. Multiple elements should be extremely rare and, if so, separated by commas.
Describes the importance of the SLA from the list of: regulatory, analytics, or operational.
Description of the SLA for humans.
Name of the scheduler; can be cron or any tool your organization supports.
Configuration information for the scheduling tool, for cron a possible value is 0 20 * * *.
A list of key/value pairs for custom properties.
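The SLA fields above might combine as in this sketch; the property names latency and generalAvailability are taken from common ODCS examples and the element path is illustrative:

```yaml
slaProperties:
  - property: latency              # property from the periodic table
    value: 4
    unit: d                        # days, per the ISO unit shorthand
    element: transactions.txn_ref_dt
    driver: regulatory             # importance of the SLA
  - property: generalAvailability
    value: '2022-05-12T09:30:10-08:00'
```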
The name of the key. Names should be in camel case, the same as if they were permanent properties in the contract.
Stable technical identifier for references. Must be unique within its containing array. Cannot contain special characters ('-', '_' allowed).
Description of the custom property.
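A custom property is a simple key/value pair; the key and value below are illustrative:

```yaml
customProperties:
  - property: refRulesetName     # key in camel case
    value: gcsc.ruleset.name
```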
Base definition for relationships between properties, typically for foreign key constraints.
The type of relationship. Defaults to 'foreignKey'.
Source property or properties.
Target property or properties to reference.
A list of key/value pairs for custom properties.
Relationship definition at schema level, requiring both 'from' and 'to' fields with matching types.
Relationship definition at property level, where 'from' is implicitly the current property.
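A schema-level relationship using the fields above might be sketched as follows (table and column names are illustrative). At property level, the same shape applies but 'from' is omitted, since it is implicitly the current property:

```yaml
relationships:
  - from: transactions.account_id   # source property (shorthand name notation)
    to: accounts.id                 # target property to reference
    type: foreignKey                # default type
```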
The URL of the API.
Amazon Athena automatically stores query results and metadata information for each query that runs in a query result location that you can specify in Amazon S3.
Identify the schema in the data source in which your tables exist.
Identify the name of the Data Source, also referred to as a Catalog.
The region your AWS account uses.
Fully qualified path to Azure Blob Storage or Azure Data Lake Storage (ADLS), supports globs.
File format.
Only for format = json. How multiple JSON documents are delimited within one file.
The GCP project name.
The GCP dataset name.
The host of the ClickHouse server.
The port to the ClickHouse server.
The name of the database.
The name of the Hive or Unity catalog.
The schema name in the catalog.
The Databricks host.
The host of the Denodo server.
The port of the Denodo server.
The name of the database.
The host of the Dremio server.
The port of the Dremio server.
The name of the schema.
Path to the DuckDB database file.
The name of the schema.
The AWS Glue account.
The AWS Glue database name.
The AWS S3 path. Must be in the form of a URL.
The format of the files.
The host of the Google Cloud SQL server.
The port of the Google Cloud SQL server.
The name of the database.
The name of the schema.
The host of the IBM DB2 server.
The port of the IBM DB2 server.
The name of the database.
The name of the schema.
The host to the Hive server.
The name of the Hive database.
The port to the Hive server. Defaults to 10000.
The host to the Impala server.
The name of the Impala database.
The port to the Impala server. Defaults to 21050.
The host to the Informix server.
The name of the database.
The port to the Informix server. Defaults to 9088.
Hostname or IP address of the Zen server.
Database name to connect to on the Zen server.
Zen server SQL connections port. Defaults to 1583.
Account used by the server.
Name of the catalog.
Name of the database.
Name of the dataset.
Delimiter.
Server endpoint.
File format.
Host name or IP address.
A URL to a location.
Relative or absolute path to the data file(s).
Port to the server. No default value is assumed for custom servers.
Project name.
Cloud region.
Region name.
Name of the schema.
Name of the service.
Staging directory.
Name of the cluster or warehouse.
Name of the data stream.
Kafka Server
The bootstrap server of the Kafka cluster.
The format of the messages.
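For a streaming source, a Kafka server entry might be sketched as follows; the field names host and format are assumed from the descriptions above, and the broker address is illustrative:

```yaml
servers:
  - server: payments-stream
    type: kafka
    host: broker1.example.com:9092   # bootstrap server of the cluster
    format: avro                     # format of the messages
```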
Kinesis Data Streams Server
AWS region.
The format of the record.
The relative or absolute path to the data file(s).
The format of the file(s).
The host of the MySQL server.
The port of the MySQL server.
The name of the database.
The host to the Oracle server.
The port to the Oracle server.
The name of the service.
The host to the Postgres server.
The port to the Postgres server.
The name of the database.
The name of the schema in the database.
The host to the Presto server.
The name of the catalog.
The name of the schema.
The GCP project name.
The name of the database.
The name of the schema.
An optional string describing the server.
AWS region of Redshift server.
The account used by the server.
S3 URL, starting with s3://
The server endpoint for S3-compatible servers.
File format.
Only for format = json. How multiple JSON documents are delimited within one file.
SFTP URL, starting with sftp://
File format.
Only for format = json. How multiple JSON documents are delimited within one file.
The Snowflake account used by the server.
The name of the database.
The name of the schema.
The host to the Snowflake server.
The port to the Snowflake server.
The name of the cluster of resources that is a Snowflake virtual warehouse.
The host to the database server.
The name of the database.
The name of the schema in the database.
The port to the database server.
The host of the Synapse server.
The port of the Synapse server.
The name of the database.
The Trino host URL.
The Trino port.
The name of the catalog.
The name of the schema in the database.
The host of the Vertica server.
The port of the Vertica server.
The name of the database.
The name of the schema.