Type object
File match datacontract.yaml datacontract.yml *-datacontract.yaml *-datacontract.yml *.datacontract.yaml *.datacontract.yml datacontract-*.yaml datacontract-*.yml **/datacontract/*.yml **/datacontract/*.yaml **/datacontracts/*.yml **/datacontracts/*.yaml
Schema URL https://catalog.lintel.tools/schemas/schemastore/data-contract-specification/latest.json
Source https://raw.githubusercontent.com/datacontract/datacontract-specification/main/datacontract.schema.json

Validate with Lintel

npx @lintel/lintel check

Properties

dataContractSpecification string required

Specifies the Data Contract Specification being used.

Values: "1.2.1" "1.2.0" "1.1.0" "0.9.3" "0.9.2" "0.9.1" "0.9.0"
id string required

Specifies the identifier of the data contract.

info object required

Metadata and life cycle information about the data contract.

6 nested properties
title string required

The title of the data contract.

version string required

The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version).

status string

The status of the data contract. Can be proposed, in development, active, deprecated, or retired.

Examples: "proposed", "in development", "active", "deprecated", "retired"
description string

A description of the data contract.

owner string

The owner or team responsible for managing the data contract and providing the data.

contact object

Contact information for the data contract.

3 nested properties
name string

The identifying name of the contact person/organization.

url string

The URL pointing to the contact information. This MUST be in the form of a URL.

format=uri
email string

The email address of the contact person/organization. This MUST be in the form of an email address.

format=email
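A minimal sketch of how the required top-level fields and the info object described above might look in a datacontract.yaml file. Only the keys come from this reference; the id, title, owner, and contact values are placeholders invented for illustration.

```yaml
# Hypothetical values throughout; only the keys are taken from the schema.
dataContractSpecification: "1.2.1"   # quoted so YAML keeps it a string
id: orders-latest                    # placeholder identifier
info:
  title: Orders Latest               # required
  version: "1.0.0"                   # required; version of this contract document
  status: active
  description: Example contract for an orders data set.
  owner: Checkout Team
  contact:
    name: Jane Doe
    url: "https://example.com/contact"   # must be a URL
    email: jane.doe@example.com          # must be an email address
```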
servers Record<string, object>

Information about the servers.

terms object

The terms and conditions of the data contract.

5 nested properties
usage string

The usage describes the way the data is expected to be used. Can contain business and technical information.

limitations string

The limitations describe restrictions on how the data can be used. These can be technical restrictions or restrictions on what the data may be used for.

policies object[]

A list of policies, licenses, and standards that apply to this data contract and that the data consumer must acknowledge.

billing string

The billing describes the pricing model for using the data, such as whether it is free, has a monthly fee, or is metered pay-per-use.

noticePeriod string

The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months.
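As a rough illustration, the terms object could be filled in as follows. All values are invented examples; policies is left out because the shape of its items is not described in this section.

```yaml
# Hypothetical example values; only the keys are taken from the schema.
terms:
  usage: Data can be used for reports, analytics, and machine learning use cases.
  limitations: Not suitable for real-time use cases.
  billing: 5000 USD per month
  noticePeriod: P3M   # ISO 8601 period meaning three months
```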

models Record<string, object>

Specifies the logical data model. Use the model's name (e.g., the table name) as the key.

definitions Record<string, object>

Clear and concise explanations of the syntax, semantics, and classification of business objects in a given domain.

servicelevels object

Specifies the service level agreements for the provided data, including availability, data retention policies, latency requirements, data freshness, update frequency, support availability, and backup policies.

7 nested properties
availability object

Availability refers to the promise or guarantee by the service provider about the uptime of the system that provides the data.

2 nested properties
description string

An optional string describing the availability service level.

percentage string

An optional string describing the guaranteed uptime in percent (e.g., 99.9%)

pattern=^\d+(\.\d+)?%$
retention object

Retention covers how long data will be available.

4 nested properties
description string

An optional string describing the retention service level.

period string

An optional period of time for which data is available. Supported formats: simple duration (e.g., 1 year, 30d) and ISO 8601 duration (e.g., P1Y).

unlimited boolean

An optional indicator that data is kept forever.

timestampField string

An optional reference to the field that contains the timestamp that the period refers to.

latency object

Latency refers to the maximum amount of time data takes to get from the source to its destination.

4 nested properties
description string

An optional string describing the latency service level.

threshold string

An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).

sourceTimestampField string

An optional reference to the field that contains the timestamp when the data was provided at the source.

processedTimestampField string

An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract.

freshness object

The maximum age of the youngest row in a table.

3 nested properties
description string

An optional string describing the freshness service level.

threshold string

An optional maximum age of the youngest entry. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).

timestampField string

An optional reference to the field that contains the timestamp that the threshold refers to.

frequency object

Frequency describes how often data is updated.

4 nested properties
description string

An optional string describing the frequency service level.

type string

The method of data processing.

Values: "batch" "micro-batching" "streaming" "manual"
interval string

Optional. Only for batch: How often the pipeline is triggered, e.g., daily.

cron string

Optional. Only for batch: A cron expression for when the pipeline is triggered, e.g., 0 0 * * *.

support object

Support describes the times when support will be available for contact.

3 nested properties
description string

An optional string describing the support service level.

time string

An optional string describing the times when support will be available for contact such as 24/7 or business hours only.

responseTime string

An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with.

backup object

Backup specifies details about data backup procedures.

5 nested properties
description string

An optional string describing the backup service level.

interval string

An optional interval that defines how often data will be backed up, e.g., daily.

cron string

An optional cron expression when data will be backed up, e.g., 0 0 * * *.

recoveryTime string

An optional Recovery Time Objective (RTO), which specifies the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., 4 hours, 24 hours).

recoveryPoint string

An optional Recovery Point Objective (RPO) defines the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., 4 hours, 24 hours).
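The seven service levels above could be combined as in the following sketch. The field references (orders.order_timestamp and orders.processed_timestamp) and all descriptions are hypothetical; the keys, the frequency type value, and the percentage pattern are taken from this reference.

```yaml
# Hypothetical example values; field references assume an "orders" model.
servicelevels:
  availability:
    description: The server is available during support hours.
    percentage: "99.9%"            # must match ^\d+(\.\d+)?%$
  retention:
    description: Data is retained for one year.
    period: P1Y                    # ISO 8601 duration
    unlimited: false
    timestampField: orders.order_timestamp
  latency:
    description: Data is available within 25 hours after the order was placed.
    threshold: PT25H
    sourceTimestampField: orders.order_timestamp
    processedTimestampField: orders.processed_timestamp
  freshness:
    description: The youngest row is at most 25 hours old.
    threshold: PT25H
    timestampField: orders.order_timestamp
  frequency:
    description: Data is delivered once a day.
    type: batch                    # one of batch, micro-batching, streaming, manual
    interval: daily
    cron: "0 0 * * *"              # quoted so the asterisks stay plain text
  support:
    description: Support is reachable during business hours.
    time: business hours only
    responseTime: 1h
  backup:
    description: Data is backed up once a week.
    interval: weekly
    cron: "0 0 * * 0"
    recoveryTime: 24 hours
    recoveryPoint: 24 hours
```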

tags string[]

Tags to facilitate searching and filtering.

Definitions

FieldType string

The logical data type of the field.

BaseServer object
type string required

The type of the data product technology that implements the data contract.

Examples: "azure", "bigquery", "BigQuery", "clickhouse", "databricks", "dataframe", "glue", "kafka", "kinesis", "local", "oracle", "postgres", "pubsub", "redshift", "sftp", "sqlserver", "snowflake", "s3", "trino"
description string

An optional string describing the servers.

environment string

The environment in which the servers are running. Examples: prod, sit, stg.

roles object[]

An optional array of roles that are available and can be requested to access the server for role-based access control. E.g. separate roles for different regions or sensitive data.

BigQueryServer object
project string required

The GCP project name.

dataset string required

The GCP dataset name.

S3Server object
location string required

S3 URL, starting with s3://

Examples: "s3://datacontract-example-orders-latest/data/{model}/*.json"
format=uri
endpointUrl string

The server endpoint for S3-compatible servers.

Examples: "https://minio.example.com"
format=uri
format string

File format.

Values: "parquet" "delta" "json" "csv"
delimiter string

Only for format = json. How multiple JSON documents are delimited within one file.

Values: "new_line" "array"
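An S3 server entry might look like the sketch below. The location reuses the example URL given above; the server name and the assumption that type "s3" selects this definition are illustrative.

```yaml
# Hypothetical server entry built from the example values above.
servers:
  production:
    type: s3
    environment: prod
    location: "s3://datacontract-example-orders-latest/data/{model}/*.json"
    format: json
    delimiter: new_line   # only meaningful when format is json
```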
SftpServer object
location string required

SFTP URL, starting with sftp://

Examples: "sftp://123.123.12.123/{model}/*.json"
format=uri
pattern=^sftp://.*
format string

File format.

Values: "parquet" "delta" "json" "csv"
delimiter string

Only for format = json. How multiple JSON documents are delimited within one file.

Values: "new_line" "array"
RedshiftServer object
account string required

The account name.

database string required

The database name.

schema string required

The schema name.

host string

An optional string describing the host name.

clusterIdentifier string

An optional string describing the cluster's identifier.

Examples: "redshift-prod-eu", "analytics-cluster"
port integer

An optional integer specifying the cluster's port.

Examples: 5439
endpoint string

An optional string describing the cluster's endpoint.

Examples: "analytics-cluster.example.eu-west-1.redshift.amazonaws.com:5439/analytics"
AzureServer object
location string required

Path to Azure Blob Storage or Azure Data Lake Storage (ADLS), supports globs. Recommended pattern is 'abfss://<container_name>/'

Examples: "abfss://my_container_name/path", "abfss://my_container_name/path/*.json", "az://my_storage_account_name.blob.core.windows.net/my_container/path/*.parquet", "abfss://my_storage_account_name.dfs.core.windows.net/my_container_name/path/*.parquet"
format=uri
format string required

File format.

Values: "parquet" "delta" "json" "csv"
delimiter string

Only for format = json. How multiple JSON documents are delimited within one file.

Values: "new_line" "array"
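For Azure storage, a server entry could be sketched as follows, using one of the example locations above. The server name is a placeholder, and pairing type "azure" with this definition is an assumption based on the BaseServer examples.

```yaml
# Hypothetical server entry; location taken from the examples above.
servers:
  production:
    type: azure
    location: "abfss://my_container_name/path/*.json"   # required, globs supported
    format: json                                        # required
    delimiter: new_line                                 # only for format json
```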
SqlserverServer object
host string required

The host of the database server.

Examples: "localhost"
database string required

The name of the database.

Examples: "database"
schema string required

The name of the schema in the database.

Examples: "dbo"
port integer

The port of the database server.

Default: 1433
Examples: 1433
SnowflakeServer object
account string required

The account name.

database string required

The database name.

schema string required

The schema name.

DatabricksServer object
catalog string required

The name of the Hive or Unity catalog

schema string required

The schema name in the catalog

host string

The Databricks host

Examples: "dbc-abcdefgh-1234.cloud.databricks.com"
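A Databricks server entry might be written as in this sketch. The catalog and schema names are placeholders, and the host reuses the example above.

```yaml
# Hypothetical server entry; catalog and schema names are placeholders.
servers:
  production:
    type: databricks
    host: dbc-abcdefgh-1234.cloud.databricks.com
    catalog: my_catalog   # required; Hive or Unity catalog
    schema: my_schema     # required
```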
DataframeServer object
GlueServer object
account string required

The AWS Glue account

Examples: "1234-5678-9012"
database string required

The AWS Glue database name

Examples: "my_database"
location string

The AWS S3 path. Must be in the form of a URL.

Examples: "s3://datacontract-example-orders-latest/data/{model}"
format=uri
format string

The format of the files

Examples: "parquet", "csv", "json", "delta"
PostgresServer object
host string required

The host of the database server.

Examples: "localhost"
port integer required

The port of the database server.

database string required

The name of the database.

Examples: "postgres"
schema string required

The name of the schema in the database.

Examples: "public"
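A PostgreSQL server entry, sketched with the example values above. The port 5432 is PostgreSQL's conventional default and is used here only as an illustration; the schema itself lists no default.

```yaml
# Hypothetical server entry; all four connection fields are required.
servers:
  production:
    type: postgres
    host: localhost
    port: 5432          # integer, not a string
    database: postgres
    schema: public
```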
OracleServer object
host string required

The host of the Oracle server.

Examples: "localhost"
port integer required

The port of the Oracle server.

Examples: 1523
serviceName string required

The name of the service.

Examples: "service"
KafkaServer object

Kafka Server

host string required

The bootstrap server of the Kafka cluster.

topic string required

The topic name.

format string

The format of the message. Examples: json, avro, protobuf.

Default: "json"
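A Kafka server entry could look like the following sketch. The bootstrap server address and topic name are invented placeholders.

```yaml
# Hypothetical server entry; host and topic are placeholders.
servers:
  production:
    type: kafka
    host: "kafka.example.com:9092"   # bootstrap server
    topic: orders
    format: json                     # default per the schema; avro and protobuf are other examples
```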
PubSubServer object
project string required

The GCP project name.

topic string required

The topic name.

KinesisDataStreamsServer object

Kinesis Data Streams Server

stream string required

The name of the Kinesis data stream.

region string

AWS region.

Examples: "eu-west-1"
format string

The format of the record

Examples: "json", "avro", "protobuf"
TrinoServer object
host string required

The Trino host URL.

Examples: "localhost"
port integer required

The Trino port.

catalog string required

The name of the catalog.

Examples: "hive"
schema string required

The name of the schema in the database.

Examples: "my_schema"
ClickhouseServer object
host string required

The host of the database server.

Examples: "localhost"
port integer required

The port of the database server.

database string required

The name of the database.

Examples: "postgres"
LocalServer object
path string required

The relative or absolute path to the data file(s).

Examples: "./folder/data.parquet", "./folder/*.parquet"
format string required

The format of the file(s)

Examples: "json", "parquet", "delta", "csv"
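For local files, a server entry might be sketched like this, using one of the example paths above. The server name "local" is an arbitrary key.

```yaml
# Hypothetical server entry for files on the local file system.
servers:
  local:
    type: local
    path: "./folder/*.parquet"   # relative or absolute; quoted because of the glob
    format: parquet
```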
Quality
Lineage object
inputFields object[] required
transformationDescription string

A string representation of the transformation applied.

transformationType string

Either IDENTITY or MASKED, each with clearly defined behavior. IDENTITY: the output is exactly the same as the input. MASKED: no original data is available (for example, a hash of PII).