Data Contract Specification
Data Contract Specification file
| Type | object |
|---|---|
| File match | `datacontract.yaml`, `datacontract.yml`, `*-datacontract.yaml`, `*-datacontract.yml`, `*.datacontract.yaml`, `*.datacontract.yml`, `datacontract-*.yaml`, `datacontract-*.yml`, `**/datacontract/*.yml`, `**/datacontract/*.yaml`, `**/datacontracts/*.yml`, `**/datacontracts/*.yaml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/data-contract-specification/latest.json |
| Source | https://raw.githubusercontent.com/datacontract/datacontract-specification/main/datacontract.schema.json |
Validate with Lintel

```sh
npx @lintel/lintel check
```
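For orientation, a minimal data contract file might look like the sketch below; the id, version, and owner values are illustrative, and the top-level keys follow the Data Contract Specification:

```yaml
dataContractSpecification: 1.1.0
id: urn:datacontract:checkout:orders-latest
info:
  title: Orders Latest
  version: 2.0.0
  status: active
  owner: checkout
  description: Successful customer orders in the webshop.
```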
Properties
Specifies the Data Contract Specification being used.
Specifies the identifier of the data contract.
Metadata and life cycle information about the data contract.
6 nested properties
The title of the data contract.
The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version).
The status of the data contract. Can be proposed, in development, active, retired.
A description of the data contract.
The owner or team responsible for managing the data contract and providing the data.
Contact information for the data contract.
3 nested properties
The identifying name of the contact person/organization.
The URL pointing to the contact information. This MUST be in the form of a URL.
The email address of the contact person/organization. This MUST be in the form of an email address.
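The contact fields described above might be filled in like this (a sketch with illustrative values):

```yaml
info:
  contact:
    name: John Doe (Data Product Owner)
    url: https://example.com/teams/checkout
    email: checkout@example.com
```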
Information about the servers.
The terms and conditions of the data contract.
5 nested properties
The usage describes the way the data is expected to be used. Can contain business and technical information.
The limitations describe restrictions on how the data can be used; these can be technical limitations or restrictions on what the data may not be used for.
The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use.
The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months.
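As a sketch, a terms block assuming the key names documented in the Data Contract Specification (usage, limitations, billing, noticePeriod); the values are illustrative:

```yaml
terms:
  usage: Data can be used for reports, analytics, and machine learning use cases.
  limitations: Not suitable for real-time use cases.
  billing: 5000 USD per month
  noticePeriod: P3M
```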
Specifies the logical data model. Use the model's name (e.g., the table name) as the key.
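For illustration, a minimal logical model keyed by its table name; the field attributes (type, required, unique) follow the Data Contract Specification, and the model and field names are illustrative:

```yaml
models:
  orders:
    description: One record per order.
    type: table
    fields:
      order_id:
        type: text
        required: true
        unique: true
      order_timestamp:
        type: timestamp
        required: true
```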
Clear and concise explanations of syntax, semantic, and classification of business objects in a given domain.
Specifies the service level agreements for the provided data, including availability, data retention policies, latency requirements, data freshness, update frequency, support availability, and backup policies.
7 nested properties
Availability refers to the promise or guarantee by the service provider about the uptime of the system that provides the data.
2 nested properties
An optional string describing the availability service level.
An optional string describing the guaranteed uptime in percent (e.g., 99.9%).
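A sketch of an availability service level, assuming the description and percentage keys from the Data Contract Specification:

```yaml
servicelevels:
  availability:
    description: The server is available during support hours.
    percentage: 99.9%
```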
Retention covers how long data will be available.
4 nested properties
An optional string describing the retention service level.
An optional period of time that data is available. Supported formats: Simple duration (e.g., 1 year, 30d) and ISO 8601 duration (e.g., P1Y).
An optional indicator that data is kept forever.
An optional reference to the field that contains the timestamp that the period refers to.
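A sketch of a retention service level, assuming the period, unlimited, and timestampField keys from the Data Contract Specification; the model and field names are illustrative:

```yaml
servicelevels:
  retention:
    description: Data is retained for one year.
    period: P1Y
    unlimited: false
    timestampField: orders.order_timestamp
```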
Latency refers to the maximum amount of time for data to travel from its source to its destination.
4 nested properties
An optional string describing the latency service level.
An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
An optional reference to the field that contains the timestamp when the data was provided at the source.
An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract.
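A sketch of a latency service level, assuming the threshold, sourceTimestampField, and processedTimestampField keys from the Data Contract Specification; the field references are illustrative:

```yaml
servicelevels:
  latency:
    description: Data is available within 25 hours after the order was placed.
    threshold: 25h
    sourceTimestampField: orders.order_timestamp
    processedTimestampField: orders.processed_timestamp
```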
The maximum age of the youngest row in a table.
3 nested properties
An optional string describing the freshness service level.
An optional maximum age of the youngest entry. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
An optional reference to the field that contains the timestamp that the threshold refers to.
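A sketch of a freshness service level, assuming the threshold and timestampField keys from the Data Contract Specification:

```yaml
servicelevels:
  freshness:
    description: The age of the youngest row in the orders table.
    threshold: 25h
    timestampField: orders.order_timestamp
```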
Frequency describes how often data is updated.
4 nested properties
An optional string describing the frequency service level.
The method of data processing.
Optional. Only for batch: How often the pipeline is triggered, e.g., daily.
Optional. Only for batch: a cron expression for when the pipeline is triggered, e.g., 0 0 * * *.
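A sketch of a frequency service level, assuming the type, interval, and cron keys from the Data Contract Specification:

```yaml
servicelevels:
  frequency:
    description: Data is delivered once a day.
    type: batch
    interval: daily
    cron: 0 0 * * *
```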
Support describes the times when support will be available for contact.
3 nested properties
An optional string describing the support service level.
An optional string describing the times when support will be available for contact such as 24/7 or business hours only.
An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with.
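A sketch of a support service level, assuming the time and responseTime keys from the Data Contract Specification; the values are illustrative:

```yaml
servicelevels:
  support:
    description: The data is provided during business hours at headquarters.
    time: 9am to 5pm in EST on business days
    responseTime: 1h
```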
Backup specifies details about data backup procedures.
5 nested properties
An optional string describing the backup service level.
An optional interval that defines how often data will be backed up, e.g., daily.
An optional cron expression when data will be backed up, e.g., 0 0 * * *.
An optional Recovery Time Objective (RTO) specifies the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., 4 hours, 24 hours).
An optional Recovery Point Objective (RPO) defines the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., 4 hours, 24 hours).
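A sketch of a backup service level, assuming the interval, cron, recoveryTime, and recoveryPoint keys from the Data Contract Specification:

```yaml
servicelevels:
  backup:
    description: Data is backed up once a week, every Sunday at 0:00 UTC.
    interval: weekly
    cron: 0 0 * * 0
    recoveryTime: 24 hours
    recoveryPoint: 1 week
```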
Links to external resources.
Tags to facilitate searching and filtering.
Definitions
The logical data type of the field.
The type of the data product technology that implements the data contract.
An optional string describing the servers.
The environment in which the servers are running. Examples: prod, sit, stg.
An optional array of roles that are available and can be requested to access the server for role-based access control, e.g., separate roles for different regions or sensitive data.
The GCP project name.
The GCP dataset name.
S3 URL, starting with s3://
The server endpoint for S3-compatible servers.
File format.
Only for format = json: how multiple JSON documents are delimited within one file.
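Putting the S3 server fields together, a sketch; the bucket name and path are illustrative, and the key names follow the Data Contract Specification's s3 server type:

```yaml
servers:
  production:
    type: s3
    location: s3://acme-orders-prod/data/orders/*.json
    format: json
    delimiter: new_line
```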
SFTP URL, starting with sftp://
File format.
Only for format = json: how multiple JSON documents are delimited within one file.
An optional string describing the server.
An optional string describing the server.
An optional string describing the server.
An optional string describing the host name.
An optional string describing the cluster's identifier.
An optional string describing the cluster's port.
An optional string describing the cluster's endpoint.
Path to Azure Blob Storage or Azure Data Lake Storage (ADLS); supports globs. The recommended pattern begins with 'abfss://<container_name>/'.
File format.
Only for format = json: how multiple JSON documents are delimited within one file.
The host of the database server.
The name of the database.
The name of the schema in the database.
The port to the database server.
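The host/port/database/schema fields above can be sketched as a server entry; this assumes a postgres server type, and all values are illustrative:

```yaml
servers:
  production:
    type: postgres
    host: db.example.com
    port: 5432
    database: orders
    schema: public
```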
An optional string describing the server.
An optional string describing the server.
An optional string describing the server.
The name of the Hive or Unity catalog.
The schema name in the catalog.
The Databricks host.
The AWS Glue account.
The AWS Glue database name.
The AWS S3 path. Must be in the form of a URL.
The format of the files.
The host of the database server.
The port to the database server.
The name of the database.
The name of the schema in the database.
The host of the Oracle server.
The port of the Oracle server.
The name of the service.
Kafka Server
The bootstrap server of the Kafka cluster.
The topic name.
The format of the message. Examples: json, avro, protobuf.
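A sketch of a Kafka server entry, assuming the host, topic, and format keys from the Data Contract Specification; the host and topic are illustrative:

```yaml
servers:
  production:
    type: kafka
    host: kafka.example.com:9092
    topic: orders
    format: avro
```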
The GCP project name.
The topic name.
Kinesis Data Streams Server
The name of the Kinesis data stream.
AWS region.
The format of the record.
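A sketch of a Kinesis Data Streams server entry, assuming the stream, region, and format keys from the Data Contract Specification; the stream name and region are illustrative:

```yaml
servers:
  production:
    type: kinesis
    stream: orders
    region: eu-west-1
    format: json
```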
The Trino host URL.
The Trino port.
The name of the catalog.
The name of the schema in the database.
The host of the database server.
The port to the database server.
The name of the database.
The relative or absolute path to the data file(s).
The format of the file(s).
A string representation of the transformation applied.
IDENTITY|MASKED reflects a clearly defined behavior. IDENTITY: exactly the same as the input; MASKED: no original data is available (for example, a hash of PII).