Data Contract Specification
Data Contract Specification file
| Type | object |
|---|---|
| File match | `datacontract.yaml`, `datacontract.yml`, `*-datacontract.yaml`, `*-datacontract.yml`, `*.datacontract.yaml`, `*.datacontract.yml`, `datacontract-*.yaml`, `datacontract-*.yml`, `**/datacontract/*.yml`, `**/datacontract/*.yaml`, `**/datacontracts/*.yml`, `**/datacontracts/*.yaml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/data-contract-specification/latest.json |
| Source | https://raw.githubusercontent.com/datacontract/datacontract-specification/main/datacontract.schema.json |
Validate with Lintel

```sh
npx @lintel/lintel check
```
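For orientation, a minimal data contract file might look like the sketch below; the id, version, and owner values are illustrative, and the top-level keys follow the Data Contract Specification:

```yaml
dataContractSpecification: 1.1.0
id: urn:datacontract:checkout:orders-latest
info:
  title: Orders Latest
  version: 2.0.0
  status: active
  owner: checkout
  description: Successful customer orders in the webshop.
```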
Properties
Specifies the Data Contract Specification being used.
Specifies the identifier of the data contract.
Metadata and life cycle information about the data contract.
6 nested properties
The title of the data contract.
The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version).
The status of the data contract. Can be proposed, in development, active, retired.
A description of the data contract.
The owner or team responsible for managing the data contract and providing the data.
Contact information for the data contract.
3 nested properties
The identifying name of the contact person/organization.
The URL pointing to the contact information. This MUST be in the form of a URL.
The email address of the contact person/organization. This MUST be in the form of an email address.
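The contact fields described above might be filled in like this (a sketch with illustrative values):

```yaml
info:
  contact:
    name: John Doe (Data Product Owner)
    url: https://example.com/teams/checkout
    email: checkout@example.com
```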
Information about the servers.
The terms and conditions of the data contract.
5 nested properties
The usage describes the way the data is expected to be used. Can contain business and technical information.
The limitations describe restrictions on how the data can be used; these can be technical limitations or restrictions on what the data may not be used for.
The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use.
The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months.
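As a sketch, a terms block assuming the key names documented in the Data Contract Specification (usage, limitations, billing, noticePeriod); the values are illustrative:

```yaml
terms:
  usage: Data can be used for reports, analytics, and machine learning use cases.
  limitations: Not suitable for real-time use cases.
  billing: 5000 USD per month
  noticePeriod: P3M
```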
Specifies the logical data model. Use the model's name (e.g., the table name) as the key.
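For illustration, a minimal logical model keyed by its table name; the field attributes (type, required, unique) follow the Data Contract Specification, and the model and field names are illustrative:

```yaml
models:
  orders:
    description: One record per order.
    type: table
    fields:
      order_id:
        type: text
        required: true
        unique: true
      order_timestamp:
        type: timestamp
        required: true
```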
Clear and concise explanations of syntax, semantic, and classification of business objects in a given domain.
Specifies the service level agreements for the provided data, including availability, data retention policies, latency requirements, data freshness, update frequency, support availability, and backup policies.
7 nested properties
Availability refers to the promise or guarantee by the service provider about the uptime of the system that provides the data.
2 nested properties
An optional string describing the availability service level.
An optional string describing the guaranteed uptime in percent (e.g., 99.9%).
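A sketch of an availability service level, assuming the description and percentage keys from the Data Contract Specification:

```yaml
servicelevels:
  availability:
    description: The server is available during support hours.
    percentage: 99.9%
```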
Retention covers how long data will be available.
4 nested properties
An optional string describing the retention service level.
An optional period of time that data is available. Supported formats: Simple duration (e.g., 1 year, 30d) and ISO 8601 duration (e.g., P1Y).
An optional indicator that data is kept forever.
An optional reference to the field that contains the timestamp that the period refers to.
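A sketch of a retention service level, assuming the period, unlimited, and timestampField keys from the Data Contract Specification; the model and field names are illustrative:

```yaml
servicelevels:
  retention:
    description: Data is retained for one year.
    period: P1Y
    unlimited: false
    timestampField: orders.order_timestamp
```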
Latency refers to the maximum amount of time for data to travel from its source to its destination.
4 nested properties
An optional string describing the latency service level.
An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
An optional reference to the field that contains the timestamp when the data was provided at the source.
An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract.
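A sketch of a latency service level, assuming the threshold, sourceTimestampField, and processedTimestampField keys from the Data Contract Specification; the field references are illustrative:

```yaml
servicelevels:
  latency:
    description: Data is available within 25 hours after the order was placed.
    threshold: 25h
    sourceTimestampField: orders.order_timestamp
    processedTimestampField: orders.processed_timestamp
```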
The maximum age of the youngest row in a table.
3 nested properties
An optional string describing the freshness service level.
An optional maximum age of the youngest entry. Supported formats: Simple duration (e.g., 24 hours, 5s) and ISO 8601 duration (e.g., PT24H).
An optional reference to the field that contains the timestamp that the threshold refers to.
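A sketch of a freshness service level, assuming the threshold and timestampField keys from the Data Contract Specification:

```yaml
servicelevels:
  freshness:
    description: The age of the youngest row in the orders table.
    threshold: 25h
    timestampField: orders.order_timestamp
```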
Frequency describes how often data is updated.
4 nested properties
An optional string describing the frequency service level.
The method of data processing.
Optional. Only for batch: How often the pipeline is triggered, e.g., daily.
Optional. Only for batch: a cron expression for when the pipeline is triggered, e.g., 0 0 * * *.
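A sketch of a frequency service level, assuming the type, interval, and cron keys from the Data Contract Specification:

```yaml
servicelevels:
  frequency:
    description: Data is delivered once a day.
    type: batch
    interval: daily
    cron: 0 0 * * *
```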
Support describes the times when support will be available for contact.
3 nested properties
An optional string describing the support service level.
An optional string describing the times when support will be available for contact such as 24/7 or business hours only.
An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with.
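A sketch of a support service level, assuming the time and responseTime keys from the Data Contract Specification; the values are illustrative:

```yaml
servicelevels:
  support:
    description: The data is provided during business hours at headquarters.
    time: 9am to 5pm in EST on business days
    responseTime: 1h
```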
Backup specifies details about data backup procedures.
5 nested properties
An optional string describing the backup service level.
An optional interval that defines how often data will be backed up, e.g., daily.
An optional cron expression when data will be backed up, e.g., 0 0 * * *.
An optional Recovery Time Objective (RTO) specifies the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., 4 hours, 24 hours).
An optional Recovery Point Objective (RPO) defines the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., 4 hours, 24 hours).
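A sketch of a backup service level, assuming the interval, cron, recoveryTime, and recoveryPoint keys from the Data Contract Specification:

```yaml
servicelevels:
  backup:
    description: Data is backed up once a week, every Sunday at 0:00 UTC.
    interval: weekly
    cron: 0 0 * * 0
    recoveryTime: 24 hours
    recoveryPoint: 1 week
```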
Links to external resources.
Tags to facilitate searching and filtering.
Definitions
The logical data type of the field.
The type of the data product technology that implements the data contract.
An optional string describing the servers.
The environment in which the servers are running. Examples: prod, sit, stg.
An optional array of roles that are available and can be requested to access the server for role-based access control, e.g., separate roles for different regions or sensitive data.
The GCP project name.
The GCP dataset name.
S3 URL, starting with s3://
The server endpoint for S3-compatible servers.
File format.
Only for format = json: how multiple JSON documents are delimited within one file.
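Putting the S3 server fields together, a sketch; the bucket name and path are illustrative, and the key names follow the Data Contract Specification's s3 server type:

```yaml
servers:
  production:
    type: s3
    location: s3://acme-orders-prod/data/orders/*.json
    format: json
    delimiter: new_line
```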
SFTP URL, starting with sftp://
File format.
Only for format = json: how multiple JSON documents are delimited within one file.
An optional string describing the server.
An optional string describing the server.
An optional string describing the server.
An optional string describing the host name.
An optional string describing the cluster's identifier.
An optional string describing the cluster's port.
An optional string describing the cluster's endpoint.
Path to Azure Blob Storage or Azure Data Lake Storage (ADLS); supports globs. The recommended pattern begins with 'abfss://<container_name>/'.
File format.
Only for format = json: how multiple JSON documents are delimited within one file.
The host of the database server.
The name of the database.
The name of the schema in the database.
The port to the database server.
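The host/port/database/schema fields above can be sketched as a server entry; this assumes a postgres server type, and all values are illustrative:

```yaml
servers:
  production:
    type: postgres
    host: db.example.com
    port: 5432
    database: orders
    schema: public
```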
An optional string describing the server.
An optional string describing the server.
An optional string describing the server.
The name of the Hive or Unity catalog.
The schema name in the catalog.
The Databricks host.
The AWS Glue account.
The AWS Glue database name.
The AWS S3 path. Must be in the form of a URL.
The format of the files.
The host of the database server.
The port to the database server.
The name of the database.
The name of the schema in the database.
The host of the Oracle server.
The port of the Oracle server.
The name of the service.
Kafka Server
The bootstrap server of the Kafka cluster.
The topic name.
The format of the message. Examples: json, avro, protobuf.
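A sketch of a Kafka server entry, assuming the host, topic, and format keys from the Data Contract Specification; the host and topic are illustrative:

```yaml
servers:
  production:
    type: kafka
    host: kafka.example.com:9092
    topic: orders
    format: avro
```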
The GCP project name.
The topic name.
Kinesis Data Streams Server
The name of the Kinesis data stream.
AWS region.
The format of the record.
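A sketch of a Kinesis Data Streams server entry, assuming the stream, region, and format keys from the Data Contract Specification; the stream name and region are illustrative:

```yaml
servers:
  production:
    type: kinesis
    stream: orders
    region: eu-west-1
    format: json
```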
The Trino host URL.
The Trino port.
The name of the catalog.
The name of the schema in the database.
The host of the database server.
The port to the database server.
The name of the database.
The relative or absolute path to the data file(s).
The format of the file(s).
A string representation of the transformation applied.
IDENTITY|MASKED reflects a clearly defined behavior. IDENTITY: exactly the same as the input; MASKED: no original data is available (for example, a hash of PII).