KSY (SchemaStore) JSON Schema

Type	object
File match	`*.ksy`
Schema URL	https://catalog.lintel.tools/schemas/schemastore/ksy/latest.json
Source	https://raw.githubusercontent.com/kaitai-io/ksy_schema/master/ksy_schema.json

Validate with Lintel

npx @lintel/lintel check

Type: object

the schema for ksy files

Properties

meta object

14 nested properties

id string | boolean

title string

brief name of the format

Examples: "Windows PE executable"

application string | string[]

applications that use this format and are typically associated with it

file-extension string | string[]

file extensions typically used for this format, without the leading dot and in lowercase letters

should be sorted from most popular to least popular

Examples: "exe", ["jpg","jpeg"]

xref object

8 nested properties

forensicswiki MediaWikiPageName | MediaWikiPageName[]

article name at Forensics Wiki, which is a CC-BY-SA-licensed wiki with information on digital forensics, file formats and tools

full link name could be generated as <https://forensics.wiki/> + this value + /

iso IsoIdentifier | IsoIdentifier[]

ISO/IEC standard number, reference to a standard accepted and published by ISO (International Organization for Standardization).

ISO standards typically have clear designations like "ISO/IEC 15948:2004", so the value should cite everything except for "ISO/IEC", e.g. 15948:2004

justsolve MediaWikiPageName | MediaWikiPageName[]

article name at "Just Solve the File Format Problem" wiki, a wiki that collects information on many file formats

full link name could be generated as <http://fileformats.archiveteam.org/wiki/> + this value
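The link rules for forensicswiki and justsolve can be sketched as plain string concatenation. This is only an illustration: the helper names and the sample page names are hypothetical, and no URL escaping is attempted.

```python
# Base URLs taken from the descriptions above.
FORENSICSWIKI_BASE = "https://forensics.wiki/"
JUSTSOLVE_BASE = "http://fileformats.archiveteam.org/wiki/"


def as_list(value):
    """xref values may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def forensicswiki_urls(value):
    # base + value + trailing slash
    return [FORENSICSWIKI_BASE + name + "/" for name in as_list(value)]


def justsolve_urls(value):
    # base + value, no trailing slash
    return [JUSTSOLVE_BASE + name for name in as_list(value)]


# Hypothetical page names, for illustration only.
print(forensicswiki_urls("zip"))
print(justsolve_urls(["ZIP", "PNG"]))
```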

loc LocIdentifier | LocIdentifier[]

identifier in Digital Formats database of US Library of Congress

value typically looks like fddXXXXXX, where XXXXXX is a 6-digit identifier

mime MimeType | MimeType[]

MIME type (IANA media type), a string typically used in various Internet protocols to specify format of binary payload

there is a central registry of media types managed by IANA

value must specify full MIME type (both parts), e.g. image/png

pronom PronomIdentifier | PronomIdentifier[]

format identifier in PRONOM Technical Registry of UK National Archives, which is a massive file formats database that catalogues many file formats for digital preservation purposes

rfc RfcIdentifier | RfcIdentifier[]

reference to RFC, "Request for Comments" documents maintained by ISOC (Internet Society)

RFCs are typically treated as global, Internet-wide standards, and, for example, many networking / interoperability protocols are specified in RFCs

value should be just raw RFC number, without any prefixes, e.g. 1234

wikidata WikidataIdentifier | WikidataIdentifier[]

item identifier at Wikidata, a global knowledge base

value typically follows Qxxx pattern, where xxx is a number generated by Wikidata, e.g. Q535473

tags string[]

list of tags (categories/keywords) that can be assigned to the format

used in the format gallery at https://formats.kaitai.io/ to also display the format in categories other than the main one, which corresponds to the directory where the .ksy file is located

should match the directory names in https://github.com/kaitai-io/kaitai_struct_formats

should be written in lower_snake_case and sorted in alphabetical order
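A small checker for the two formatting rules above might look like the following sketch. The regular expression used for lower_snake_case is an assumption (the schema text does not define one), and check_tags is a hypothetical helper name.

```python
import re

# Assumed shape of lower_snake_case: lowercase words or digits joined by single underscores.
LOWER_SNAKE_CASE = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*")


def check_tags(tags):
    """Return a list of human-readable problems; an empty list means the tags look fine."""
    problems = []
    for tag in tags:
        if LOWER_SNAKE_CASE.fullmatch(tag) is None:
            problems.append(f"{tag!r} is not lower_snake_case")
    if list(tags) != sorted(tags):
        problems.append("tags are not in alphabetical order")
    return problems


print(check_tags(["archive", "file_system"]))
print(check_tags(["windows", "Archive"]))
```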

license string

license under which the KSY file is released

required for all KSY specifications in the format gallery (otherwise optional, but highly recommended)

must be a valid SPDX expression (however, a single license identifier from the SPDX License List is usually enough)

to clarify, this is not a license of the original format description, but a license of the particular KSY implementation - if you're writing one, you can choose any open source license you want, regardless of what resources you use (as long as you only reproduce the idea and you don't copy long excerpts); we recommend CC0-1.0 or MIT

generated files from a KSY spec retain the same license as the original KSY

Examples: "CC0-1.0", "MIT"

ks-version string | number

minimum Kaitai Struct compiler (KSC) version required to compile this .ksy file (older versions will refuse to compile and inform the user that they need at least the specified version)

only versions 0.6 or higher are accepted (KSC 0.6 was the first to support ks-version, so there is no point in entering any lower version)

the value must sometimes be enclosed in quotes to ensure correct interpretation, for example ks-version: '0.10' (without the quotes, it is parsed as a float in YAML and interpreted as 0.1, which will be rejected)
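The quoting pitfall can be reproduced without a YAML parser, because it comes down to a float round-trip. The sketch below is an approximation of the version check, not the compiler's actual logic; parse_version and is_accepted are hypothetical helpers.

```python
from typing import Tuple

MIN_SUPPORTED = (0, 6)  # lowest ks-version the text above says is accepted


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_accepted(text: str) -> bool:
    return parse_version(text) >= MIN_SUPPORTED


quoted = "0.10"        # ks-version: '0.10' stays a string
unquoted = str(0.10)   # an unquoted 0.10 becomes a float and loses the trailing zero

print(unquoted)
print(is_accepted(quoted), is_accepted(unquoted))
```

Comparing tuples rather than floats is what keeps 0.10 ordered after 0.6 in this sketch.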

ks-debug boolean

advise the Kaitai Struct Compiler (KSC) to use debug mode

Default: false

ks-opaque-types boolean

advise the Kaitai Struct Compiler (KSC) to ignore missing types in the .ksy file, and assume that these types are already provided externally by the environment the classes are generated for

Default: false

imports string | string[]

list of relative or absolute paths to other .ksy files to import (without the .ksy extension)

the top-level type of the imported file will be accessible in the current spec under the name specified in the top-level /meta/id of the imported file

encoding enum

canonical names of character encodings supported by Kaitai Struct

in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them

Values: "ASCII" "UTF-8" "UTF-16BE" "UTF-16LE" "UTF-32BE" "UTF-32LE" "ISO-8859-1" "ISO-8859-2" "ISO-8859-3" "ISO-8859-4" "ISO-8859-5" "ISO-8859-6" "ISO-8859-7" "ISO-8859-8" "ISO-8859-9" "ISO-8859-10" "ISO-8859-11" "ISO-8859-13" "ISO-8859-14" "ISO-8859-15" "ISO-8859-16" "windows-1250" "windows-1251" "windows-1252" "windows-1253" "windows-1254" "windows-1255" "windows-1256" "windows-1257" "windows-1258" "IBM437" "IBM866" "Shift_JIS" "Big5" "EUC-KR"

endian enum | object

default endianness (byte order) of built-in multibyte numeric types, i.e. integers (sX and uX, where X is 2, 4 or 8) and floating-point numbers (fX, where X is 4 or 8)

applies to the current type and its subtypes

this key is required if you use any sX, uX or fX types (other than s1 and u1) without an explicit le or be suffix (as in u2be or f4le)
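To see why the byte order matters, the same two bytes can be decoded both ways with Python's struct module. This only illustrates the effect of le versus be on a 2-byte unsigned integer; it is not generated Kaitai Struct code.

```python
import struct

raw = bytes([0x01, 0x02])

# "<H" is a little-endian unsigned 16-bit integer, comparable to u2le.
u2le = struct.unpack("<H", raw)[0]
# ">H" is a big-endian unsigned 16-bit integer, comparable to u2be.
u2be = struct.unpack(">H", raw)[0]

print(hex(u2le), hex(u2be))
```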

bit-endian enum

default parsing direction (bit endianness) of bit-sized integers (built-in bX types)

big-endian (be) order is default, but it is recommended to specify it explicitly

can only have the literal value le or be (runtime switching as with the endian key is not supported)

for more information, see https://doc.kaitai.io/user_guide.html#_bit_sized_integers

Default: "be"

Values: "le" "be"
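The difference between the two bit orders can be sketched on a single byte. This reflects one reading of the linked User Guide section (be takes bits from the most significant end, le from the least significant end) and is limited to fields that fit in one byte; both function names are hypothetical.

```python
def read_bits_be(byte, widths):
    """Split one byte into fields, taking bits from the most significant end first."""
    values = []
    remaining = 8
    for width in widths:
        remaining -= width
        values.append((byte >> remaining) & ((1 << width) - 1))
    return values


def read_bits_le(byte, widths):
    """Split one byte into fields, taking bits from the least significant end first."""
    values = []
    shift = 0
    for width in widths:
        values.append((byte >> shift) & ((1 << width) - 1))
        shift += width
    return values


byte = 0b11010010
# A 3-bit field followed by a 5-bit field, comparable to b3 then b5.
print(read_bits_be(byte, [3, 5]))
print(read_bits_le(byte, [3, 5]))
```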

doc string

used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.

doc-ref string | string[]

used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).

Contains:

URL as text,
arbitrary string, or
URL as text + space + arbitrary string

to-string string

expression that provides a human-readable string representation of an object of this user-defined type for debugging purposes

it will be used to override the standard method for converting an object to a string called toString() (or similar) in most target languages, __str__() in Python and to_s in Ruby; in Rust, it is the Display trait

Examples: `f"{file_name} ({file_size} bytes)"`, `file_name + " (" + file_size.to_s + " bytes)"`

params ParamSpec[]

seq Attribute[]

instances object

types object

enums object

Definitions

Doc string

used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.

DocRef string | string[]

used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).

Contains:

URL as text,
arbitrary string, or
URL as text + space + arbitrary string

ToString string

expression that provides a human-readable string representation of an object of this user-defined type for debugging purposes

Examples:

`f"{file_name} ({file_size} bytes)"`
`file_name + " (" + file_size.to_s + " bytes)"`

MediaWikiPageName string

IsoIdentifier string

LocIdentifier string

MimeType string

PronomIdentifier string

RfcIdentifier integer | string

WikidataIdentifier string

MetaSpec object

id string | boolean

title string

brief name of the format

Examples: "Windows PE executable"

application string | string[]

applications that use this format and are typically associated with it

file-extension string | string[]

file extensions typically used for this format, without the leading dot and in lowercase letters

should be sorted from most popular to least popular

Examples: "exe", ["jpg","jpeg"]

xref object

8 nested properties

forensicswiki MediaWikiPageName | MediaWikiPageName[]

article name at Forensics Wiki, which is a CC-BY-SA-licensed wiki with information on digital forensics, file formats and tools

full link name could be generated as <https://forensics.wiki/> + this value + /

iso IsoIdentifier | IsoIdentifier[]

ISO/IEC standard number, reference to a standard accepted and published by ISO (International Organization for Standardization).

ISO standards typically have clear designations like "ISO/IEC 15948:2004", so the value should cite everything except for "ISO/IEC", e.g. 15948:2004

justsolve MediaWikiPageName | MediaWikiPageName[]

article name at "Just Solve the File Format Problem" wiki, a wiki that collects information on many file formats

full link name could be generated as <http://fileformats.archiveteam.org/wiki/> + this value

loc LocIdentifier | LocIdentifier[]

identifier in Digital Formats database of US Library of Congress

value typically looks like fddXXXXXX, where XXXXXX is a 6-digit identifier

mime MimeType | MimeType[]

MIME type (IANA media type), a string typically used in various Internet protocols to specify format of binary payload

there is a central registry of media types managed by IANA

value must specify full MIME type (both parts), e.g. image/png

pronom PronomIdentifier | PronomIdentifier[]

format identifier in PRONOM Technical Registry of UK National Archives, which is a massive file formats database that catalogues many file formats for digital preservation purposes

rfc RfcIdentifier | RfcIdentifier[]

reference to RFC, "Request for Comments" documents maintained by ISOC (Internet Society)

RFCs are typically treated as global, Internet-wide standards, and, for example, many networking / interoperability protocols are specified in RFCs

value should be just raw RFC number, without any prefixes, e.g. 1234

wikidata WikidataIdentifier | WikidataIdentifier[]

item identifier at Wikidata, a global knowledge base

value typically follows Qxxx pattern, where xxx is a number generated by Wikidata, e.g. Q535473

tags string[]

list of tags (categories/keywords) that can be assigned to the format

used in the format gallery at https://formats.kaitai.io/ to also display the format in categories other than the main one, which corresponds to the directory where the .ksy file is located

should match the directory names in https://github.com/kaitai-io/kaitai_struct_formats

should be written in lower_snake_case and sorted in alphabetical order

license string

license under which the KSY file is released

required for all KSY specifications in the format gallery (otherwise optional, but highly recommended)

must be a valid SPDX expression (however, a single license identifier from the SPDX License List is usually enough)

generated files from a KSY spec retain the same license as the original KSY

Examples: "CC0-1.0", "MIT"

ks-version string | number

minimum Kaitai Struct compiler (KSC) version required to compile this .ksy file (older versions will refuse to compile and inform the user that they need at least the specified version)

only versions 0.6 or higher are accepted (KSC 0.6 was the first to support ks-version, so there is no point in entering any lower version)

ks-debug boolean

advise the Kaitai Struct Compiler (KSC) to use debug mode

Default: false

ks-opaque-types boolean

advise the Kaitai Struct Compiler (KSC) to ignore missing types in the .ksy file, and assume that these types are already provided externally by the environment the classes are generated for

Default: false

imports string | string[]

list of relative or absolute paths to other .ksy files to import (without the .ksy extension)

the top-level type of the imported file will be accessible in the current spec under the name specified in the top-level /meta/id of the imported file

encoding enum

canonical names of character encodings supported by Kaitai Struct

in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them

endian enum | object

default endianness (byte order) of built-in multibyte numeric types, i.e. integers (sX and uX, where X is 2, 4 or 8) and floating-point numbers (fX, where X is 4 or 8)

applies to the current type and its subtypes

this key is required if you use any sX, uX or fX types (other than s1 and u1) without an explicit le or be suffix (as in u2be or f4le)

bit-endian enum

default parsing direction (bit endianness) of bit-sized integers (built-in bX types)

big-endian (be) order is default, but it is recommended to specify it explicitly

can only have the literal value le or be (runtime switching as with the endian key is not supported)

for more information, see https://doc.kaitai.io/user_guide.html#_bit_sized_integers

Default: "be"

Values: "le" "be"

TypeRef string | enum

Attribute object

id string

contains a string used to identify one attribute among others

pattern=^[a-z][a-z0-9_]*$
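The id pattern can be checked with Python's re module. In this sketch, fullmatch stands in for the ^ and $ anchors of the schema pattern; is_valid_id is a hypothetical helper name.

```python
import re

# Same character rules as the schema pattern ^[a-z][a-z0-9_]*$
ATTRIBUTE_ID = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_id(name: str) -> bool:
    return ATTRIBUTE_ID.fullmatch(name) is not None


for candidate in ["header_size", "HeaderSize", "2nd_field", "_private"]:
    print(candidate, is_valid_id(candidate))
```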

doc string

used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.

doc-ref string | string[]

used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).

Contains:

URL as text,
arbitrary string, or
URL as text + space + arbitrary string

contents string | StringOrInteger[]

specify fixed contents that the parser should encounter at this point. If the content of the stream doesn't match the given bytes, an error is thrown and it's meaningless to continue parsing
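The contents check amounts to reading as many bytes as the expected sequence has and comparing them. Below is a minimal sketch using the 8-byte PNG signature as sample data; ContentsMismatchError and expect_contents are hypothetical names, not part of any Kaitai Struct runtime.

```python
import io


class ContentsMismatchError(Exception):
    """Raised when the stream does not start with the expected bytes."""


def expect_contents(stream, expected: bytes) -> bytes:
    actual = stream.read(len(expected))
    if actual != expected:
        raise ContentsMismatchError(f"expected {expected!r}, got {actual!r}")
    return actual


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
stream = io.BytesIO(PNG_SIGNATURE + b"rest of the file")
expect_contents(stream, PNG_SIGNATURE)
print(stream.tell())  # position right after the matched bytes
```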

type TypeRef | object

defines data type for an attribute

the type can also be user-defined in the types key

one can reference a nested user-defined type by specifying a relative path to it from the current type, with a double colon as a path delimiter (e.g. foo::bar::my_type)

repeat enum

designates repeated attribute in a structure

attribute read as array/list/sequence

Values: "expr" "eos" "until"

repeat-expr string | integer

repeat-until string | boolean

specifies a condition to be checked after each parsed item, repeating while the expression is false

one can use _ in the expression, which is a special local variable that references the last read element
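The repeat-until behaviour can be sketched as a loop that checks the condition after each item, so the item that satisfies the condition is still part of the result. The helpers read_u1 and repeat_until are hypothetical; the lambda parameter named _ plays the role of the special variable described above.

```python
import io


def read_u1(stream) -> int:
    """Read one unsigned byte, failing loudly at end of stream."""
    data = stream.read(1)
    if len(data) != 1:
        raise EOFError("unexpected end of stream")
    return data[0]


def repeat_until(stream, read_item, condition):
    items = []
    while True:
        item = read_item(stream)
        items.append(item)
        # The condition is evaluated after the item has been parsed and stored.
        if condition(item):
            break
    return items


stream = io.BytesIO(b"\x03\x07\x00\x09")
# Comparable to repeat: until with repeat-until: _ == 0
values = repeat_until(stream, read_u1, lambda _: _ == 0)
print(values, stream.tell())
```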

if string | boolean

marks the attribute as optional (attribute is parsed only if the condition specified evaluates to true)

size string | integer

size-eos boolean

if true, reads all the bytes till the end of the stream

default is false

Default: false

process string

specifies an algorithm to be applied to the underlying byte buffer of the attribute before parsing

can be used only if the size is known (either size, size-eos: true or terminator is specified), see "Applying process without a size" in the User Guide

xor(key)

apply a bitwise XOR (written as ^ in most C-like languages) to every byte of the buffer using the provided key

key is required, and can be either

a single byte value, which will be XORed with every byte of the input stream
- make sure that the key is in range 0-255, otherwise you may get unexpected results
a byte array, where the first byte of the input will be XORed with the first byte of the key, the second byte of the input with the second byte of the key, etc.
- when the end of the key is reached, it starts again from the first byte

the output length remains the same as the input length
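The XOR behaviour described above, including the repeating key, can be sketched in a few lines. process_xor is a hypothetical helper written for illustration and is not the runtime's own function.

```python
from itertools import cycle


def process_xor(data: bytes, key) -> bytes:
    """XOR every byte of data with a single-byte key or a repeating byte-array key."""
    if isinstance(key, int):
        if not 0 <= key <= 255:
            raise ValueError("single-byte key must be in range 0-255")
        key = bytes([key])
    if len(key) == 0:
        raise ValueError("key must not be empty")
    # cycle() restarts the key from its first byte when its end is reached.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


print(process_xor(b"\x01\x02\x03", 0xFF).hex())
print(process_xor(b"\x10\x20\x30", b"\x01\x02").hex())
```

Applying the same key twice restores the original bytes, which is why one routine serves for both directions.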

rol(n), ror(n)

apply a bitwise rotation (also known as a circular shift) by n bits to every byte of the buffer

rol = left circular shift, ror = right circular shift

n is required, and should be in range 0-7 for consistent results (to be safe, use shift_amount % 8 as the n parameter, if the value of shift_amount itself may not fall into that range)
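A per-byte rotation can be sketched as follows. Taking n modulo 8 inside the helpers mirrors the advice above; the function names are hypothetical.

```python
def rol_byte(value: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    n %= 8
    return ((value << n) | (value >> (8 - n))) & 0xFF


def ror_byte(value: int, n: int) -> int:
    """Rotate an 8-bit value right by n bits."""
    n %= 8
    return ((value >> n) | (value << (8 - n))) & 0xFF


def process_rol(data: bytes, n: int) -> bytes:
    return bytes(rol_byte(b, n) for b in data)


def process_ror(data: bytes, n: int) -> bytes:
    return bytes(ror_byte(b, n) for b in data)


print(process_rol(b"\x91", 1).hex())
print(process_ror(b"\x23", 1).hex())
```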

zlib

apply a zlib decompression to the input buffer, expecting it to be a full-fledged zlib stream, i.e. having a regular 2-byte zlib header.

typical zlib header values:

78 01 — low compression
78 9C — default compression
78 DA — best compression
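The header values listed above can be observed with Python's standard zlib module, which writes this 2-byte header by default. The sample payload is arbitrary.

```python
import zlib

payload = b"kaitai struct " * 16

# Compress at three levels and keep the first two bytes of each stream.
headers = {level: zlib.compress(payload, level)[:2] for level in (1, 6, 9)}

for level, header in headers.items():
    print(level, header.hex(" ").upper())

# A full zlib stream decompresses back to the original bytes.
restored = zlib.decompress(zlib.compress(payload))
```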

{my_custom_processor}(a, b, ...)

({my_custom_processor} is an arbitrary name matching [a-z][a-z0-9_.]*)

use a custom processing routine, which you implement in imperative code in the target language

the generated code will use the class name {my_custom_processor} using the naming convention of the target language (in most languages MyCustomProcessor, but e.g. my_custom_processor_t in C++: check the generated code)

the processing class must define the method public byte[] decode(byte[] src) and should implement the interface CustomDecoder (available in C++, C# and Java)

you can pass any parameters (a, b, ...) to your {my_custom_processor} class constructor (omit the () brackets for parameter-less invocation)

one can reference a class in a different namespace/package like com.example.my_rle(5, 3)

see Custom processing routines in the User Guide for more info

pattern=^zlib|(xor|rol|ror)\(.*\)$

enum string

name of an existing enum; the field's data type becomes the given enum

pattern=^([a-z][a-z0-9_]*::)*[a-z][a-z0-9_]*$

encoding enum

canonical names of character encodings supported by Kaitai Struct

in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them

pad-right integer

specifies the byte with which the string or byte array is padded after its end, up to the total size

can be used only with size or size-eos: true (when the size is fixed)

when terminator:

isn't specified, pad-right controls where the string ends (it basically acts like a terminator)
is specified, padding comes after the terminator, not before. The value ends as soon as the terminator occurs, so pad-right has no effect on parsing and is only relevant for serialization
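The two cases can be sketched on a fixed-size field. This is a simplified illustration of the description above, not the runtime's implementation; the helper names are hypothetical, and a field whose terminator is missing is simply returned whole.

```python
def strip_right(data: bytes, pad_byte: int) -> bytes:
    """Drop trailing pad bytes from the end of the field."""
    end = len(data)
    while end > 0 and data[end - 1] == pad_byte:
        end -= 1
    return data[:end]


def terminate(data: bytes, term: int) -> bytes:
    """Cut the field at the first terminator byte, if there is one."""
    index = data.find(bytes([term]))
    return data if index < 0 else data[:index]


def read_padded(field: bytes, pad_right: int, terminator=None) -> bytes:
    if terminator is None:
        # No terminator: the padding alone decides where the value ends.
        return strip_right(field, pad_right)
    # Terminator given: the value ends there, so the padding after it is irrelevant.
    return terminate(field, terminator)


print(read_padded(b"abc\x20\x20", 0x20))
print(read_padded(b"ab\x00c\x20\x20", 0x20, terminator=0))
```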

min=0 max=255

terminator integer

string or byte array reading will stop when it encounters this byte

cannot be used with type: strz (which already implies terminator: 0 - null-terminated string)

min=0 max=255

consume boolean

specify if terminator byte should be "consumed" when reading

if true: the stream pointer will point to the byte after the terminator byte

if false: the stream pointer will point to the terminator byte itself

default is true

Default: true

include boolean

specifies if terminator byte should be considered part of the string read and thus be appended to it

default is false

Default: false

eos-error boolean

allows the compiler to ignore the lack of a terminator: if eos-error is disabled, string reading will stop when either:

the terminator is encountered
the end of the stream is reached

default is true

Default: true
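How terminator, consume, include and eos-error interact can be sketched as one reader function over a byte stream. The function name and parameter names are chosen for this illustration and use the defaults listed above; this is not code from a Kaitai Struct runtime.

```python
import io


def read_bytes_term(stream, term, include=False, consume=True, eos_error=True):
    """Read bytes up to a terminator byte, following the four options above."""
    result = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:
            # End of stream reached without seeing the terminator.
            if eos_error:
                raise EOFError("end of stream reached before terminator")
            return bytes(result)
        if chunk[0] == term:
            if include:
                result.append(term)          # terminator becomes part of the value
            if not consume:
                stream.seek(-1, io.SEEK_CUR)  # leave the pointer on the terminator
            return bytes(result)
        result.append(chunk[0])


stream = io.BytesIO(b"abc\x00def")
print(read_bytes_term(stream, 0), stream.tell())
```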

pos string | integer

io string

specifies an IO stream from which a value should be parsed

value

overrides any reading and parsing: instead, the expression specified in value is calculated and its result is returned as this instance. Has many purposes

Attributes Attribute[]

StringOrInteger string | integer

TypeSpec object

meta object

14 nested properties

id string | boolean

title string

brief name of the format

Examples: "Windows PE executable"

application string | string[]

applications that use this format and are typically associated with it

file-extension string | string[]

file extensions typically used for this format, without the leading dot and in lowercase letters

should be sorted from most popular to least popular

Examples: "exe", ["jpg","jpeg"]

xref object

8 nested properties

forensicswiki MediaWikiPageName | MediaWikiPageName[]

article name at Forensics Wiki, which is a CC-BY-SA-licensed wiki with information on digital forensics, file formats and tools

full link name could be generated as <https://forensics.wiki/> + this value + /

iso IsoIdentifier | IsoIdentifier[]

ISO/IEC standard number, reference to a standard accepted and published by ISO (International Organization for Standardization).

ISO standards typically have clear designations like "ISO/IEC 15948:2004", so the value should cite everything except for "ISO/IEC", e.g. 15948:2004

justsolve MediaWikiPageName | MediaWikiPageName[]

article name at "Just Solve the File Format Problem" wiki, a wiki that collects information on many file formats

full link name could be generated as <http://fileformats.archiveteam.org/wiki/> + this value

loc LocIdentifier | LocIdentifier[]

identifier in Digital Formats database of US Library of Congress

value typically looks like fddXXXXXX, where XXXXXX is a 6-digit identifier

mime MimeType | MimeType[]

MIME type (IANA media type), a string typically used in various Internet protocols to specify format of binary payload

there is a central registry of media types managed by IANA

value must specify full MIME type (both parts), e.g. image/png

pronom PronomIdentifier | PronomIdentifier[]

format identifier in PRONOM Technical Registry of UK National Archives, which is a massive file formats database that catalogues many file formats for digital preservation purposes

rfc RfcIdentifier | RfcIdentifier[]

reference to RFC, "Request for Comments" documents maintained by ISOC (Internet Society)

RFCs are typically treated as global, Internet-wide standards, and, for example, many networking / interoperability protocols are specified in RFCs

value should be just raw RFC number, without any prefixes, e.g. 1234

wikidata WikidataIdentifier | WikidataIdentifier[]

item identifier at Wikidata, a global knowledge base

value typically follows Qxxx pattern, where xxx is a number generated by Wikidata, e.g. Q535473

tags string[]

list of tags (categories/keywords) that can be assigned to the format

used in the format gallery at https://formats.kaitai.io/ to also display the format in categories other than the main one, which corresponds to the directory where the .ksy file is located

should match the directory names in https://github.com/kaitai-io/kaitai_struct_formats

should be written in lower_snake_case and sorted in alphabetical order

license string

license under which the KSY file is released

required for all KSY specifications in the format gallery (otherwise optional, but highly recommended)

must be a valid SPDX expression (however, a single license identifier from the SPDX License List is usually enough)

generated files from a KSY spec retain the same license as the original KSY

Examples: "CC0-1.0", "MIT"

ks-version string | number

minimum Kaitai Struct compiler (KSC) version required to compile this .ksy file (older versions will refuse to compile and inform the user that they need at least the specified version)

only versions 0.6 or higher are accepted (KSC 0.6 was the first to support ks-version, so there is no point in entering any lower version)

ks-debug boolean

advise the Kaitai Struct Compiler (KSC) to use debug mode

Default: false

ks-opaque-types boolean

advise the Kaitai Struct Compiler (KSC) to ignore missing types in the .ksy file, and assume that these types are already provided externally by the environment the classes are generated for

Default: false

imports string | string[]

list of relative or absolute paths to other .ksy files to import (without the .ksy extension)

the top-level type of the imported file will be accessible in the current spec under the name specified in the top-level /meta/id of the imported file

encoding enum

canonical names of character encodings supported by Kaitai Struct

in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them

endian enum | object

default endianness (byte order) of built-in multibyte numeric types, i.e. integers (sX and uX, where X is 2, 4 or 8) and floating-point numbers (fX, where X is 4 or 8)

applies to the current type and its subtypes

this key is required if you use any sX, uX or fX types (other than s1 and u1) without an explicit le or be suffix (as in u2be or f4le)

bit-endian enum

default parsing direction (bit endianness) of bit-sized integers (built-in bX types)

big-endian (be) order is default, but it is recommended to specify it explicitly

can only have the literal value le or be (runtime switching as with the endian key is not supported)

for more information, see https://doc.kaitai.io/user_guide.html#_bit_sized_integers

Default: "be"

Values: "le" "be"

doc string

used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.

doc-ref string | string[]

used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).

Contains:

URL as text,
arbitrary string, or
URL as text + space + arbitrary string

to-string string

expression that provides a human-readable string representation of an object of this user-defined type for debugging purposes

Examples: `f"{file_name} ({file_size} bytes)"`, `file_name + " (" + file_size.to_s + " bytes)"`

params ParamSpec[]

seq Attribute[]

instances object

types object

enums object

TypesSpec object

ParamSpec object

id required

All of: string string, variant

type string

specifies "pure" type of the parameter, without any serialization details (like endianness, sizes, encodings)

one can specify arrays by appending [] after the type identifier (e.g. type: u2[], type: 'foo::bar[]', type: struct[] etc.)

doc string

used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.

doc-ref string | string[]

used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).

Contains:

URL as text,
arbitrary string, or
URL as text + space + arbitrary string

enum string

path to an enum type (defined in the enums map), which will become the type of the parameter

only integer-based enums are supported, so type must be an integer type (type: uX, type: sX or type: bX) for this property to work

you can use enum with type: b1 as well: b1 means a 1-bit integer (0 or 1) when used with enum (not a boolean)

one can reference an enum type of a subtype by specifying a relative path to it from the current type, with a double colon as a path delimiter (e.g. foo::bar::my_enum)

pattern=^([a-z][a-z0-9_]*::)*[a-z][a-z0-9_]*$
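An enum path that matches this pattern can be validated and split into its segments as in the sketch below. fullmatch stands in for the ^ and $ anchors; split_enum_path is a hypothetical helper, and it does not resolve the path against any actual type tree.

```python
import re

# Same rules as the schema pattern ^([a-z][a-z0-9_]*::)*[a-z][a-z0-9_]*$
ENUM_PATH = re.compile(r"([a-z][a-z0-9_]*::)*[a-z][a-z0-9_]*")


def split_enum_path(path: str):
    """Validate a path such as foo::bar::my_enum and return its segments."""
    if ENUM_PATH.fullmatch(path) is None:
        raise ValueError(f"not a valid enum path: {path!r}")
    return path.split("::")


print(split_enum_path("foo::bar::my_enum"))
print(split_enum_path("my_enum"))
```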

ParamsSpec ParamSpec[]

InstancesSpec object

EnumValueSpec Identifier | object

EnumSpec object

EnumsSpec object

Identifier string | boolean

AnyScalar string | number | integer | boolean | null

CharacterEncoding enum

canonical names of character encodings supported by Kaitai Struct

in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them