KSY
Kaitai Struct format description file
| Type | object |
|---|---|
| File match | *.ksy |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/ksy/latest.json |
| Source | https://raw.githubusercontent.com/kaitai-io/ksy_schema/master/ksy_schema.json |
Validate with Lintel
npx @lintel/lintel check
the schema for ksy files
Properties
14 nested properties
brief name of the format
applications that use this format and are typically associated with it
file extensions typically used for this format, without the leading dot and in lowercase letters
should be sorted from most popular to least popular
8 nested properties
article name at Forensics Wiki, which is a CC-BY-SA-licensed wiki with information on digital forensics, file formats and tools
the full link can be generated as <https://forensics.wiki/> + this value + /
ISO/IEC standard number, a reference to a standard accepted and published by ISO (International Organization for Standardization)
ISO standards typically have clear designations like "ISO/IEC 15948:2004", so the value should cite everything except for "ISO/IEC", i.e. 15948:2004
article name at "Just Solve the File Format Problem" wiki, a wiki that collects information on many file formats
the full link can be generated as <http://fileformats.archiveteam.org/wiki/> + this value
identifier in Digital Formats database of US Library of Congress
value typically looks like fddXXXXXX, where XXXXXX is a 6-digit identifier
MIME type (IANA media type), a string typically used in various Internet protocols to specify format of binary payload
there is a central registry of media types managed by IANA
value must specify full MIME type (both parts), e.g. image/png
format identifier in PRONOM Technical Registry of UK National Archives, which is a massive file formats database that catalogues many file formats for digital preservation purposes
reference to RFC, "Request for Comments" documents maintained by ISOC (Internet Society)
RFCs are typically treated as global, Internet-wide standards, and, for example, many networking / interoperability protocols are specified in RFCs
value should be just raw RFC number, without any prefixes, e.g. 1234
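As a rough illustration of the cross-reference keys described above, a meta section might look like the following sketch. The format id, title and extensions are hypothetical, and the xref values are placeholders shaped after the examples in the descriptions, not real registry entries for any one format.

```yaml
meta:
  id: example_image            # hypothetical format id
  title: Example image file    # hypothetical brief name
  file-extension:              # lowercase, no leading dot, most popular first
    - eximg
    - exi
  xref:
    iso: '15948:2004'          # "ISO/IEC" prefix omitted
    loc: fdd000000             # placeholder in the fddXXXXXX shape
    mime: image/png            # full MIME type, both parts
    rfc: 1234                  # raw RFC number, no prefix
```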
list of tags (categories/keywords) that can be assigned to the format
used in the format gallery at https://formats.kaitai.io/ to also list the format in categories other than the main one (the main category corresponds to the directory where the .ksy file is located)
should match the directory names in https://github.com/kaitai-io/kaitai_struct_formats
should be written in lower_snake_case and sorted in alphabetical order
license under which the KSY file is released
required for all KSY specifications in the format gallery (otherwise optional, but highly recommended)
must be a valid SPDX expression (however, a single license identifier from the SPDX license list is usually enough)
to clarify, this is not a license of the original format description, but a license of the particular KSY implementation - if you're writing one, you can choose any open source license you want, regardless of what resources you use (as long as you only reproduce the idea and you don't copy long excerpts); we recommend CC0-1.0 or MIT
generated files from a KSY spec retain the same license as the original KSY
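A minimal sketch of the tags and license keys together, assuming a hypothetical format id. The tag names are assumed to match directory names in the formats repository.

```yaml
meta:
  id: example_archive   # hypothetical format id
  license: CC0-1.0      # a single SPDX license identifier
  tags:                 # lower_snake_case, sorted alphabetically
    - archive
    - windows
```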
minimum Kaitai Struct compiler (KSC) version required to compile this .ksy file (older versions will refuse to compile and inform the user that they need at least the specified version)
only versions 0.6 or higher are accepted (KSC 0.6 was the first to support ks-version, so there is no point in entering any lower version)
the value must sometimes be enclosed in quotes to ensure correct interpretation, for example ks-version: '0.10' (without the quotes it is parsed as a float in YAML and gets interpreted as 0.1, which will be rejected)
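The quoting pitfall can be sketched as follows (the format id is hypothetical):

```yaml
meta:
  id: example_format     # hypothetical format id
  ks-version: '0.10'     # quoted: stays the string "0.10"
  # ks-version: 0.10     # unquoted: YAML reads the float 0.1, which is rejected
```

Versions without a trailing zero, such as 0.9, read the same with or without quotes, but quoting consistently avoids having to think about it.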
advise the Kaitai Struct Compiler (KSC) to use debug mode
advise the Kaitai Struct Compiler (KSC) to ignore missing types in the .ksy file, and assume that these types are already provided externally by the environment the classes are generated for
list of relative or absolute paths to other .ksy files to import (without the .ksy extension)
the top-level type of the imported file will be accessible in the current spec under the name specified in the top-level /meta/id of the imported file
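A short sketch of an import, assuming a hypothetical sibling file common/chunk.ksy whose top-level /meta/id is chunk:

```yaml
meta:
  id: example_container   # hypothetical format id
  imports:
    - common/chunk        # relative path to common/chunk.ksy, extension omitted
seq:
  - id: first_chunk
    type: chunk           # name taken from /meta/id of the imported file
```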
canonical names of character encodings supported by Kaitai Struct
in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them
default endianness (byte order) of built-in multibyte numeric types, i.e. integers (sX and uX, where X is 2, 4 or 8) and floating-point numbers (fX, where X is 4 or 8)
applies to the current type and its subtypes
this key is required if you use any sX, uX or fX types (other than s1 and u1) without an explicit le or be suffix (as in u2be or f4le)
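For illustration, a fragment with a default byte order and one explicit override (the format id and attribute names are hypothetical):

```yaml
meta:
  id: example_header   # hypothetical format id
  endian: le           # default for multibyte numeric types below
seq:
  - id: width
    type: u2           # read as little-endian because of meta/endian
  - id: checksum
    type: u4be         # explicit suffix, independent of the default
  - id: flags
    type: u1           # single byte, no byte order involved
```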
default parsing direction (bit endianness) of bit-sized integers (built-in bX types)
big-endian (be) order is default, but it is recommended to specify it explicitly
can only have the literal value le or be (runtime switching as with the endian key is not supported)
for more information, see https://doc.kaitai.io/user_guide.html#_bit_sized_integers
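A small sketch of bit-sized integers with an explicit bit-endian setting. The field names are hypothetical, and the three fields together occupy exactly one byte.

```yaml
meta:
  id: example_flags    # hypothetical format id
  bit-endian: be       # the default, stated explicitly as recommended
seq:
  - id: version
    type: b4           # 4-bit integer
  - id: is_compressed
    type: b1           # single bit
  - id: reserved
    type: b3           # remaining 3 bits of the byte
```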
used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.
used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).
Contains:
- URL as text,
- arbitrary string, or
- URL as text + space + arbitrary string
expression that provides a human-readable string representation of an object of this user-defined type for debugging purposes
it will be used to override the standard method for converting an object to a string called toString() (or similar) in most target languages, __str__() in Python and to_s in Ruby; in Rust, it is the Display trait
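The three keys described above can be combined on a single user-defined type. The following is only a sketch: the type, its fields and the documentation URL are hypothetical.

```yaml
types:
  file_entry:
    doc: One entry in the archive's file table.
    doc-ref: https://example.com/spec.html section 4.2   # URL + space + arbitrary string
    to-string: 'file_name + " (" + file_size.to_s + " bytes)"'
    seq:
      - id: file_size
        type: u4le
      - id: file_name
        type: strz
        encoding: UTF-8
```

The to-string expression is wrapped in single quotes so that YAML passes the embedded double quotes through unchanged.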
Definitions
used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.
used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).
Contains:
- URL as text,
- arbitrary string, or
- URL as text + space + arbitrary string
expression that provides a human-readable string representation of an object of this user-defined type for debugging purposes
it will be used to override the standard method for converting an object to a string called toString() (or similar) in most target languages, __str__() in Python and to_s in Ruby; in Rust, it is the Display trait
Examples:
- f"{file_name} ({file_size} bytes)"
- file_name + " (" + file_size.to_s + " bytes)"
contains a string used to identify one attribute among others
used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.
used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).
Contains:
- URL as text,
- arbitrary string, or
- URL as text + space + arbitrary string
specify fixed contents that the parser should encounter at this point. If the content of the stream doesn't match the given bytes, an error is thrown and it's meaningless to continue parsing
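As an illustration, a magic signature check might be written like this. The bytes shown are the well-known 8-byte PNG signature, and the attribute name is arbitrary.

```yaml
seq:
  - id: magic
    contents: [0x89, "PNG", 0x0d, 0x0a, 0x1a, 0x0a]   # bytes and strings can be mixed
```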
defines data type for an attribute
the type can also be user-defined in the types key
one can reference a nested user-defined type by specifying a relative path to it from the current type, with a double colon as a path delimiter (e.g. foo::bar::my_type)
designates repeated attribute in a structure
| Value | Description |
|---|---|
| eos | repeat until the end of the current stream |
| expr | repeat as many times as specified in repeat-expr |
| until | repeat until the expression in repeat-until becomes true |
attribute read as array/list/sequence
specifies a condition to be checked after each parsed item, repeating while the expression is false
one can use _ in the expression, which is a special local variable that references the last read element
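The three repetition modes can be sketched side by side. All attribute and type names here are hypothetical.

```yaml
seq:
  - id: num_entries
    type: u2le
  - id: entries
    type: u4le
    repeat: expr
    repeat-expr: num_entries        # exactly num_entries items
  - id: blocks
    type: block
    repeat: until
    repeat-until: _.len_data == 0   # _ is the block that was just read
  - id: trailing
    type: u1
    repeat: eos                     # everything left in the stream
types:
  block:
    seq:
      - id: len_data
        type: u1
      - id: data
        size: len_data
```

Each repeated attribute is exposed as an array/list in the generated code, so entries, blocks and trailing are all sequences.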
marks the attribute as optional (attribute is parsed only if the condition specified evaluates to true)
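A minimal sketch of an optional attribute (names are hypothetical):

```yaml
seq:
  - id: has_comment
    type: u1
  - id: comment
    type: strz
    encoding: UTF-8
    if: has_comment != 0   # parsed only when the flag byte is non-zero
```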
if true, reads all the bytes till the end of the stream
default is false
specifies an algorithm to be applied to the underlying byte buffer of the attribute before parsing
can be used only if the size is known (either size, size-eos: true or terminator are specified), see Applying process without a size in the User Guide
Supported values:
- xor(key): apply a bitwise XOR (written as ^ in most C-like languages) to every byte of the buffer using the provided key
  - key is required, and can be either:
    - a single byte value, which will be XORed with every byte of the input stream (make sure that the key is in range 0-255, otherwise you may get unexpected results)
    - a byte array: the first byte of the input will be XORed with the first byte of the key, the second byte of the input with the second byte of the key, etc.; when the end of the key is reached, it starts again from the first byte
  - the output length remains the same as the input length
- rol(n), ror(n): apply a bitwise rotation (also known as a circular shift) by n bits to every byte of the buffer
  - rol = left circular shift, ror = right circular shift
  - n is required, and should be in range 0-7 for consistent results (to be safe, use shift_amount % 8 as the n parameter, if the value of shift_amount itself may not fall into that range)
- zlib: apply a zlib decompression to the input buffer, expecting it to be a full-fledged zlib stream, i.e. having a regular 2-byte zlib header
  - typical zlib header values: 78 01 (low compression), 78 9C (default compression), 78 DA (best compression)
- {my_custom_processor}(a, b, ...), where {my_custom_processor} is an arbitrary name matching [a-z][a-z0-9_.]*: use a custom processing routine, which you implement in imperative code in the target language
  - the generated code will use the class name {my_custom_processor} using the naming convention of the target language (in most languages MyCustomProcessor, but e.g. my_custom_processor_t in C++: check the generated code)
  - the processing class must define the method public byte[] decode(byte[] src) and should implement the interface CustomDecoder (available in C++, C# and Java)
  - you can pass any parameters (a, b, ...) to your {my_custom_processor} class constructor (omit the () brackets for parameter-less invocation)
  - one can reference a class in a different namespace/package like com.example.my_rle(5, 3)
  - see Custom processing routines in the User Guide for more info
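The built-in processing routines might be used as in the following sketch. Attribute names, sizes and keys are arbitrary; every attribute has a known size, as process requires.

```yaml
seq:
  - id: len_body
    type: u4le
  - id: obfuscated
    size: 16
    process: xor(0x5a)                 # single-byte key in range 0-255
  - id: keyed
    size: 16
    process: xor([0x12, 0x34, 0x56])   # byte-array key, repeated cyclically
  - id: rotated
    size: 8
    process: rol(3)                    # rotate every byte left by 3 bits
  - id: body
    size: len_body
    process: zlib                      # expects a full zlib stream with header
```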
name of an existing enum; the field's data type becomes the given enum
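A brief sketch of an integer attribute mapped to an enum (the enum name and its members are hypothetical):

```yaml
seq:
  - id: compression
    type: u1
    enum: compression_method   # value is exposed as an enum member
enums:
  compression_method:
    0: none
    1: deflate
```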
canonical names of character encodings supported by Kaitai Struct
in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them
specifies the byte with which the string or byte array is padded after its end, up to the total size
can be used only with size or size-eos: true (when the size is fixed)
when terminator:
- isn't specified, then pad-right controls where the string ends (basically acts like a terminator)
- is specified, padding comes after the terminator, not before. The value is terminated immediately after the terminator occurs, so pad-right has no effect on parsing and is only relevant for serialization
string or byte array reading will stop when it encounters this byte
cannot be used with type: strz (which already implies terminator: 0 - null-terminated string)
specify if terminator byte should be "consumed" when reading
if true: the stream pointer will point to the byte after the terminator byte
if false: the stream pointer will point to the terminator byte itself
default is true
specifies if terminator byte should be considered part of the string read and thus be appended to it
default is false
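specifies whether reaching the end of the stream without encountering the terminator is an error; if eos-error is disabled (false), the lack of a terminator is ignored and string reading will stop at either:
- the terminator being encountered, or
- the end of the stream being reached
default is true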
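The string-related keys above can be sketched on two hypothetical attributes, one fixed-size and padded, one terminated:

```yaml
seq:
  - id: label
    type: str
    size: 16
    pad-right: 0x20     # trailing space padding is stripped from the value
    encoding: ASCII
  - id: line
    type: str
    terminator: 0x0a    # stop at a line feed
    consume: true       # stream pointer moves past the terminator (default)
    include: false      # terminator is not part of the value (default)
    eos-error: false    # tolerate a final line without a line feed
    encoding: UTF-8
```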
specifies an IO stream from which a value should be parsed
overrides any reading and parsing: instead, the expression specified in value is calculated and its result is returned as this instance. This has many uses, e.g. values derived from other attributes
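A minimal sketch of a value instance derived from two parsed attributes (names are hypothetical):

```yaml
seq:
  - id: width
    type: u2le
  - id: height
    type: u2le
instances:
  num_pixels:
    value: width * height   # computed on access, nothing is read from the stream
```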
specifies "pure" type of the parameter, without any serialization details (like endianness, sizes, encodings)
| Value | Description |
|---|---|
| u1, u2, u4, u8 | unsigned integer |
| s1, s2, s4, s8 | signed integer |
| bX | bit-sized integer (if X != 1) |
| f4, f8 | floating point number |
| type key missing, or bytes | byte array |
| str | string |
| bool (or b1) | boolean |
| struct | arbitrary KaitaiStruct-compatible user type |
| io | KaitaiStream-compatible IO stream |
| any | allow any type (if target language supports that) |
| other identifier | user-defined type, without parameters |
a nested type can be referenced with double colon (e.g. type: 'foo::bar')
one can specify arrays by appending [] after the type identifier (e.g. type: u2[], type: 'foo::bar[]', type: struct[] etc.)
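To illustrate parameter types, here is a sketch of a parametric user-defined type and its invocation. All names are hypothetical, and the parameters use "pure" types without endianness suffixes.

```yaml
meta:
  id: example_image    # hypothetical format id
seq:
  - id: width
    type: u2le
  - id: row
    type: pixel_row(width, true)   # arguments are passed in declaration order
types:
  pixel_row:
    params:
      - id: num_pixels
        type: u2       # pure type: no le/be suffix
      - id: has_alpha
        type: bool
    seq:
      - id: samples
        type: u1
        repeat: expr
        repeat-expr: 'has_alpha ? num_pixels * 4 : num_pixels * 3'
```

The repeat-expr value is quoted because the ternary operator contains a colon followed by a space, which YAML would otherwise treat as a mapping separator.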
used to give a more detailed description of a user-defined type. In most languages, it will be used as a docstring compatible with tools like Javadoc, Doxygen, JSDoc, etc.
used to provide reference to original documentation (if the ksy file is actually an implementation of some documented format).
Contains:
- URL as text,
- arbitrary string, or
- URL as text + space + arbitrary string
path to an enum type (defined in the enums map), which will become the type of the parameter
only integer-based enums are supported, so type must be an integer type (type: uX, type: sX or type: bX) for this property to work
you can use enum with type: b1 as well: b1 means a 1-bit integer (0 or 1) when used with enum (not a boolean)
one can reference an enum type of a subtype by specifying a relative path to it from the current type, with a double colon as a path delimiter (e.g. foo::bar::my_enum)
canonical names of character encodings supported by Kaitai Struct
in addition to these canonical names, the compiler (since version 0.11) also recognizes their popular aliases, but issues a warning for them