Type object
Schema URL https://catalog.lintel.tools/schemas/schemastore/eidolon-resource/_shared/latest--TokenTextSplitter.json
Parent schema eidolon-resource

Properties

implementation const: "TokenTextSplitter" required
chunk_size integer

Maximum size of chunks to return

Default: 4000
chunk_overlap integer

Overlap in characters between chunks

Default: 200
keep_separator boolean

Whether to keep the separator in the chunks

Default: false
strip_whitespace boolean

If True, strips whitespace from the start and end of every document

Default: true
encoding_name string

Name of the encoding used for tokenization

Default: "gpt2"
model string | null

Model name

Default: null
allowed_special string | string[]

Allowed special tokens

Default: []
disallowed_special string | string[]

Disallowed special tokens

Default: "all"
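
Example

The properties above can be sketched as a plain Python configuration. This is a minimal illustration, not the library's own API: `build_config` and `window_bounds` are hypothetical helpers introduced here, and only the property names and defaults come from the schema. The sliding-window behavior (each chunk starting `chunk_size - chunk_overlap` units after the previous one) is an assumption about how such a splitter typically works. The schema does not say whether sizes are counted in tokens or characters, so the sketch treats them as abstract units.

```python
import copy
from typing import Any

# Defaults copied from the property list above.
DEFAULTS: dict[str, Any] = {
    "implementation": "TokenTextSplitter",
    "chunk_size": 4000,
    "chunk_overlap": 200,
    "keep_separator": False,
    "strip_whitespace": True,
    "encoding_name": "gpt2",
    "model": None,
    "allowed_special": [],
    "disallowed_special": "all",
}


def build_config(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    # Deep copy so the mutable default list is never shared between configs.
    config = {**copy.deepcopy(DEFAULTS), **overrides}
    if config["implementation"] != "TokenTextSplitter":
        raise ValueError('implementation must be "TokenTextSplitter"')
    for key in ("chunk_size", "chunk_overlap"):
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{key} must be an integer")
    return config


def window_bounds(n_units: int, chunk_size: int, chunk_overlap: int) -> list[tuple[int, int]]:
    """Assumed sliding window: return (start, end) index pairs for each chunk."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    bounds: list[tuple[int, int]] = []
    start = 0
    while start < n_units:
        end = min(start + chunk_size, n_units)
        bounds.append((start, end))
        if end == n_units:
            break  # the last chunk reached the end of the input
        start += step
    return bounds


if __name__ == "__main__":
    cfg = build_config(chunk_size=1000, chunk_overlap=100)
    print(cfg["encoding_name"])
    print(window_bounds(2500, cfg["chunk_size"], cfg["chunk_overlap"]))
```

Under these assumptions, an input of 10000 units with the default `chunk_size` of 4000 and `chunk_overlap` of 200 yields three chunks, each sharing 200 units with its neighbor. Setting `chunk_overlap` equal to or larger than `chunk_size` would never advance the window, so the sketch rejects it.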