Type: object
Schema URL: https://catalog.lintel.tools/schemas/schemastore/eidolon-resource/_shared/latest--NLTKTextSplitter.json
Parent schema: eidolon-resource

Properties

implementation const: "NLTKTextSplitter" required
Constant: "NLTKTextSplitter"
chunk_size integer

Maximum size of chunks to return

Default: 4000
chunk_overlap integer

Overlap in characters between chunks

Default: 200
keep_separator boolean

Whether to keep the separator in the chunks

Default: false
strip_whitespace boolean

If true, strips whitespace from the start and end of every document

Default: true
separator string

Separator to split on

Default: " "
language string

Language to use for tokenization

Default: "english"
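
Example

The properties above can be sketched as a small Python helper that fills in the documented defaults and checks each value against its listed type. This is only an illustration, not part of Eidolon or of the schema: the names `DEFAULTS`, `EXPECTED_TYPES` and `build_splitter_config` are hypothetical, and the listing does not say whether extra properties are allowed, so rejecting unknown keys is a choice made by this sketch.

```python
from typing import Any, Dict

# Defaults copied from the property list above.
DEFAULTS: Dict[str, Any] = {
    "chunk_size": 4000,
    "chunk_overlap": 200,
    "keep_separator": False,
    "strip_whitespace": True,
    "separator": " ",
    "language": "english",
}

# Python types corresponding to the schema types listed above.
EXPECTED_TYPES: Dict[str, type] = {
    "chunk_size": int,
    "chunk_overlap": int,
    "keep_separator": bool,
    "strip_whitespace": bool,
    "separator": str,
    "language": str,
}


def build_splitter_config(**overrides: Any) -> Dict[str, Any]:
    """Return a config dict with defaults applied and value types checked.

    Hypothetical helper for illustration; not an Eidolon API.
    """
    # Assumption of this sketch: unknown keys are rejected to catch typos.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")

    # "implementation" is the only required property and is a constant.
    config: Dict[str, Any] = {"implementation": "NLTKTextSplitter"}
    config.update(DEFAULTS)
    config.update(overrides)

    for key, expected in EXPECTED_TYPES.items():
        value = config[key]
        # bool is a subclass of int in Python, so reject it for integer fields.
        if expected is int and isinstance(value, bool):
            raise TypeError(f"{key} must be an integer, got a boolean")
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
    return config


config = build_splitter_config(chunk_size=1000, language="german")
```

Here `config` carries the overridden `chunk_size` and `language` while every other property keeps its documented default.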