Type object
Schema URL https://catalog.lintel.tools/schemas/schemastore/eidolon-resource/_shared/latest--SpacyTextSplitter.json
Parent schema eidolon-resource

Properties

implementation const: "SpacyTextSplitter" required
chunk_size integer

Maximum size of chunks to return

Default: 4000
chunk_overlap integer

Overlap in characters between chunks

Default: 200
keep_separator boolean

Whether to keep the separator in the chunks

Default: false
strip_whitespace boolean

If true, strips whitespace from the start and end of every document

Default: true
separator string

Separator to split on

Default: " "
pipeline string

Name of the spaCy pipeline to use

Default: "en_core_web_sm"
max_length integer

Maximum number of characters to process

Default: 1000000
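
Example

The properties above can be assembled into a configuration object as in the following minimal Python sketch. The property names, types, and defaults are copied from the listing; the `DEFAULTS` table and the `build_config` helper are hypothetical names introduced only for illustration and are not part of Eidolon or of the schema. Rejecting unknown property names is a choice made by this sketch, since the listing does not say whether additional properties are allowed.

```python
import json

# Optional properties and their defaults, copied from the listing above.
DEFAULTS = {
    "chunk_size": 4000,
    "chunk_overlap": 200,
    "keep_separator": False,
    "strip_whitespace": True,
    "separator": " ",
    "pipeline": "en_core_web_sm",
    "max_length": 1000000,
}


def build_config(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Assumption of this sketch: unknown names are treated as errors.
        raise KeyError(f"unknown properties: {sorted(unknown)}")

    # "implementation" is the only required property and must be this constant.
    config = {"implementation": "SpacyTextSplitter", **DEFAULTS}

    for name, value in overrides.items():
        expected = type(DEFAULTS[name])
        # bool is a subclass of int in Python, so reject it for integer fields.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
        config[name] = value
    return config


# Override two properties and keep the documented defaults for the rest.
config = build_config(chunk_size=1000, chunk_overlap=100)
print(json.dumps(config, indent=2))
```

The printed JSON object contains only the properties described on this page. Any surrounding resource wrapper that a real deployment needs is outside the scope of this sketch.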