Type object
Schema URL https://catalog.lintel.tools/schemas/schemastore/haystack-pipeline/_shared/latest--haystack-pipeline-1.9.0.schema.json
Parent schema haystack-pipeline

Haystack Pipeline YAML file describing the nodes of the pipelines. For more information, see the documentation at https://haystack.deepset.ai/components/pipelines#yaml-file-definitions

Properties

version string required

Version of the Haystack Pipeline file.

Constant: "1.9.0"
components DeepsetCloudDocumentStoreComponent | ElasticsearchDocumentStoreComponent | FAISSDocumentStoreComponent | GraphDBKnowledgeGraphComponent | InMemoryDocumentStoreComponent | InMemoryKnowledgeGraphComponent | Milvus2DocumentStoreComponent | OpenDistroElasticsearchDocumentStoreComponent | OpenSearchDocumentStoreComponent | PineconeDocumentStoreComponent | SQLDocumentStoreComponent | WeaviateDocumentStoreComponent | AnswerToSpeechComponent | AzureConverterComponent | BM25RetrieverComponent | CrawlerComponent | DensePassageRetrieverComponent | Docs2AnswersComponent | DocumentToSpeechComponent | DocxToTextConverterComponent | ElasticsearchFilterOnlyRetrieverComponent | ElasticsearchRetrieverComponent | EmbeddingRetrieverComponent | EntityExtractorComponent | EvalAnswersComponent | EvalDocumentsComponent | FARMReaderComponent | FileTypeClassifierComponent | FilterRetrieverComponent | ImageToTextConverterComponent | JoinAnswersComponent | JoinDocumentsComponent | MarkdownConverterComponent | MultihopEmbeddingRetrieverComponent | OpenAIAnswerGeneratorComponent | PDFToTextConverterComponent | PDFToTextOCRConverterComponent | ParsrConverterComponent | PreProcessorComponent | PseudoLabelGeneratorComponent | QuestionGeneratorComponent | RAGeneratorComponent | RCIReaderComponent | RouteDocumentsComponent | SentenceTransformersRankerComponent | Seq2SeqGeneratorComponent | SklearnQueryClassifierComponent | TableReaderComponent | TableTextRetrieverComponent | Text2SparqlRetrieverComponent | TextConverterComponent | TfidfRetrieverComponent | TikaConverterComponent | TransformersDocumentClassifierComponent | TransformersQueryClassifierComponent | TransformersReaderComponent | TransformersSummarizerComponent | TransformersTranslatorComponent[] required

Component nodes and their configurations, for later use in the pipelines section. Define all the building blocks for the pipelines here.

pipelines object[] required

Multiple pipelines can be defined using the components from the same YAML file.

extras string

Specify only if the file contains special pipelines (for example, a Ray pipeline).

Values: "ray"

One of

1. object object
pipelines
2. object object
extras enum required
Values: "ray"
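As a rough illustration of how the top-level properties fit together, a minimal pipeline file might look like the sketch below. The component names (DocumentStore, Retriever), the pipeline name (query), and the nodes/inputs layout of the pipeline entry are assumptions made for illustration; this section does not spell out the shape of the pipelines items, so check them against the Haystack documentation linked above.

```yaml
# Minimal sketch of a Haystack pipeline file (illustrative, not authoritative).
version: "1.9.0"                      # must equal the schema constant

components:                           # building blocks, referenced by name below
  - name: DocumentStore               # hypothetical component name
    type: ElasticsearchDocumentStore
  - name: Retriever                   # hypothetical component name
    type: BM25Retriever
    params:
      document_store: DocumentStore   # refers to the component defined above

pipelines:                            # entry layout is assumed, not defined in this section
  - name: query                       # hypothetical pipeline name
    nodes:
      - name: Retriever
        inputs: [Query]
```

The optional extras property is left out here because, per the description above, it applies only to special pipelines such as Ray pipelines.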

Definitions

DeepsetCloudDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "DeepsetCloudDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

9 nested properties
api_key string
workspace string
Default: "default"
index string | null
duplicate_documents string
Default: "overwrite"
api_endpoint string | null
similarity string
Default: "dot_product"
return_embedding boolean
Default: false
label_index string
Default: "default"
embedding_dim integer
Default: 768
ElasticsearchDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "ElasticsearchDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

33 nested properties
host string | string[]
Default: "localhost"
port integer | integer[]
Default: 9200
username string
Default: ""
password string
Default: ""
api_key_id string | null
api_key string | null
aws4auth
index string
Default: "document"
label_index string
Default: "label"
search_fields string | array
Default: "content"
content_field string
Default: "content"
name_field string
Default: "name"
embedding_field string
Default: "embedding"
embedding_dim integer
Default: 768
custom_mapping object | null
excluded_meta_data array | null
analyzer string
Default: "standard"
scheme string
Default: "http"
ca_certs string | null
verify_certs boolean
Default: true
recreate_index boolean
Default: false
create_index boolean
Default: true
refresh_type string
Default: "wait_for"
similarity string
Default: "dot_product"
timeout integer
Default: 30
return_embedding boolean
Default: false
duplicate_documents string
Default: "overwrite"
index_type string
Default: "flat"
scroll string
Default: "1d"
skip_missing_embeddings boolean
Default: true
synonyms array | null
synonym_type string
Default: "synonym"
use_system_proxy boolean
Default: false
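A component entry for this store could be sketched as follows. The name DocumentStore and the index value my_documents are hypothetical; the remaining values simply restate the defaults listed above, so in practice they could be omitted.

```yaml
components:
  - name: DocumentStore              # hypothetical name
    type: ElasticsearchDocumentStore # must equal the schema constant
    params:
      host: localhost                # default
      port: 9200                     # default
      scheme: http                   # default
      index: my_documents            # hypothetical override of the default "document"
      embedding_dim: 768             # default
      similarity: dot_product        # default
```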
FAISSDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "FAISSDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

18 nested properties
sql_url string
Default: "sqlite:///faiss_document_store.db"
vector_dim integer
embedding_dim integer
Default: 768
faiss_index_factory_str string
Default: "Flat"
faiss_index string | null
Default: null
return_embedding boolean
Default: false
index string
Default: "document"
similarity string
Default: "dot_product"
embedding_field string
Default: "embedding"
progress_bar boolean
Default: true
duplicate_documents string
Default: "overwrite"
faiss_index_path string | string
faiss_config_path string | string
isolation_level string
n_links integer
Default: 64
ef_search integer
Default: 20
ef_construction integer
Default: 80
validate_index_sync boolean
Default: true
GraphDBKnowledgeGraphComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "GraphDBKnowledgeGraph"
params object

Each parameter can reference other components defined in the same YAML file.

6 nested properties
host string
Default: "localhost"
port integer
Default: 7200
username string
Default: ""
password string
Default: ""
index string | null
prefixes string
Default: ""
InMemoryDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "InMemoryDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

11 nested properties
index string
Default: "document"
label_index string
Default: "label"
embedding_field string | null
Default: "embedding"
embedding_dim integer
Default: 768
return_embedding boolean
Default: false
similarity string
Default: "dot_product"
progress_bar boolean
Default: true
duplicate_documents string
Default: "overwrite"
use_gpu boolean
Default: true
scoring_batch_size integer
Default: 500000
devices string | string[] | null
InMemoryKnowledgeGraphComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "InMemoryKnowledgeGraph"
params object

Each parameter can reference other components defined in the same YAML file.

1 nested property
index string
Default: "document"
Milvus2DocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "Milvus2DocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

21 nested properties
sql_url string
Default: "sqlite:///"
host string
Default: "localhost"
port string
Default: "19530"
connection_pool string
Default: "SingletonThread"
index string
Default: "document"
vector_dim integer
embedding_dim integer
Default: 768
index_file_size integer
Default: 1024
similarity string
Default: "dot_product"
index_type string
Default: "IVF_FLAT"
index_param object | null
search_param object | null
return_embedding boolean
Default: false
embedding_field string
Default: "embedding"
id_field string
Default: "id"
custom_fields array | null
progress_bar boolean
Default: true
duplicate_documents string
Default: "overwrite"
isolation_level string
consistency_level integer
Default: 0
recreate_index boolean
Default: false
OpenDistroElasticsearchDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "OpenDistroElasticsearchDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

33 nested properties
scheme string
Default: "https"
username string
Default: "admin"
password string
Default: "admin"
host string | string[]
Default: "localhost"
port integer | integer[]
Default: 9200
api_key_id string | null
api_key string | null
aws4auth
index string
Default: "document"
label_index string
Default: "label"
search_fields string | array
Default: "content"
content_field string
Default: "content"
name_field string
Default: "name"
embedding_field string
Default: "embedding"
embedding_dim integer
Default: 768
custom_mapping object | null
excluded_meta_data array | null
analyzer string
Default: "standard"
ca_certs string | null
verify_certs boolean
Default: false
recreate_index boolean
Default: false
create_index boolean
Default: true
refresh_type string
Default: "wait_for"
similarity string
Default: "cosine"
timeout integer
Default: 30
return_embedding boolean
Default: false
duplicate_documents string
Default: "overwrite"
index_type string
Default: "flat"
scroll string
Default: "1d"
skip_missing_embeddings boolean
Default: true
synonyms array | null
synonym_type string
Default: "synonym"
use_system_proxy boolean
Default: false
OpenSearchDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "OpenSearchDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

34 nested properties
scheme string
Default: "https"
username string
Default: "admin"
password string
Default: "admin"
host string | string[]
Default: "localhost"
port integer | integer[]
Default: 9200
api_key_id string | null
api_key string | null
aws4auth
index string
Default: "document"
label_index string
Default: "label"
search_fields string | array
Default: "content"
content_field string
Default: "content"
name_field string
Default: "name"
embedding_field string
Default: "embedding"
embedding_dim integer
Default: 768
custom_mapping object | null
excluded_meta_data array | null
analyzer string
Default: "standard"
ca_certs string | null
verify_certs boolean
Default: false
recreate_index boolean
Default: false
create_index boolean
Default: true
refresh_type string
Default: "wait_for"
similarity string
Default: "dot_product"
timeout integer
Default: 30
return_embedding boolean
Default: false
duplicate_documents string
Default: "overwrite"
index_type string
Default: "flat"
scroll string
Default: "1d"
skip_missing_embeddings boolean
Default: true
synonyms array | null
synonym_type string
Default: "synonym"
use_system_proxy boolean
Default: false
knn_engine string
Default: "nmslib"
PineconeDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "PineconeDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

15 nested properties
api_key string required
environment string
Default: "us-west1-gcp"
pinecone_index string | null
Default: null
embedding_dim integer
Default: 768
return_embedding boolean
Default: false
index string
Default: "document"
similarity string
Default: "cosine"
replicas integer
Default: 1
shards integer
Default: 1
embedding_field string
Default: "embedding"
progress_bar boolean
Default: true
duplicate_documents string
Default: "overwrite"
recreate_index boolean
Default: false
metadata_config object
Default:
{
  "indexed": []
}
validate_index_sync boolean
Default: true
SQLDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "SQLDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

6 nested properties
url string
Default: "sqlite://"
index string
Default: "document"
label_index string
Default: "label"
duplicate_documents string
Default: "overwrite"
check_same_thread boolean
Default: false
isolation_level string
WeaviateDocumentStoreComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "WeaviateDocumentStore"
params object

Each parameter can reference other components defined in the same YAML file.

17 nested properties
host string | string[]
Default: "http://localhost"
port integer | integer[]
Default: 8080
timeout_config array
Default:
[
  5,
  15
]
username string
password string
index string
Default: "Document"
embedding_dim integer
Default: 768
content_field string
Default: "content"
name_field string
Default: "name"
similarity string
Default: "cosine"
index_type string
Default: "hnsw"
custom_schema object | null
return_embedding boolean
Default: false
embedding_field string
Default: "embedding"
progress_bar boolean
Default: true
duplicate_documents string
Default: "overwrite"
recreate_index boolean
Default: false
AnswerToSpeechComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "AnswerToSpeech"
params object

Each parameter can reference other components defined in the same YAML file.

6 nested properties
model_name_or_path string | string
Default: "espnet/kan-bayashi_ljspeech_vits"
generated_audio_dir string
Default: "generated_audio_answers"
format=path
audio_params object | null
transformers_params object | null
progress_bar boolean
Default: true
devices string | string[] | null
AzureConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "AzureConverter"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
endpoint string required
credential_key string required
model_id string
Default: "prebuilt-document"
valid_languages string[] | null
save_json boolean
Default: false
preceding_context_len integer
Default: 3
following_context_len integer
Default: 3
merge_multiple_column_headers boolean
Default: true
id_hash_keys string[] | null
add_page_number boolean
Default: true
BM25RetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "BM25Retriever"
params object

Each parameter can reference other components defined in the same YAML file.

5 nested properties
document_store string required
top_k integer
Default: 10
all_terms_must_match boolean
Default: false
custom_query string | null
scale_score boolean
Default: true
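The required document_store parameter is a string, which fits the note above that parameters can reference other components in the same file. A minimal sketch, assuming a store component named DocumentStore (a hypothetical name) is defined alongside the retriever:

```yaml
components:
  - name: DocumentStore              # hypothetical name
    type: ElasticsearchDocumentStore
  - name: Retriever                  # hypothetical name
    type: BM25Retriever
    params:
      document_store: DocumentStore  # name of the component defined above
      top_k: 10                      # default
      all_terms_must_match: false    # default
```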
CrawlerComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "Crawler"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
output_dir string required
urls string[] | null
crawler_depth integer
Default: 1
filter_urls array | null
overwrite_existing_files
Default: true
id_hash_keys string[] | null
extract_hidden_text
Default: true
loading_wait_time integer | null
crawler_naming_function string | null
Default: null
webdriver_options string[] | null
DensePassageRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "DensePassageRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

17 nested properties
document_store string required
query_embedding_model string | string
Default: "facebook/dpr-question_encoder-single-nq-base"
passage_embedding_model string | string
Default: "facebook/dpr-ctx_encoder-single-nq-base"
model_version string | null
max_seq_len_query integer
Default: 64
max_seq_len_passage integer
Default: 256
top_k integer
Default: 10
use_gpu boolean
Default: true
batch_size integer
Default: 16
embed_title boolean
Default: true
use_fast_tokenizers boolean
Default: true
similarity_function string
Default: "dot_product"
global_loss_buffer_size integer
Default: 150000
progress_bar boolean
Default: true
devices string | string[] | null
use_auth_token boolean | string | null
scale_score boolean
Default: true
Docs2AnswersComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "Docs2Answers"
params object

Each parameter can reference other components defined in the same YAML file.

1 nested property
progress_bar boolean
Default: true
DocumentToSpeechComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "DocumentToSpeech"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
model_name_or_path string | string
Default: "espnet/kan-bayashi_ljspeech_vits"
generated_audio_dir string
Default: "generated_audio_documents"
format=path
audio_params object | null
transformers_params object | null
DocxToTextConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "DocxToTextConverter"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
id_hash_keys string[] | null
progress_bar boolean
Default: true
ElasticsearchFilterOnlyRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "ElasticsearchFilterOnlyRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
document_store string required
top_k integer
Default: 10
all_terms_must_match boolean
Default: false
custom_query string | null
ElasticsearchRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "ElasticsearchRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
document_store string required
top_k integer
Default: 10
all_terms_must_match boolean
Default: false
custom_query string | null
EmbeddingRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "EmbeddingRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

15 nested properties
document_store string required
embedding_model string required
model_version string | null
use_gpu boolean
Default: true
batch_size integer
Default: 32
max_seq_len integer
Default: 512
model_format string | null
pooling_strategy string
Default: "reduce_mean"
emb_extraction_layer integer
Default: -1
top_k integer
Default: 10
progress_bar boolean
Default: true
devices string | string[] | null
use_auth_token boolean | string | null
scale_score boolean
Default: true
embed_meta_fields string[]
Default:
[]
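Both document_store and embedding_model are required for this component. The sketch below assumes a store component named DocumentStore and uses a publicly available sentence-transformers model purely as an example; neither choice comes from the schema.

```yaml
components:
  - name: DocumentStore              # hypothetical name
    type: InMemoryDocumentStore
  - name: Retriever                  # hypothetical name
    type: EmbeddingRetriever
    params:
      document_store: DocumentStore  # required; refers to the component above
      # required; example model only, not prescribed by the schema
      embedding_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
      top_k: 10                      # default
      batch_size: 32                 # default
```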
EntityExtractorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "EntityExtractor"
params object

Each parameter can reference other components defined in the same YAML file.

7 nested properties
model_name_or_path string
Default: "dslim/bert-base-NER"
model_version string | null
use_gpu boolean
Default: true
batch_size integer
Default: 16
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
EvalAnswersComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "EvalAnswers"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
skip_incorrect_retrieval boolean
Default: true
open_domain boolean
Default: true
sas_model string
debug boolean
Default: false
EvalDocumentsComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "EvalDocuments"
params object

Each parameter can reference other components defined in the same YAML file.

3 nested properties
debug boolean
Default: false
open_domain boolean
Default: true
top_k integer
Default: 10
FARMReaderComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "FARMReader"
params object

Each parameter can reference other components defined in the same YAML file.

22 nested properties
model_name_or_path string required
model_version string | null
context_window_size integer
Default: 150
batch_size integer
Default: 50
use_gpu boolean
Default: true
devices string | string[] | null
no_ans_boost number
Default: 0.0
return_no_answer boolean
Default: false
top_k integer
Default: 10
top_k_per_candidate integer
Default: 3
top_k_per_sample integer
Default: 1
num_processes integer | null
max_seq_len integer
Default: 256
doc_stride integer
Default: 128
progress_bar boolean
Default: true
duplicate_filtering integer
Default: 0
use_confidence_scores boolean
Default: true
confidence_threshold number | null
proxies Record<string, string>
local_files_only
Default: false
force_download
Default: false
use_auth_token boolean | string | null
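Only model_name_or_path is required here. A minimal sketch, using a publicly available question-answering model as an example (the model and the component name Reader are illustrative assumptions):

```yaml
components:
  - name: Reader                     # hypothetical name
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2  # required; example model only
      top_k: 10                      # default
      use_gpu: false                 # overrides the default of true
      return_no_answer: true         # overrides the default of false
```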
FileTypeClassifierComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "FileTypeClassifier"
params object

Each parameter can reference other components defined in the same YAML file.

1 nested property
supported_types string[]
Default:
[
  "txt",
  "pdf",
  "md",
  "docx",
  "html"
]
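Because supported_types is a string array, it is written as a YAML list. The sketch below narrows the default list to three of its entries; the component name is hypothetical.

```yaml
components:
  - name: FileTypeClassifier         # hypothetical name
    type: FileTypeClassifier
    params:
      supported_types:               # subset of the default list above
        - txt
        - pdf
        - md
```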
FilterRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "FilterRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

5 nested properties
document_store string required
top_k integer
Default: 10
all_terms_must_match boolean
Default: false
custom_query string | null
scale_score boolean
Default: true
ImageToTextConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "ImageToTextConverter"
params object

Each parameter can reference other components defined in the same YAML file.

3 nested properties
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
Default:
[
  "eng"
]
id_hash_keys string[] | null
JoinAnswersComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "JoinAnswers"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
join_mode string
Default: "concatenate"
weights number[] | null
top_k_join integer | null
sort_by_score boolean
Default: true
JoinDocumentsComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "JoinDocuments"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
join_mode string
Default: "concatenate"
weights number[] | null
top_k_join integer | null
sort_by_score boolean
Default: true
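A join node could be configured roughly as below. The component name and the top_k_join value are illustrative assumptions; join_mode and sort_by_score restate the defaults listed above.

```yaml
components:
  - name: JoinResults                # hypothetical name
    type: JoinDocuments
    params:
      join_mode: concatenate         # default
      top_k_join: 5                  # hypothetical limit; the schema also allows null
      sort_by_score: true            # default
```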
MarkdownConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "MarkdownConverter"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
id_hash_keys string[] | null
progress_bar boolean
Default: true
MultihopEmbeddingRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "MultihopEmbeddingRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

16 nested properties
document_store string required
embedding_model string required
model_version string | null
num_iterations integer
Default: 2
use_gpu boolean
Default: true
batch_size integer
Default: 32
max_seq_len integer
Default: 512
model_format string
Default: "farm"
pooling_strategy string
Default: "reduce_mean"
emb_extraction_layer integer
Default: -1
top_k integer
Default: 10
progress_bar boolean
Default: true
devices string | string[] | null
use_auth_token boolean | string | null
scale_score boolean
Default: true
embed_meta_fields string[]
Default:
[]
OpenAIAnswerGeneratorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "OpenAIAnswerGenerator"
params object

Each parameter can reference other components defined in the same YAML file.

11 nested properties
api_key string required
model string
Default: "text-curie-001"
max_tokens integer
Default: 13
top_k integer
Default: 5
temperature number
Default: 0.2
presence_penalty number
Default: -2.0
frequency_penalty number
Default: -2.0
examples_context string | null
examples array | null
stop_words array | null
progress_bar boolean
Default: true
PDFToTextConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "PDFToTextConverter"
params object

Each parameter can reference other components defined in the same YAML file.

5 nested properties
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
id_hash_keys string[] | null
encoding string | null
Default: "UTF-8"
keep_physical_layout boolean
Default: false
PDFToTextOCRConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "PDFToTextOCRConverter"
params object

Each parameter can reference other components defined in the same YAML file.

3 nested properties
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
Default:
[
  "eng"
]
id_hash_keys string[] | null
ParsrConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "ParsrConverter"
params object

Each parameter can reference other components defined in the same YAML file.

11 nested properties
parsr_url string
Default: "http://localhost:3001"
extractor string
Default: "pdfminer"
Values: "pdfminer" "pdfjs"
table_detection_mode string
Default: "lattice"
Values: "lattice" "stream"
preceding_context_len integer
Default: 3
following_context_len integer
Default: 3
remove_page_headers boolean
Default: false
remove_page_footers boolean
Default: false
remove_table_of_contents boolean
Default: false
valid_languages string[] | null
id_hash_keys string[] | null
add_page_number boolean
Default: true
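The extractor and table_detection_mode parameters accept only the values listed above. The sketch below picks the non-default value for each; the component name is hypothetical, and parsr_url restates the default.

```yaml
components:
  - name: Converter                  # hypothetical name
    type: ParsrConverter
    params:
      parsr_url: http://localhost:3001   # default
      extractor: pdfjs                   # one of: pdfminer, pdfjs
      table_detection_mode: stream       # one of: lattice, stream
      remove_page_headers: true          # overrides the default of false
      remove_page_footers: true          # overrides the default of false
```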
PreProcessorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "PreProcessor"
params object

Each parameter can reference other components defined in the same YAML file.

13 nested properties
clean_whitespace boolean
Default: true
clean_header_footer boolean
Default: false
clean_empty_lines boolean
Default: true
remove_substrings string[]
Default:
[]
split_by string
Default: "word"
split_length integer
Default: 200
split_overlap integer
Default: 0
split_respect_sentence_boundary boolean
Default: true
tokenizer_model_folder string | string | null
language string
Default: "en"
id_hash_keys string[] | null
progress_bar boolean
Default: true
add_page_number boolean
Default: false
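A typical configuration of the splitting parameters might look like the sketch below. The component name and the split_overlap value are assumptions for illustration; the other values restate the defaults listed above.

```yaml
components:
  - name: Preprocessor               # hypothetical name
    type: PreProcessor
    params:
      clean_whitespace: true         # default
      clean_empty_lines: true        # default
      split_by: word                 # default
      split_length: 200              # default
      split_overlap: 20              # hypothetical override of the default 0
      split_respect_sentence_boundary: true  # default
      language: en                   # default
```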
PseudoLabelGeneratorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "PseudoLabelGenerator"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
question_producer string | object[] required
retriever string required
cross_encoder_model_name_or_path string
Default: "cross-encoder/ms-marco-MiniLM-L-6-v2"
max_questions_per_document integer
Default: 3
top_k integer
Default: 50
batch_size integer
Default: 16
progress_bar boolean
Default: true
use_auth_token boolean | string | null
use_gpu boolean
Default: true
devices string | string[] | null
QuestionGeneratorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "QuestionGenerator"
params object

Each parameter can reference other components defined in the same YAML file.

17 nested properties
model_name_or_path string
Default: "valhalla/t5-base-e2e-qg"
model_version string | null
num_beams integer
Default: 4
max_length integer
Default: 256
no_repeat_ngram_size integer
Default: 3
length_penalty number
Default: 1.5
early_stopping boolean
Default: true
split_length integer
Default: 50
split_overlap integer
Default: 10
use_gpu boolean
Default: true
prompt string
Default: "generate questions:"
num_queries_per_doc integer
Default: 1
sep_token string
Default: "<sep>"
batch_size integer
Default: 16
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
RAGeneratorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "RAGenerator"
params object

Each parameter can reference other components defined in the same YAML file.

14 nested properties
model_name_or_path string
Default: "facebook/rag-token-nq"
model_version string | null
retriever string | null
Default: null
generator_type string
Default: "token"
top_k integer
Default: 2
max_length integer
Default: 200
min_length integer
Default: 2
num_beams integer
Default: 2
embed_title boolean
Default: true
prefix string | null
use_gpu boolean
Default: true
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
RCIReaderComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "RCIReader"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
row_model_name_or_path string
Default: "michaelrglass/albert-base-rci-wikisql-row"
column_model_name_or_path string
Default: "michaelrglass/albert-base-rci-wikisql-col"
row_model_version string | null
column_model_version string | null
row_tokenizer string | null
column_tokenizer string | null
use_gpu boolean
Default: true
top_k integer
Default: 10
max_seq_len integer
Default: 256
use_auth_token boolean | string | null
RouteDocumentsComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "RouteDocuments"
params object

Each parameter can reference other components defined in the same YAML file.

2 nested properties
split_by string
Default: "content_type"
metadata_values string[] | null
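
As an illustration of the two parameters, the fragment below sketches routing on a metadata field instead of the default `content_type`. It assumes documents carry a `language` metadata field; that field, its values, and the component name are hypothetical.

```yaml
components:
  - name: RouteByLanguage       # hypothetical name
    type: RouteDocuments
    params:
      split_by: language        # hypothetical metadata field; the default is "content_type"
      metadata_values:          # values of that field to route on
        - en
        - de
```
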
SentenceTransformersRankerComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "SentenceTransformersRanker"
params object

Each parameter can reference other components defined in the same YAML file.

9 nested properties
model_name_or_path string required
model_version string | null
top_k integer
Default: 10
use_gpu boolean
Default: true
devices string | string[] | null
batch_size integer
Default: 16
scale_score boolean
Default: true
progress_bar boolean
Default: true
use_auth_token boolean | string | null
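
Because `model_name_or_path` has no default, it must always be set. A minimal sketch follows; the component name and the model choice are illustrative assumptions, not something this schema prescribes.

```yaml
components:
  - name: Ranker                # hypothetical name
    type: SentenceTransformersRanker
    params:
      model_name_or_path: cross-encoder/ms-marco-MiniLM-L-12-v2   # illustrative model choice
      top_k: 10                 # schema default, shown for clarity
      scale_score: true         # schema default, shown for clarity
```
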
Seq2SeqGeneratorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "Seq2SeqGenerator"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
model_name_or_path string required
input_converter string | null
Default: null
top_k integer
Default: 1
max_length integer
Default: 200
min_length integer
Default: 2
num_beams integer
Default: 8
use_gpu boolean
Default: true
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
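
Here too `model_name_or_path` is the only required parameter. The fragment below is a sketch with an illustrative model and a hypothetical component name; the remaining values restate the schema defaults.

```yaml
components:
  - name: Generator             # hypothetical name
    type: Seq2SeqGenerator
    params:
      model_name_or_path: vblagoje/bart_lfqa   # illustrative model choice
      top_k: 1                  # schema default
      min_length: 2             # schema default
      max_length: 200           # schema default
      num_beams: 8              # schema default
```
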
SklearnQueryClassifierComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "SklearnQueryClassifier"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
model_name_or_path string
Default: "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/model.pickle"
vectorizer_name_or_path string
Default: "https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/vectorizer.pickle"
batch_size integer | null
progress_bar boolean
Default: true
TableReaderComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TableReader"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
model_name_or_path string
Default: "google/tapas-base-finetuned-wtq"
model_version string | null
tokenizer string | null
use_gpu boolean
Default: true
top_k integer
Default: 10
top_k_per_candidate integer
Default: 3
return_no_answer boolean
Default: false
max_seq_len integer
Default: 256
use_auth_token boolean | string | null
devices string | string[] | null
TableTextRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TableTextRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

20 nested properties
document_store string required
query_embedding_model string
Default: "deepset/bert-small-mm_retrieval-question_encoder"
passage_embedding_model string
Default: "deepset/bert-small-mm_retrieval-passage_encoder"
table_embedding_model string
Default: "deepset/bert-small-mm_retrieval-table_encoder"
model_version string | null
max_seq_len_query integer
Default: 64
max_seq_len_passage integer
Default: 256
max_seq_len_table integer
Default: 256
top_k integer
Default: 10
use_gpu boolean
Default: true
batch_size integer
Default: 16
embed_meta_fields string[]
Default:
[
  "name",
  "section_title",
  "caption"
]
use_fast_tokenizers boolean
Default: true
similarity_function string
Default: "dot_product"
global_loss_buffer_size integer
Default: 150000
progress_bar boolean
Default: true
devices string | string[] | null
use_auth_token boolean | string | null
scale_score boolean
Default: true
use_fast boolean
Default: true
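
Only `document_store` is required; the three embedding models fall back to the defaults listed above. A minimal sketch, assuming a document store component named `DocumentStore` is declared elsewhere in the same file (both component names are hypothetical):

```yaml
components:
  - name: TableTextRetriever    # hypothetical name
    type: TableTextRetriever
    params:
      document_store: DocumentStore   # assumed to be defined elsewhere in this file
      top_k: 10                       # schema default, shown for clarity
      embed_meta_fields:              # schema default, shown for clarity
        - name
        - section_title
        - caption
```
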
Text2SparqlRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "Text2SparqlRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

5 nested properties
knowledge_graph string required
model_name_or_path string
model_version string | null
top_k integer
Default: 1
use_auth_token boolean | string | null
TextConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TextConverter"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
id_hash_keys string[] | null
progress_bar boolean
Default: true
TfidfRetrieverComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TfidfRetriever"
params object

Each parameter can reference other components defined in the same YAML file.

3 nested properties
document_store string required
top_k integer
Default: 10
auto_fit
Default: true
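
The `document_store` parameter is a reference by name to another component in the same file. The fragment below sketches that wiring with an in-memory store; both component names are hypothetical.

```yaml
components:
  - name: DocumentStore         # hypothetical name
    type: InMemoryDocumentStore
  - name: Retriever             # hypothetical name
    type: TfidfRetriever
    params:
      document_store: DocumentStore   # must match the name of the store above
      top_k: 5                        # overrides the default of 10
```
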
TikaConverterComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TikaConverter"
params object

Each parameter can reference other components defined in the same YAML file.

4 nested properties
tika_url string
Default: "http://localhost:9998/tika"
remove_numeric_tables boolean
Default: false
valid_languages string[] | null
id_hash_keys string[] | null
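
If the Tika server does not run on localhost, `tika_url` can be overridden. A sketch, assuming a hypothetical host named `tika` (for example, a container on the same network):

```yaml
components:
  - name: Converter             # hypothetical name
    type: TikaConverter
    params:
      tika_url: http://tika:9998/tika   # hypothetical host; the default is http://localhost:9998/tika
      remove_numeric_tables: false      # schema default, shown for clarity
```
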
TransformersDocumentClassifierComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TransformersDocumentClassifier"
params object

Each parameter can reference other components defined in the same YAML file.

12 nested properties
model_name_or_path string
Default: "bhadresh-savani/distilbert-base-uncased-emotion"
model_version string | null
tokenizer string | null
use_gpu boolean
Default: true
return_all_scores boolean
Default: false
task string
Default: "text-classification"
labels string[] | null
batch_size integer
Default: 16
classification_field string
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
TransformersQueryClassifierComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TransformersQueryClassifier"
params object

Each parameter can reference other components defined in the same YAML file.

10 nested properties
model_name_or_path string
Default: "shahrukhx01/bert-mini-finetune-question-detection"
model_version string | null
tokenizer string | null
use_gpu boolean
Default: true
task string
Default: "text-classification"
labels string[]
Default:
[
  "LABEL_1",
  "LABEL_0"
]
batch_size integer
Default: 16
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
TransformersReaderComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TransformersReader"
params object

Each parameter can reference other components defined in the same YAML file.

13 nested properties
model_name_or_path string
Default: "distilbert-base-uncased-distilled-squad"
model_version string | null
tokenizer string | null
context_window_size integer
Default: 70
use_gpu boolean
Default: true
top_k integer
Default: 10
top_k_per_candidate integer
Default: 3
return_no_answers boolean
Default: false
max_seq_len integer
Default: 256
doc_stride integer
Default: 128
batch_size integer
Default: 16
use_auth_token boolean | string | null
devices string | string[] | null
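
All parameters are optional, so `params` only needs the values that differ from the defaults. The fragment below is a sketch; the component name and the model are illustrative assumptions.

```yaml
components:
  - name: Reader                # hypothetical name
    type: TransformersReader
    params:
      model_name_or_path: deepset/roberta-base-squad2   # illustrative; overrides the default model
      top_k: 5                  # overrides the default of 10
      return_no_answers: true   # overrides the default of false
      use_gpu: false            # overrides the default of true
```
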
TransformersSummarizerComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TransformersSummarizer"
params object

Each parameter can reference other components defined in the same YAML file.

13 nested properties
model_name_or_path string
Default: "google/pegasus-xsum"
model_version string | null
tokenizer string | null
max_length integer
Default: 200
min_length integer
Default: 5
use_gpu boolean
Default: true
clean_up_tokenization_spaces boolean
Default: true
separator_for_single_summary string
Default: " "
generate_single_summary boolean
Default: false
batch_size integer
Default: 16
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
TransformersTranslatorComponent object
name string required

Custom name for the component. Helpful for visualization and debugging.

type string required

Haystack Class name for the component.

Constant: "TransformersTranslator"
params object

Each parameter can reference other components defined in the same YAML file.

8 nested properties
model_name_or_path string required
tokenizer_name string | null
max_seq_len integer | null
clean_up_tokenization_spaces boolean | null
Default: true
use_gpu boolean
Default: true
progress_bar boolean
Default: true
use_auth_token boolean | string | null
devices string | string[] | null
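
A minimal translator declaration only needs the required `model_name_or_path`. The sketch below uses an illustrative English-to-German model and a hypothetical component name.

```yaml
components:
  - name: Translator            # hypothetical name
    type: TransformersTranslator
    params:
      model_name_or_path: Helsinki-NLP/opus-mt-en-de   # illustrative model choice
      use_gpu: true             # schema default, shown for clarity
```
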