Type: object
File match: submol*.json submol*.yml submol*.yaml
Schema URL: https://catalog.lintel.tools/schemas/schemastore/pgap-yaml-input-reader/latest.json
Source: https://www.schemastore.org/pgap_yaml_input_reader.json

Validate with Lintel

npx @lintel/lintel check

NCBI Prokaryotic Genome Annotation Pipeline (PGAP) input metadata (submol) JSON/YAML configuration file

Properties

authors object[] required

Optional, but include if intending to submit to GenBank. Authors can be different from the contact.

contact_info object required

Optional, but include if intending to submit to GenBank. The main contact for this genome assembly.

13 nested properties
city string required
Default: ""
Examples: "Docker"
country string required
Default: ""
Examples: "Lappland"
department string required
Default: ""
Examples: "Department of Using NCBI"
email string required
Default: ""
Examples: "[email protected]"
first_name string required
Default: ""
Examples: "Jane"
last_name string required
Default: ""
Examples: "Doe"
organization string required
Default: ""
Examples: "Institute of Klebsiella foobarensis research"
postal_code string required
Default: ""
Examples: "12345"
street string required
Default: ""
Examples: "1234 Main St"
state string
Default: ""
Examples: "MD", "Florida"
fax string
Default: ""
Examples: "301-555-1234", "+7 095 555 1234"
middle_initial string
Default: ""
Examples: "N"
phone string
Default: ""
Examples: "301-555-0245"
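
The split between required and optional contact_info keys above can be checked before running the pipeline. The following is a minimal sketch, not part of PGAP or Lintel: the helper name `missing_contact_fields` is hypothetical, the e-mail address is a placeholder, and the check only tests for absent keys (an empty string passes, since the schema default is "").

```python
# Keys marked "required" under contact_info in the listing above.
REQUIRED_CONTACT_FIELDS = (
    "city", "country", "department", "email", "first_name",
    "last_name", "organization", "postal_code", "street",
)


def missing_contact_fields(contact_info):
    """Return the required keys that are absent (hypothetical helper)."""
    return [key for key in REQUIRED_CONTACT_FIELDS if key not in contact_info]


# Values taken from the schema examples; the e-mail is a placeholder.
contact = {
    "city": "Docker",
    "country": "Lappland",
    "department": "Department of Using NCBI",
    "email": "jane.doe@example.org",
    "first_name": "Jane",
    "last_name": "Doe",
    "organization": "Institute of Klebsiella foobarensis research",
    "postal_code": "12345",
    "street": "1234 Main St",
    "state": "MD",  # optional
}

print(missing_contact_fields(contact))
```

Optional keys such as state, fax, middle_initial, and phone are ignored by this check.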
$schema string

The value of this keyword MUST be a URI (containing a scheme) and this URI MUST be normalized.

Default: ""
Examples: "https://www.schemastore.org/pgap_yaml_input_reader"
consortium string

Name of the project that generated the genome assembly

Default: ""
Examples: "SkyNet"
comment string

Appears in the COMMENT section of each GenBank sequence record.

Default: ""
Examples: "This draft WGS assembly was generated by running SKESA to generate a de-novo assembly. The de-novo assembly was then concatenated with contigs generated using a guided assembler using antimicrobial resistance genes as baits to comprehensively catalog the set of resistance genes in the isolate. Note, some parts of the contigs derived from the guided assembler may overlap de-novo contigs, and other guided assembler contigs. De-novo contigs can be differentiated from guided assembler contigs by their names, which include either 'denovo' or 'guided'."
tp_assembly boolean

NCBI internal flag used for testing.

Default: false
Examples: false
sra object[]

Sequence reads used to build the assembly

bioproject string
Default: ""
Examples: "PRJ9999999"
biosample string
Default: ""
Examples: "SAMN99999999"
fasta object
2 nested properties
class string
Default: ""
Examples: "File"
location string
Default: ""
Examples: "sample_fasta_input.fasta"
locus_tag_prefix string

A prefix of one to nine letters used for naming genes on this genome assembly. If an official locus tag prefix has already been reserved with an INSDC organization (GenBank, ENA, or DDBJ) for the given BioSample and BioProject pair, provide it here. Otherwise, provide a string of your choice. If no value is provided, the prefix 'pgaptmp' will be used. For more details, see the note about locus tags at: https://github.com/ncbi/pgap/wiki/Input-Files#Note-about-locus-tags

Default: ""
Examples: "tmp"
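
The fallback described for locus_tag_prefix can be illustrated as follows. This is a sketch of the documented behavior, not PGAP's own code: the function name is hypothetical, and treating an empty string (the schema default) the same as a missing value is an assumption.

```python
DEFAULT_LOCUS_TAG_PREFIX = "pgaptmp"  # fallback named in the description above


def effective_locus_tag_prefix(metadata):
    """Return the prefix that would apply to this metadata (hypothetical helper)."""
    prefix = metadata.get("locus_tag_prefix", "")
    # Assumption: an empty string counts as "no value provided".
    return prefix if prefix else DEFAULT_LOCUS_TAG_PREFIX


print(effective_locus_tag_prefix({"locus_tag_prefix": "tmp"}))
print(effective_locus_tag_prefix({}))
```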
organism object
2 nested properties
strain string

Strain of the sequenced organism

Default: ""
Examples: "my_strain"
genus_species string

Binomial name or, if the species is unknown, the genus of the sequenced organism. This identifier must be valid in NCBI Taxonomy. To find out whether the name is valid, see Taxonomy information: https://github.com/ncbi/pgap/wiki/Input-Files#Taxonomy-information

Default: ""
Examples: "Escherichia coli"
publications object[]
topology string

Possible values are linear or circular. Circular means that the first base in the sequence is adjacent to the last base. Please provide the topology in the metadata YAML file only if it is applicable to ALL sequences in the fasta file. If some sequences in the assembled genome are circular and others linear, include the topology in the definition line of each sequence in the fasta file with the tag value pair [topology=circular] or [topology=linear], after the SeqID and a space (e.g. >seq1 [topology=circular]). If the topology is provided in neither the metadata YAML nor the fasta file, the sequences will be presumed to be linear.

Default: ""
Examples: "circular", "linear"
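
The topology rules above can be sketched as a small lookup. This is illustrative only and not PGAP's implementation: the function name is hypothetical, and letting a tag in the FASTA definition line take precedence over the metadata value is an assumption, since the description does not say what happens when both are present.

```python
import re

# Matches the documented definition-line tag, e.g. ">seq1 [topology=circular]".
_TOPOLOGY_TAG = re.compile(r"\[topology=(linear|circular)\]")


def resolve_topology(defline, metadata):
    """Return the topology for one sequence (hypothetical helper)."""
    match = _TOPOLOGY_TAG.search(defline)
    if match:
        # Assumption: a per-sequence tag wins over the metadata file.
        return match.group(1)
    # Fall back to the metadata value, then to the documented default.
    return metadata.get("topology") or "linear"


print(resolve_topology(">seq1 [topology=circular]", {}))
print(resolve_topology(">seq2", {"topology": "circular"}))
print(resolve_topology(">seq3", {}))
```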
location string

Possible values are chromosome or plasmid. Please provide the location in the metadata YAML file only if it is applicable to ALL sequences in the fasta file. If some sequences in the assembled genome are chromosomes and others plasmids, include the location in the definition line of each sequence in the fasta file with the tag value pair [location=chromosome] or [location=plasmid], after the SeqID and a space (e.g. >seq1 [location=plasmid]). For plasmids, also add [plasmid-name=]. If the location is provided in neither the metadata YAML nor the fasta file, the sequences will be presumed to be chromosomes. Note: since the 2021 releases of PGAPx, this setting noticeably affects the annotation of the molecule.

Default: ""
Examples: "chromosome", "plasmid"
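
For mixed assemblies, the per-sequence definition lines described above could be generated along these lines. This is a rough sketch under stated assumptions: the function name and the plasmid name "pXYZ" are hypothetical, and separating multiple tags with a single space is an assumption not confirmed by the schema text.

```python
def tagged_defline(seq_id, location, plasmid_name=None):
    """Build a FASTA definition line with a location tag (hypothetical helper)."""
    if location not in ("chromosome", "plasmid"):
        raise ValueError("location must be 'chromosome' or 'plasmid'")
    tags = [f"[location={location}]"]
    if location == "plasmid" and plasmid_name:
        # The description asks for [plasmid-name=] on plasmid sequences.
        tags.append(f"[plasmid-name={plasmid_name}]")
    # Tags follow the SeqID and a space, as in ">seq1 [location=plasmid]".
    return f">{seq_id} " + " ".join(tags)


print(tagged_defline("seq1", "chromosome"))
print(tagged_defline("seq2", "plasmid", "pXYZ"))
```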