pgap_yaml_input_reader (SchemaStore) JSON Schema

Type	object
File match	`submol.json` `submol.yml` `submol*.yaml`
Schema URL	https://catalog.lintel.tools/schemas/schemastore/pgap-yaml-input-reader/latest.json
Source	https://www.schemastore.org/pgap_yaml_input_reader.json

Validate with Lintel

npx @lintel/lintel check

NCBI Prokaryotic Genome Annotation Pipeline (PGAP) input metadata (submol) JSON/YAML configuration file

Properties

authors object[] required

Optional, but include if intending to submit to GenBank. Authors can be different from the contact.

contact_info object required

Optional, but include if intending to submit to GenBank. The main contact for this genome assembly.

13 nested properties

city string required

Default: ""

Examples: "Docker"

country string required

Default: ""

Examples: "Lappland"

department string required

Default: ""

Examples: "Department of Using NCBI"

email string required

Default: ""

Examples: "[email protected]"

first_name string required

Default: ""

Examples: "Jane"

last_name string required

Default: ""

Examples: "Doe"

organization string required

Default: ""

Examples: "Institute of Klebsiella foobarensis research"

postal_code string required

Default: ""

Examples: "12345"

street string required

Default: ""

Examples: "1234 Main St"

state string

Default: ""

Examples: "MD", "Florida"

fax string

Default: ""

Examples: "301-555-1234", "+7 095 555 1234"

middle_initial string

Default: ""

Examples: "N"

phone string

Default: ""

Examples: "301-555-0245"
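
The required and optional keys of contact_info listed above can be checked before a file is handed to a validator. The following is a minimal sketch, not part of the schema or of PGAP: the helper name `missing_contact_keys` is hypothetical, and the email address is a placeholder chosen for illustration.

```python
# Required contact_info keys as listed above; the remaining keys
# (state, fax, middle_initial, phone) are optional.
REQUIRED_CONTACT_KEYS = (
    "city", "country", "department", "email", "first_name",
    "last_name", "organization", "postal_code", "street",
)


def missing_contact_keys(contact_info):
    """Return the required keys absent from contact_info, in listed order."""
    return [key for key in REQUIRED_CONTACT_KEYS if key not in contact_info]


# Values reuse the examples on this page; the email is a placeholder.
contact = {
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "jane.doe@example.org",
    "organization": "Institute of Klebsiella foobarensis research",
    "department": "Department of Using NCBI",
    "street": "1234 Main St",
    "city": "Docker",
    "postal_code": "12345",
    "country": "Lappland",
}

print(missing_contact_keys(contact))
```

An empty list means every required key is present. The check only looks at key presence, so it does not replace validation against the schema itself.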

$schema string

The value of this keyword MUST be a URI (containing a scheme) and this URI MUST be normalized.

Default: ""

Examples: "https://www.schemastore.org/pgap_yaml_input_reader"

consortium string

Name of the project that generated the genome assembly

Default: ""

Examples: "SkyNet"

comment string

Appears in the COMMENT section of each GenBank sequence record.

Default: ""

Examples:

"This draft WGS assembly was generated by running SKESA to generate a de-novo assembly. The de-novo assembly was then concatenated with contigs generated using a guided assembler using antimicrobial resistance genes as baits to comprehensively catalog the set of resistance genes in the isolate. Note, some parts of the contigs derived from the guided assembler may overlap de-novo contigs, and other guided assembler contigs. De-novo contigs can be differentiated from guided assembler contigs by their names, which include either 'denovo' or 'guided'."

tp_assembly boolean

NCBI internal flag used for testing.

Default: false

Examples: false

sra object[]

Sequence reads used to build the assembly

bioproject string

Default: ""

Examples: "PRJ9999999"

biosample string

Default: ""

Examples: "SAMN99999999"

fasta object

2 nested properties

class string

Default: ""

Examples: "File"

location string

Default: ""

Examples: "sample_fasta_input.fasta"

locus_tag_prefix string

A one- to nine-letter prefix used for naming genes on this genome assembly. If an official locus tag prefix has already been reserved with an INSDC organization (GenBank, ENA or DDBJ) for the given BioSample and BioProject pair, provide it here. Otherwise, provide a string of your choice. If no value is provided, the prefix 'pgaptmp' is used. For more details, see the note about locus tags at: https://github.com/ncbi/pgap/wiki/Input-Files#Note-about-locus-tags

Default: ""

Examples: "tmp"
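
The fallback rule in the description above can be sketched as a small helper. This is an illustration only, not PGAP's own logic: the function name `effective_locus_tag_prefix` is hypothetical, and reading "letter" as an ASCII letter or digit is an assumption that the description does not confirm.

```python
DEFAULT_PREFIX = "pgaptmp"  # fallback named in the description above


def effective_locus_tag_prefix(value):
    """Return the prefix to use, falling back to DEFAULT_PREFIX when empty."""
    if not value:
        return DEFAULT_PREFIX
    # Assumption: "letter" is read here as an ASCII letter or digit.
    if not (1 <= len(value) <= 9 and value.isascii() and value.isalnum()):
        raise ValueError(
            f"locus_tag_prefix must be 1 to 9 characters: {value!r}"
        )
    return value


print(effective_locus_tag_prefix(""))
print(effective_locus_tag_prefix("tmp"))
```

The first call falls back to the default and the second returns the supplied value unchanged.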

organism object

2 nested properties

strain string

Strain of the sequenced organism

Default: ""

Examples: "my_strain"

genus_species string

Binomial name or, if the species is unknown, genus for the sequenced organism. This identifier must be valid in NCBI Taxonomy. See Taxonomy information for how to find out if the name is valid: https://github.com/ncbi/pgap/wiki/Input-Files#Taxonomy-information

Default: ""

Examples: "Escherichia coli"
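
Because `submol.json` is among the matched file names, the fasta and organism objects can be pictured as JSON. The sketch below assembles only these two objects from the example values on this page. It is a fragment rather than a complete file, since the properties marked as required are omitted, and the helper name `build_fragment` is hypothetical.

```python
import json


def build_fragment(fasta_path, genus_species, strain=""):
    """Assemble the fasta and organism objects as plain dictionaries."""
    organism = {"genus_species": genus_species}
    if strain:
        organism["strain"] = strain
    return {
        "fasta": {"class": "File", "location": fasta_path},
        "organism": organism,
    }


fragment = build_fragment(
    "sample_fasta_input.fasta", "Escherichia coli", "my_strain"
)
print(json.dumps(fragment, indent=2))
```

Leaving `strain` out of the output when it is empty is a design choice of this sketch, not something the schema asks for.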

publications object[]

topology string

Possible values are linear or circular. Circular means that the first base in the sequence is adjacent to the last base. Please provide the topology in the metadata YAML file only if it is applicable to ALL sequences in the fasta file. If some sequences in the assembled genome are circular and others linear, include the topology in the definition line of each sequence in the fasta file with the tag value pair [topology=circular] or [topology=linear], after the SeqID and a space (e.g. >seq1 [topology=circular]). If the topology is provided in neither the metadata YAML nor the fasta file, the sequences will be presumed to be linear.

Default: ""

Examples: "circular", "linear"
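
The lookup order described for topology (metadata value, then definition-line tag, then the linear default) can be sketched as follows. This is not PGAP's implementation: the function name `resolve_topology` is hypothetical, and letting the metadata value win when both sources are present is an assumption, since the description covers only the case where one of them is given.

```python
import re

# Matches the tag value pair described above, e.g. [topology=circular].
TOPOLOGY_TAG = re.compile(r"\[topology=(circular|linear)\]")


def resolve_topology(defline, metadata_topology=""):
    """Pick the topology for one sequence.

    A value in the metadata applies to every sequence; otherwise a
    [topology=...] tag on the FASTA definition line is used; otherwise
    the sequence is presumed linear.
    """
    if metadata_topology:
        return metadata_topology
    match = TOPOLOGY_TAG.search(defline)
    return match.group(1) if match else "linear"


print(resolve_topology(">seq1 [topology=circular]"))
print(resolve_topology(">seq2"))
```

The first call reads the tag from the definition line, and the second falls back to the linear default.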

location string

Possible values are chromosome or plasmid. Please provide the location in the metadata YAML file only if it is applicable to ALL sequences in the fasta file. If some sequences in the assembled genome are chromosomes and others plasmids, include the location in the definition line of each sequence in the fasta file with the tag value pair [location=chromosome] or [location=plasmid], after the SeqID and a space (e.g. >seq1 [location=plasmid]). For a plasmid, also add [plasmid-name=]. If the location is provided in neither the metadata YAML nor the fasta file, the sequences will be presumed to be chromosomal. Note: since the 2021 releases of PGAPx, this setting noticeably affects the annotation of the molecule.

Default: ""

Examples: "chromosome", "plasmid"
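
For mixed assemblies, the per-sequence tags go on each FASTA definition line. The sketch below builds such a line. It is an illustration only: the function name `make_defline` and the plasmid name `pExample` are hypothetical, and the tag order and the single-space separator between tags are assumptions.

```python
def make_defline(seq_id, location="", plasmid_name="", topology=""):
    """Build a FASTA definition line with optional [tag=value] pairs."""
    tags = []
    if topology:
        tags.append(f"[topology={topology}]")
    if location:
        tags.append(f"[location={location}]")
    # The plasmid name is only meaningful when the location is a plasmid.
    if location == "plasmid" and plasmid_name:
        tags.append(f"[plasmid-name={plasmid_name}]")
    # Tags follow the SeqID and a space, as described above.
    return " ".join([f">{seq_id}"] + tags)


print(make_defline("seq1", location="plasmid", plasmid_name="pExample"))
# >seq1 [location=plasmid] [plasmid-name=pExample]
```

When no tags are passed, the function returns only the SeqID line, and the defaults described above (chromosome, linear) would then apply.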