pgap_yaml_input_reader
NCBI Prokaryotic Genome Annotation Pipeline (PGAP) input metadata (submol) JSON/YAML configuration file
| Type | object |
|---|---|
| File match | `submol*.json`, `submol*.yml`, `submol*.yaml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/pgap-yaml-input-reader/latest.json |
| Source | https://www.schemastore.org/pgap_yaml_input_reader.json |
Validate with Lintel
npx @lintel/lintel check
Properties
Optional, but include if intending to submit to GenBank. Authors can be different from the contact.
Optional, but include if intending to submit to GenBank. The main contact for this genome assembly.
13 nested properties
The value of this keyword MUST be a URI (containing a scheme) and this URI MUST be normalized.
Name of the project that generated the genome assembly
Appears in the COMMENT section of each GenBank sequence record.
NCBI internal flag used for testing.
Sequence reads used to build the assembly
2 nested properties
A one- to nine-letter prefix used to name genes on this genome assembly. If an official locus tag prefix was already reserved with an INSDC organization (GenBank, ENA or DDBJ) for the given BioSample and BioProject pair, provide it here. Otherwise, provide a string of your choice. If no value is provided, the prefix 'pgaptmp' will be used. For more details, see the note about locus tags: https://github.com/ncbi/pgap/wiki/Input-Files#Note-about-locus-tags
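The fallback rule described above can be sketched in Python. This is only an illustration, not PGAP's own code: `effective_locus_tag_prefix` is a hypothetical helper, an empty string is assumed to count as "no value", and only the documented length limit is checked.

```python
from typing import Optional

DEFAULT_PREFIX = "pgaptmp"  # fallback prefix named in the description above


def effective_locus_tag_prefix(prefix: Optional[str]) -> str:
    """Return the prefix that would apply (hypothetical helper, not part of PGAP)."""
    # Assumption: an empty string is treated the same as a missing value.
    if prefix is None or prefix == "":
        return DEFAULT_PREFIX
    # Assumption: only the documented upper bound of nine characters is enforced.
    if len(prefix) > 9:
        raise ValueError(f"locus tag prefix {prefix!r} is longer than 9 characters")
    return prefix
```

Calling `effective_locus_tag_prefix(None)` returns the default, while a reserved prefix such as `"ABC"` is passed through unchanged.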
2 nested properties
Strain of the sequenced organism
Binomial name or, if the species is unknown, genus for the sequenced organism. This identifier must be valid in NCBI Taxonomy. See Taxonomy information for how to find out if the name is valid: https://github.com/ncbi/pgap/wiki/Input-Files#Taxonomy-information
Possible values are linear or circular. Circular means that the first base in the sequence is adjacent to the last base. Provide the topology in the metadata YAML file only if it applies to ALL sequences in the FASTA file. If some sequences in the assembled genome are circular and others linear, include the topology in the definition line of each sequence in the FASTA file with the tag-value pair [topology=circular] or [topology=linear], after the SeqID and a space (e.g. >seq1 [topology=circular]). If the topology is provided in neither the metadata YAML nor the FASTA file, the sequences are presumed to be linear.
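As a rough illustration of the topology rules above, the following Python sketch resolves the topology for one sequence. The function name `resolve_topology` is hypothetical, and letting a definition-line tag take priority over the YAML value is an assumption: the description only says the YAML value should be used when it applies to every sequence.

```python
import re
from typing import Optional

# Matches the tag-value pair described above, e.g. "[topology=circular]".
_TOPOLOGY_TAG = re.compile(r"\[topology=(linear|circular)\]")


def resolve_topology(defline: str, yaml_topology: Optional[str] = None) -> str:
    """Pick the topology for one FASTA record (hypothetical helper, not part of PGAP)."""
    match = _TOPOLOGY_TAG.search(defline)
    if match:
        # Assumption: a per-sequence tag wins over the file-wide YAML value.
        return match.group(1)
    if yaml_topology is not None:
        return yaml_topology
    # Neither source gives a topology, so the sequence is presumed linear.
    return "linear"
```

For example, `resolve_topology(">seq1 [topology=circular]")` yields `"circular"`, and a bare `">seq2"` with no YAML value falls back to `"linear"`.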
Possible values are chromosome or plasmid. Provide the location in the metadata YAML file only if it applies to ALL sequences in the FASTA file. If some sequences in the assembled genome are chromosomes and others plasmids, include the location in the definition line of each sequence in the FASTA file with the tag-value pair [location=chromosome] or [location=plasmid], after the SeqID and a space (e.g. >seq1 [location=plasmid]). For a plasmid, add [plasmid-name=