Proposal Transgenic Line Nomenclature

From PhyscomeProjectWiki

UNDER CONSTRUCTION

--Lang 17:29, 10 January 2012 (UTC)

Proposal for a unified nomenclature for transgenic moss lines

The molecular toolbox for genetic modification of (moss) plants is diverse. This methodological diversity inevitably leads to a multitude of possible textual descriptions of the generated transgenic lines (TL), which in turn causes ambiguities and loss of reproducibility. TL nomenclature, i.e. the naming of mutant lines in journal articles or conference proceedings, is especially diverse.

Reproducibility of published work is crucial for scientific advancement. The most important prerequisite for reproducing scientific findings clearly lies in unambiguous and precise documentation of the experimental procedures and the materials used. Furthermore, the ever-growing body of scientific literature and data increasingly requires large-scale analysis of scientific knowledge in order to draw well-informed conclusions. Computational large-scale analysis requires well-formed standards to allow optimal results in automatic knowledge retrieval.

The disillusioning reality is that a surprisingly high fraction of scientific publications is documented in a cryptic or fragmentary way. Textual descriptions and references for experimental procedures or for the materials, organisms and genes used are often incomplete.

In particular, the unambiguous determination of the underlying sequence used as a frame of reference for the construction of a TL is often problematic. Many publications do not provide or reference the underlying sequence and database entries in a way that is accessible to automated procedures. For example, primer sequences alone are not always sufficient to define the original construct (especially if authors provide them only in images, or incorrectly). Due to this problem, present textual descriptions of mutant moss lines (January 2012) are largely inaccessible to large-scale analysis (even if text mining is applied). This situation seriously hinders the advance of moss (systems) biology and the further development of the model organism Physcomitrella patens.

Examples for possible ambiguities in TL nomenclature

knockout lines
  1. deletion of the entire genic region (upstream 200bp + 5'UTR, CDS and 3'UTR encoding regions + downstream 200bp)
  2. partial deletion/disruption of part of the coding sequence
  3. frameshift mutation leading to premature stop codons inducing NMD

GMI - Genetic Modification Syntax

Idea
Prefix line names with a well-defined, simple, unambiguous syntax to distinguish between different types of TL. A rich syntax can be used to provide all relevant information about a published transgenic line. These full names can get rather long and are intended to be used only once (in the Methods section) in a manuscript. For referencing throughout a paper, a short, abbreviated syntax can be used. See below.

General syntax

Fields

The GMI provides a simple syntax combining three fields for the definition of transgenic lines:

PREFIX|LOCUS#LINE_INDEX
PREFIX
Description of the TL type according to GMI standard defined in the following sections in more detail.
LOCUS
Name of the transgenic locus i.e. the targeted/modified gene. See guidelines for gene naming.
LINE_INDEX
Index/Numbering used by the authors to refer to the TL in the lab/publication.
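
The three-field form can be split mechanically. The following is a minimal Python sketch, not part of the proposal itself: the names LineName and parse_line_name are hypothetical, and it assumes that the line index follows the last '#' and the locus follows the last '|'. Forms that omit the '|' (such as the overexpression syntax) are deliberately rejected.

```python
from typing import NamedTuple


class LineName(NamedTuple):
    """The three basic fields of a transgenic line name (hypothetical helper type)."""
    prefix: str
    locus: str
    line_index: str


def parse_line_name(name: str) -> LineName:
    """Split a name of the form PREFIX|LOCUS#LINE_INDEX into its fields."""
    # Assumption: the line index is everything after the last '#'.
    head, sep, line_index = name.rpartition("#")
    if not sep:
        raise ValueError(f"missing '#LINE_INDEX' in {name!r}")
    # Assumption: the locus is everything after the last '|';
    # whatever precedes it (possibly several chained terms) is the prefix.
    prefix, sep, locus = head.rpartition("|")
    if not sep:
        raise ValueError(f"missing 'PREFIX|' in {name!r}")
    return LineName(prefix, locus, line_index)


print(parse_line_name("d|ftsZ1-1#1"))
```

Splitting from the right keeps chained prefixes such as e:1:462|e:1001:1005 together in the prefix field, so that they can be interpreted in a second step.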

Further optional fields for the extended syntax required for some TL types:

PRODUCT
Introduced construct which is usually expressed in the TL, e.g. insertion lines expressing reporter constructs.
WHERE
Positional term. Location that is affected or altered. Can be specified using FROM:TO statements
FROM
Start coordinate in a positional term. By convention, positional information is given in nucleotide coordinates, either relative to the start of the sequence provided with the description of the TLMN or relative to the start codon position of the CDS of the referenced database entry. If no precise coordinates are available, please use one of the positional terms described below to indicate the place of e.g. an insertion in a gene.
TO
Part of a positional term that indicates the end of the region. See FROM.


Cross-species expression / complementation assays

If you need to refer to genes or constructs derived from other species (an exogenous source) include organismal information into the PRODUCT definition:

2LETTER-CODE_PRODUCTNAME

e.g.:

Os_act1_promoter
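
A minimal Python sketch of how the organism code could be separated from the product name. The function name split_species is hypothetical, and the sketch assumes that the code is exactly one upper-case letter followed by one lower-case letter (as in Os, Zm, At); the proposal does not state the capitalisation explicitly.

```python
import re
from typing import Optional, Tuple

# Assumption: a species code is two letters, upper case then lower case,
# directly followed by an underscore.
_SPECIES_CODE = re.compile(r"^([A-Z][a-z])_(.+)$")


def split_species(product: str) -> Tuple[Optional[str], str]:
    """Return (species_code, product_name); the code is None if absent."""
    match = _SPECIES_CODE.match(product)
    if match:
        return match.group(1), match.group(2)
    return None, product


print(split_species("Os_act1_promoter"))   # ('Os', 'act1_promoter')
print(split_species("CaMV_35S_promoter"))  # (None, 'CaMV_35S_promoter')
```

Under this assumption, a native gene whose name happens to start with such a two-letter pattern would be misread as cross-species, so a real implementation would probably check the code against a list of known organisms.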

Positional terms or other sequence features

To provide positional information without knowledge of the precise sequence coordinates, please use terms from the list below or from the Sequence Ontology (SO).

Terms can be combined with additional information needed to understand the nature and origin of a PRODUCT. What is altered/deleted/introduced? Which species is the promoter from? E.g. fragment constructs are indicated by attaching the terms to the PRODUCT name using an underscore.

5UTR
5'Untranslated Region of a mRNA. (SO:five_prime_UTR)
3UTR
3'Untranslated Region of a mRNA. (SO:three_prime_UTR)
CDS
Coding sequence of the gene
engineered_plasmid
A plasmid that is engineered (SO:0000637). Used to describe WHERE in transient assays (see the section Transient assays).
exon

to indicate which particular exon (1-n) was altered/deleted/introduced:

exon_NUMBER
exon_1 
last_exon
Last exon of the gene
first_exon
First exon of the gene
endogenous_locus
Used in complementation assays to indicate that the introduced cross-species product was inserted at the locus of the gene to be complemented.
promoter
PRODUCT_promoter
CaMV_35S_promoter
ppr_43_promoter
Os_act1_promoter
five_prime
three_prime
complete_locus
(including promoter and possible downstream)
complete_gene
(only gene regions: 5UTR+CDS+3UTR)
start_codon
stop_codon
TSS
domain
fragment
undefined fragment
PRODUCT_fragment
ftsZ1_fragment
(positional) fragment
only a certain fragment
PRODUCT_fragment_FROM:TO
pp1_fragment_441:1179
fragment_wo
gene fragment by deletion of a domain
PRODUCT_fragment_wo_PARTSDELETED
ppr_43_fragment_wo_E/DYW
ectopic
integrated anywhere in the genome by non-homologous/illegitimate recombination. Can be added as a positional term if study checked for legitimate integration.
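
The three fragment notations listed above (PRODUCT_fragment, PRODUCT_fragment_FROM:TO and PRODUCT_fragment_wo_PARTSDELETED) can be told apart with simple patterns. The following Python sketch illustrates this; Fragment and parse_fragment are hypothetical names, and the patterns are inferred from the examples rather than from a formal grammar.

```python
import re
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    """Parsed fragment term (hypothetical helper type)."""
    product: str
    start: Optional[int]    # FROM coordinate, if given
    end: Optional[int]      # TO coordinate, if given
    deleted: Optional[str]  # parts removed in a fragment_wo term


_RANGE = re.compile(r"^(?P<product>.+)_fragment_(?P<start>\d+):(?P<end>\d+)$")
_WITHOUT = re.compile(r"^(?P<product>.+)_fragment_wo_(?P<deleted>.+)$")
_PLAIN = re.compile(r"^(?P<product>.+)_fragment$")


def parse_fragment(term: str) -> Fragment:
    """Interpret one of the three fragment notations."""
    match = _RANGE.match(term)
    if match:
        return Fragment(match["product"], int(match["start"]), int(match["end"]), None)
    match = _WITHOUT.match(term)
    if match:
        return Fragment(match["product"], None, None, match["deleted"])
    match = _PLAIN.match(term)
    if match:
        return Fragment(match["product"], None, None, None)
    raise ValueError(f"not a fragment term: {term!r}")


for example in ("ftsZ1_fragment", "pp1_fragment_441:1179", "ppr_43_fragment_wo_E/DYW"):
    print(parse_fragment(example))
```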

Field separators

The following characters are used as field separators to parse the expressions and thus cannot be used within the field specifications (see type definitions for examples):

| PREFIX separator
Defines the end of a prefix term and is followed by the locus term.
# LINE_INDEX separator
Defines the beginning of the line index term.
\ Gene separator
Is used to concatenate multiple locus constructs especially gene names in multiple knockout constructs. Can also be used to indicate different positional terms in the PREFIX. See examples below.
\\ Line separator
Similar to the gene separator. Used in cases where the genetic background is transgenic and requires an extended PREFIX for precise definition (e.g. requires positional information like a deletion construct). In this case the LOCUS is the TLMN description of the background line.
: Positional separator
Used to separate positional information, i.e. sequence coordinates. By convention, positional information is given in nucleotide coordinates, either relative to the start of the sequence provided with the description of the TLMN or relative to the start codon position of the CDS of the referenced database entry.

Characters with frequent use in gene and construct names

:: double colon
Is used to express : in gene or construct names to indicate fusions etc. The single colon is reserved as a field separator.
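
Because the single colon is a separator while the double colon belongs to the name, a parser has to split on single colons only. A minimal Python sketch using a regular expression with look-behind and look-ahead; the function name split_fields is hypothetical.

```python
import re
from typing import List

# A colon that is neither preceded nor followed by another colon.
_SINGLE_COLON = re.compile(r"(?<!:):(?!:)")


def split_fields(term: str) -> List[str]:
    """Split on the positional separator ':' but keep '::' inside names."""
    return _SINGLE_COLON.split(term)


print(split_fields("i:5UTR:GUS::NPT-II"))
# ['i', '5UTR', 'GUS::NPT-II']
```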

Syntax to define different TL types

The following sections provide the grammar used to describe common TL types in TLMN syntax. Please contact me if you think something is missing.

Deletion constructs

General deletion construct
syntax
d|LOCUS#LINE_INDEX
examples
d|ftsZ1-1#1
Cre-Lox Deletion construct

If there is a subsequent removal of the selection marker using the Cre-Lox system.

syntax
d:Cre-Lox|LOCUS#INDEX
example
d:Cre-Lox|msh2#1
Excision or partial deletion constructs
e:FROM:TO|LOCUS#INDEX
e:1:462|mygene#1
e:1:462|e:1001:1005|mygene#1
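
As an illustration, the excised regions can be read back from such a name. This Python sketch uses the hypothetical function name excised_regions and assumes that every term before the locus is an e:FROM:TO term, as in the two examples above.

```python
from typing import List, Tuple


def excised_regions(name: str) -> List[Tuple[int, int]]:
    """Return the (FROM, TO) pairs of a name like e:1:462|e:1001:1005|mygene#1."""
    head, sep, _line_index = name.rpartition("#")
    if not sep:
        raise ValueError(f"missing '#INDEX' in {name!r}")
    # The last '|'-separated item is the locus; all others are excision terms.
    *terms, _locus = head.split("|")
    if not terms:
        raise ValueError(f"no excision term in {name!r}")
    regions = []
    for term in terms:
        kind, *coords = term.split(":")
        if kind != "e" or len(coords) != 2:
            raise ValueError(f"not an e:FROM:TO term: {term!r}")
        regions.append((int(coords[0]), int(coords[1])))
    return regions


print(excised_regions("e:1:462|mygene#1"))              # [(1, 462)]
print(excised_regions("e:1:462|e:1001:1005|mygene#1"))  # [(1, 462), (1001, 1005)]
```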

Undirected insertion constructs

u|PRODUCT#INDEX
u|CaMV_35S_promoter::myoXIa#1
u|CaMV_35S_promoter::myoXIa_fragment_stop_codon#1

Insertion constructs

short form
i|LOCUS#INDEX
detailed form
i:WHERE:WHAT|LOCUS#INDEX
i:5UTR:GUS::NPT-II|mads1#1
#YFP fusion at position 2937 of arpc4 CDS
i:CDS_2937:2XeYFP|arpc4#1
i:WHAT:FROM:TO|LOCUS#INDEX
i:WHAT|LOCUS#INDEX

Replacement constructs

short form
r|LOCUS#INDEX
detailed form
r:FROM:TO:WITH|LOCUS#INDEX
r:WHERE:WITH|LOCUS#INDEX

Point mutation constructs

p:WHERE|LOCUS#INDEX
single SNP mutation
p:POSITIONFROMSTARTresultingchange|LOCUS#INDEX

Example from PMID:18298672:

#point mutation at position 6 starting from ATG resulting in an aspartate in a complementation line
p:6D|c:locus108:Zm_ubi1_promoter::adf_fragment_wo_3UTR|adf#1
multiple SNP mutations

Example from PMID:10449580:

p:226N\227N\228N\229N\230N|u:CaMV_35S_promoter::mcb1#1
single SNP mutation
#point mutation at position 1435 starting from TSS resulting in a thymine in a complementation line
p:1435t_TSS|c:locus108:Zm_ubi1_promoter::cry1a|cry1a#1
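
The point mutation terms in the examples above follow a common pattern: a position, a one-letter change and an optional reference suffix such as _TSS, with several mutations joined by the gene separator. The Python sketch below parses only the leading p: term. PointMutation and parse_point_mutations are hypothetical names, and the pattern is inferred from the three examples, not from a formal definition.

```python
import re
from typing import List, NamedTuple, Optional


class PointMutation(NamedTuple):
    """One parsed mutation (hypothetical helper type)."""
    position: int
    change: str               # resulting residue or nucleotide, e.g. 'D' or 't'
    reference: Optional[str]  # explicit reference point such as 'TSS', else None


# Assumption: POSITION, one letter, then an optional '_REFERENCE' suffix.
_MUTATION = re.compile(r"^(\d+)([A-Za-z])(?:_(\w+))?$")


def parse_point_mutations(term: str) -> List[PointMutation]:
    """Parse a term such as p:6D, p:1435t_TSS or p:226N\\227N."""
    kind, sep, body = term.partition(":")
    if kind != "p" or not sep:
        raise ValueError(f"not a point mutation term: {term!r}")
    mutations = []
    for item in body.split("\\"):  # '\' joins multiple mutations
        match = _MUTATION.match(item)
        if match is None:
            raise ValueError(f"cannot parse mutation {item!r}")
        mutations.append(PointMutation(int(match.group(1)), match.group(2), match.group(3)))
    return mutations


print(parse_point_mutations("p:6D"))
print(parse_point_mutations("p:1435t_TSS"))
print(parse_point_mutations(r"p:226N\227N\228N\229N\230N"))
```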

Silencing constructs

Artificial miRNA (amiRNA) constructs
Antisense or RNA interference (RNAi) constructs

Constructs that result in silencing of a locus by formation of a duplex and induction of RNAi pathways.

Short mutant names
s|LOCUS
Short line names
s|LOCUS#LINE_INDEX
Detailed line specification
s:WHERE:PROMOTER:TARGET_ORIGIN|LOCUS#INDEX --> stable integration of silencing construct
s:PROMOTER:TARGET_ORIGIN|LOCUS#INDEX --> transient silencing construct
s:TARGET_ORIGIN|LOCUS#INDEX
Complex example
RNAi line targeting multiple targets (3 TARGET_ORIGINS) in a single construct without promoter specification
#GUS targeted at CDS while myosins in the 5'UTR
s:CDS\5UTR\5UTR|GUS\myoXIa\myoXIb#1
s:Zm_ubi1_promoter:CDS\5UTR\5UTR|GUS\myoXIa\myoXIb#1
#all three targeted against the CDS without PROMOTER
s:CDS|GUS\myoXIa\myoXIb#1
#antisense construct PMID:8401607:
s:pp1_fragment_wo_1:438|pp1#1
s:pp2_fragment_810:1368|pp2#1
stable integration of a construct targeting two genes
s:locus108:Zm_ubi1_promoter:CDS\fragment_703:969|GUS\nct#1
stable integration if promoter is unknown (WHICH SHOULD NOT HAPPEN!!!)
s:locus108:unknown:CDS\fragment_703:969|GUS\nct#1
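
In the multi-target examples above, each TARGET_ORIGIN is matched by position to a locus, and a single TARGET_ORIGIN applies to all loci. The Python sketch below shows that pairing; the function name pair_targets is hypothetical. Since a TARGET_ORIGIN such as fragment_703:969 itself contains a colon, the sketch assumes that the TARGET_ORIGIN and LOCUS fields have already been separated from the rest of the name.

```python
from typing import List, Tuple


def pair_targets(target_origins: str, loci: str) -> List[Tuple[str, str]]:
    """Pair each locus with its TARGET_ORIGIN; both fields use '\\' as separator."""
    targets = target_origins.split("\\")
    genes = loci.split("\\")
    # A single TARGET_ORIGIN applies to every locus (e.g. s:CDS|GUS\myoXIa\myoXIb#1).
    if len(targets) == 1:
        targets = targets * len(genes)
    if len(targets) != len(genes):
        raise ValueError("number of TARGET_ORIGINs does not match number of loci")
    return list(zip(genes, targets))


print(pair_targets(r"CDS\5UTR\5UTR", r"GUS\myoXIa\myoXIb"))
# [('GUS', 'CDS'), ('myoXIa', '5UTR'), ('myoXIb', '5UTR')]
print(pair_targets("CDS", r"GUS\myoXIa\myoXIb"))
# [('GUS', 'CDS'), ('myoXIa', 'CDS'), ('myoXIb', 'CDS')]
```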

Overexpression constructs

Stable overexpression constructs.

o:WHERE:WHAT#INDEX
#PMID:21350301
o:hb7:Os_act1_promoter::cyp78a27#4

Transient assays

Can be overexpression or silencing constructs. Use the SO term engineered_plasmid in the WHERE clause to indicate that the construct was/is not integrated into the host genome, but was provided as a plasmid.

o:WHERE:WHAT
o:engineered_plasmid:Zm_ubi1_promoter::abi3A
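
A minimal Python sketch of how stable and transient overexpression names could be distinguished automatically. Overexpression and parse_overexpression are hypothetical names; the sketch assumes the o:WHERE:WHAT form with an optional #INDEX and treats a WHERE value of engineered_plasmid as the marker for a transient assay.

```python
import re
from typing import NamedTuple, Optional


class Overexpression(NamedTuple):
    """Parsed overexpression name (hypothetical helper type)."""
    where: str
    what: str
    line_index: Optional[str]
    transient: bool


# Split on single colons only; '::' is part of construct names.
_SINGLE_COLON = re.compile(r"(?<!:):(?!:)")


def parse_overexpression(name: str) -> Overexpression:
    """Parse o:WHERE:WHAT with an optional #INDEX."""
    body, sep, index = name.partition("#")
    parts = _SINGLE_COLON.split(body, maxsplit=2)
    if len(parts) != 3 or parts[0] != "o":
        raise ValueError(f"not an o:WHERE:WHAT name: {name!r}")
    _, where, what = parts
    return Overexpression(
        where=where,
        what=what,
        line_index=index if sep else None,
        # Not integrated into the host genome, i.e. a transient assay.
        transient=(where == "engineered_plasmid"),
    )


print(parse_overexpression("o:hb7:Os_act1_promoter::cyp78a27#4"))
print(parse_overexpression("o:engineered_plasmid:Zm_ubi1_promoter::abi3A"))
```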

Functional complementation constructs

stable, targeted constructs
c:WHERE:PRODUCT|LOCUS#INDEX
#PMID:20663817:
c:endogenous_locus:At_ton1|ton1#1
#PMID:18298672:
c:locus108:Zm_ubi1_promoter::adf_fragment_wo_3UTR|adf#1
stable, untargeted constructs
transient constructs

Transient complementation constructs

c:WHERE:PRODUCT|LOCUS#INDEX --> stable construct
c:PRODUCT|LOCUS#INDEX --> transient construct
c:CaMV_35S_promoter::myoXIa_fragment_stop_codon|myoXIa#1

Mutagenized strains
