Contaminations/Wet lab proof

From PhyscomeProjectWiki

Primer design

--Lang 10:51, 31 October 2006 (CET)

Primers were designed with eprimer3 and cross-checked by BLASTN against the 8x main_genome, e.g.:

eprimer3 -task 1 -mingc 45 -minsize 20 -maxsize 20 -productsizerange 300-500 -includedregion 83-1087 scaffold_9342.fas

Each FORWARD/REVERSE PRIMER line of the eprimer3 output below contains the following fields:

  1. Start (position on the scaffold)
  2. Len (primer length, bp)
  3. Tm (°C)
  4. GC%
  5. Sequence (5'→3')
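These fields can also be read programmatically. Below is a minimal Python sketch that assumes the whitespace-separated layout of the eprimer3 lines on this page. All helper names are hypothetical. Every PRODUCT SIZE listed here is consistent with both Start values being the leftmost scaffold position of the primer. On that reading, the amplicon length is reverse start + reverse length - forward start, and the sketch relies on this interpretation.

```python
from dataclasses import dataclass


@dataclass
class EPrimer:
    """One FORWARD/REVERSE PRIMER line of eprimer3 output."""
    kind: str       # "FORWARD" or "REVERSE"
    start: int      # leftmost scaffold position of the primer (assumed)
    length: int     # primer length in bp
    tm: float       # melting temperature, degrees C
    gc: float       # GC% as reported by eprimer3
    sequence: str   # primer sequence, 5' to 3'


def parse_primer_line(line: str) -> EPrimer:
    """Split e.g. 'FORWARD PRIMER  250  20  60.86  60.00  GGCC...' into fields."""
    kind, _label, start, length, tm, gc, seq = line.split()
    return EPrimer(kind, int(start), int(length), float(tm), float(gc), seq)


def product_size(fwd: EPrimer, rev: EPrimer) -> int:
    """Amplicon runs from fwd.start up to the last base of the reverse primer."""
    return rev.start + rev.length - fwd.start


def gc_percent(seq: str) -> float:
    """Recompute the GC% field from the primer sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# scaffold_9342 primers as listed below (PRODUCT SIZE: 303)
fwd = parse_primer_line("FORWARD PRIMER     250   20  60.86  60.00  GGCCGGACTTCTAGAGGAGA")
rev = parse_primer_line("REVERSE PRIMER     533   20  59.94  45.00  GACATTTTCTTGCCGTCCAT")
print(product_size(fwd, rev))                              # 303
print(gc_percent(fwd.sequence), gc_percent(rev.sequence))  # 60.0 45.0
```

The same check reproduces the other listed product sizes. For example, the scaffold_39 pair (456015 / 456299, both 20 bp) gives 304.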

Length standard used

Image:1kb DNA ladder.png

Arginyl-tRNA synthetase

Grey area cluster3a scaffold_9342

Primers

PRODUCT SIZE: 303
FORWARD PRIMER     250   20  60.86  60.00  GGCCGGACTTCTAGAGGAGA
REVERSE PRIMER     533   20  59.94  45.00  GACATTTTCTTGCCGTCCAT

Bacterial cluster2 scaffold_451

Primers

PRODUCT SIZE: 301
FORWARD PRIMER  125059   20  59.85  50.00  GCCGTCAGCATTTTAGGAAG
REVERSE PRIMER  125340   20  59.96  50.00  AGGCATCGTACATCGTTTCC

Bacterial cluster2 scaffold_637

Primers

PRODUCT SIZE: 301
FORWARD PRIMER   48502   20  59.75  55.00  AGCCTGGACGAGTACGACAT
REVERSE PRIMER   48783   20  60.13  50.00  AGAGCACTTCCTCCAGCAAA

True Physco cluster1 scaffold_39

Primers

Primers were searched in the region spanning the first two exons of the gene (455931-456885, according to the TBLASTN result). The selected primer is derived from the first exon.

PRODUCT SIZE: 304
FORWARD PRIMER  456015   20  60.68  50.00  ATGGGGATCTGGTCGAAGAT
REVERSE PRIMER  456299   20  59.51  55.00  CCTTCTCAATCCACCTGTCC

PCR results for the 4 ArgS amplicons

Pierre-François Perroud from Ralph Quatrano's group performed the PCR analysis on Gransden (WT) and Villersexel (VX) genomic DNA. The putative contamination scaffolds (S9342, S451 and S637) show no signal in either WT or VX, whereas S39, which contains the predicted eukaryotic Arginyl-tRNA synthetase gene, is amplified in both WT and VX. From these data it is clear that these scaffolds are contaminants!


Pierre-François Perroud repeated the analysis, this time also including as template the gDNA that was sent to JGI for sequencing.

This result clearly shows that the gDNA sent for sequencing was contaminated!

Extreme samples of the 4 clusters

--Lang 12:04, 15 November 2006 (CET)

As indicated by the small black arrows in figure Image:All_data_k4_zoom.annotated.png, we examine 11 scaffolds located at the borders of the 4 clusters.

Where possible, primers were designed for regions in which an ORF could be predicted with FrameD.

Because of the difficulties in designing primers for TE-related scaffolds from cluster1 (see below), I tried several published methods for unique genomic primer design. I chose the GenomeMasker package ("GenomeMasker package for designing unique genomic PCR primers", BMC Bioinformatics 2006, 7:172). GenomeMasker masks overrepresented words before designing primers with a modified version of primer3.
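The following toy Python sketch illustrates the idea of masking overrepresented words. It is not GenomeMasker's algorithm. It counts every word of length k in a small set of sequences with a Counter, merging each word with its reverse complement so that both strands count as the same site. It then replaces every base covered by a word seen more than max_count times with N. The threshold max_count=1 and all function names are assumptions; only the word size 16 comes from this page. A brute-force count like this would not scale to a whole genome.

```python
from collections import Counter
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(word: str) -> str:
    # A word and its reverse complement are the same genomic site.
    return min(word, revcomp(word))


def count_words(genome: List[str], k: int = 16) -> Counter:
    """Count every k-mer (strand-independently) over all sequences."""
    counts: Counter = Counter()
    for seq in genome:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def mask_overrepresented(seq: str, counts: Counter, k: int = 16,
                         max_count: int = 1) -> str:
    """Replace bases covered by any k-mer seen more than max_count times with N."""
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[canonical(seq[i:i + k])] > max_count:
            masked[i:i + k] = "N" * k
    return "".join(masked)


# Tiny demonstration with k=4 instead of 16.
counts = count_words(["ATGCCGTA"], k=4)
print(mask_overrepresented("ATGCCGTA", counts, k=4))  # ATGCCGTA
counts = count_words(["ATGCCGTA", "GGGGCCGGGG"], k=4)
print(mask_overrepresented("ATGCCGTA", counts, k=4))  # ATNNNNTA
```

In the second call, the word GCCG also occurs in the other sequence, so primer design on the masked scaffold would avoid it.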

Parameters for GenomeMasker

A word size of 16 was used for the overrepresented words. The primer3 parameters were as follows:

PRIMER_PRODUCT_SIZE_RANGE=300-500
PRIMER_OPT_SIZE=20
PRIMER_MIN_SIZE=20
PRIMER_MAX_SIZE=30
PRIMER_OPT_TM=60
PRIMER_MIN_TM=59
PRIMER_PRODUCT_OPT_SIZE=300
PRIMER_OPT_GC_PERCENT=45
PRIMER_MAX_TM=61
PRIMER_MIN_GC=20
PRIMER_MAX_GC=70
PRIMER_FILE_FLAG=0
PRIMER_EXPLAIN_FLAG=1
PRIMER_NUM_RETURN=1
TARGET=x,y

As indicated by the last line (TARGET=x,y, which primer3 reads as start,length), the region of the annotated FrameD ORF was used as a starting point. Where there was no ORF, or where GenomeMasker could not suggest an amplicon within this region, the search was extended to the whole (masked) scaffold.
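The following Python sketch shows how the settings above could be assembled into a primer3 input record. It assumes the primer3 1.x Boulder-IO tags seen on this page (PRIMER_SEQUENCE_ID, SEQUENCE, TARGET) and the "=" end-of-record line. The function names are hypothetical, and the modified primer3 inside GenomeMasker may expect different input. Coordinates are passed through unchanged. primer3 counts from 0 unless PRIMER_FIRST_BASE_INDEX is set, so the FrameD ORF coordinates may need shifting.

```python
from typing import Dict, Optional, Tuple

# The primer3 settings listed above; TARGET is added per scaffold.
PRIMER3_SETTINGS: Dict[str, str] = {
    "PRIMER_PRODUCT_SIZE_RANGE": "300-500",
    "PRIMER_OPT_SIZE": "20",
    "PRIMER_MIN_SIZE": "20",
    "PRIMER_MAX_SIZE": "30",
    "PRIMER_OPT_TM": "60",
    "PRIMER_MIN_TM": "59",
    "PRIMER_PRODUCT_OPT_SIZE": "300",
    "PRIMER_OPT_GC_PERCENT": "45",
    "PRIMER_MAX_TM": "61",
    "PRIMER_MIN_GC": "20",
    "PRIMER_MAX_GC": "70",
    "PRIMER_FILE_FLAG": "0",
    "PRIMER_EXPLAIN_FLAG": "1",
    "PRIMER_NUM_RETURN": "1",
}


def orf_target(orf_start: int, orf_end: int) -> str:
    """primer3 TARGET is 'start,length'; the ORF end is taken as inclusive."""
    return f"{orf_start},{orf_end - orf_start + 1}"


def boulder_input(seq_id: str, sequence: str,
                  orf: Optional[Tuple[int, int]] = None) -> str:
    """Build one hypothetical Boulder-IO input record for a (masked) scaffold."""
    lines = [f"PRIMER_SEQUENCE_ID={seq_id}", f"SEQUENCE={sequence}"]
    lines += [f"{tag}={value}" for tag, value in PRIMER3_SETTINGS.items()]
    if orf is not None:
        # Ask for an amplicon that covers the FrameD ORF region.
        lines.append(f"TARGET={orf_target(*orf)}")
    # Without an ORF, no TARGET is given and the whole scaffold is searched.
    lines.append("=")  # end-of-record line
    return "\n".join(lines) + "\n"


print(orf_target(626, 1096))  # 626,471
```

For example, scaffold_8782_orf_1 (626 to 1096) gives TARGET=626,471, which matches the 471 bp length annotated for that ORF.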

Cluster1 - True Physco scaffolds

Cluster 1: scaffold_1134

Deriving primers for the FrameD ORF region (scaffold_1134_orf_4 10946 11677) is problematic because it is part of a transposable element with multiple occurrences in the genome.

Image:Scaffold 1134.overview.png

After trying several filtering strategies (also outside the ORF region), it became obvious that the whole scaffold is of TE origin (dispersed between 5 larger gaps), so no unique amplicon could be found. For example, the top 5 primer pairs for the ORF region above match 93 distinct loci on cluster1 (88) and cluster3b (5) scaffolds. However, none of the primer pairs tested during the analysis mapped to a cluster2 scaffold.

Primers

PRIMER_SEQUENCE_ID=scaffold_1134
TARGET=10946,211 
PRIMER_LEFT_SEQUENCE=GGAGTTGCTTCCAATTGTACAAGCCCA
PRIMER_RIGHT_SEQUENCE=TGCATATGCGATGAGTGTTGTGAGTCAA
PRIMER_LEFT=10895,27
PRIMER_RIGHT=11266,28
PRIMER_LEFT_TM=59.698
PRIMER_RIGHT_TM=59.165
PRIMER_LEFT_GC_PERCENT=48.148
PRIMER_RIGHT_GC_PERCENT=42.857
PRIMER_PRODUCT_SIZE=372
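The output record above can be checked for internal consistency. The following minimal Python sketch uses hypothetical helper names. It parses the TAG=value lines, then recomputes the product size and the primers' GC content. It follows primer3's convention that PRIMER_LEFT gives the leftmost base of the left primer and PRIMER_RIGHT gives the 5' end of the right primer, i.e. the last base of the amplicon. With that reading, 11266 - 10895 + 1 = 372, as reported.

```python
from typing import Dict, Tuple

RECORD = """\
PRIMER_SEQUENCE_ID=scaffold_1134
TARGET=10946,211
PRIMER_LEFT_SEQUENCE=GGAGTTGCTTCCAATTGTACAAGCCCA
PRIMER_RIGHT_SEQUENCE=TGCATATGCGATGAGTGTTGTGAGTCAA
PRIMER_LEFT=10895,27
PRIMER_RIGHT=11266,28
PRIMER_LEFT_TM=59.698
PRIMER_RIGHT_TM=59.165
PRIMER_LEFT_GC_PERCENT=48.148
PRIMER_RIGHT_GC_PERCENT=42.857
PRIMER_PRODUCT_SIZE=372
=
"""


def parse_boulder(text: str) -> Dict[str, str]:
    """Turn one Boulder-IO record (TAG=value lines, ended by '=') into a dict."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "=":  # end-of-record marker
            break
        if "=" in line:
            tag, value = line.split("=", 1)
            record[tag] = value.strip()
    return record


def position(record: Dict[str, str], tag: str) -> Tuple[int, int]:
    """Split a 'position,length' value such as PRIMER_LEFT=10895,27."""
    pos, length = record[tag].split(",")
    return int(pos), int(length)


def product_size(record: Dict[str, str]) -> int:
    left_start, _ = position(record, "PRIMER_LEFT")
    right_5prime, _ = position(record, "PRIMER_RIGHT")  # last amplicon base
    return right_5prime - left_start + 1


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


rec = parse_boulder(RECORD)
print(product_size(rec))  # 372
print(round(gc_percent(rec["PRIMER_LEFT_SEQUENCE"]), 3))  # 48.148
```

The same arithmetic reproduces PRIMER_PRODUCT_SIZE for the other GenomeMasker records on this page, e.g. 713 - 328 + 1 = 386 for scaffold_8782.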
loci as predicted by GenomeMasker
As mentioned above, the scaffold consists of one or more transposable elements, so a unique amplicon cannot be derived for it. For the chosen primer pair, GenomeMasker predicts 2 sense and 1 antisense product, none of them on a scaffold from the bacterial cluster.

PCR Result

Image:Wet lab confirmation.cluster1.scaffold 1134.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).
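The lane readings used throughout this page can be summarised as a small lookup table. This hypothetical Python sketch does not reproduce any particular gel. It ignores the single-primer control lanes L and R, and each reading paraphrases the interpretations given in the PCR results on this page.

```python
from typing import Dict, Tuple

# Lane pattern (a, b, c): band in fresh Gransden gDNA, in the gDNA used for
# the sequencing libraries, and in fresh Villersexel gDNA.
Lanes = Tuple[bool, bool, bool]

# Hypothetical summary of the readings used on this page.
READINGS: Dict[Lanes, str] = {
    (True, True, True): (
        "genomic, unless the lab strains carry the contaminant as well"
    ),
    (False, True, False): (
        "contaminant introduced with the gDNA used for sequencing"
    ),
    (False, False, False): (
        "inconclusive: PCR failure or a contamination absent from all templates"
    ),
}


def read_gel(a: bool, b: bool, c: bool) -> str:
    """Look up the reading for one scaffold's a/b/c lanes (L/R controls ignored)."""
    return READINGS.get((a, b, c), "unexpected pattern: design new primers and repeat")


print(read_gel(False, True, False))
```

The first entry shows why the cluster2 scaffolds amplified from a and c needed the extra information about the contaminated lab strain before they could be interpreted.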

The result supports the clustering hypothesis, i.e. the TE-containing scaffold_1134 truly is part of the Physcomitrella genome.

Cluster 1: scaffold_1657

Image:Scaffold 1657.overview.png

Like scaffold_1134, this scaffold has a large gap region (red) and an annotated, fragmentary ORF encoding a copia-type polymerase.

Primers

PRIMER_SEQUENCE_ID=scaffold_1657
TARGET=11329,110 
PRIMER_LEFT_SEQUENCE=ACAAGGCAGCAGTAGGCTCACTCA
PRIMER_RIGHT_SEQUENCE=CCGCCTCCATTGTGGACCTTGC
PRIMER_LEFT=11281,24
PRIMER_RIGHT=11616,22
PRIMER_LEFT_TM=59.878
PRIMER_RIGHT_TM=60.304
PRIMER_LEFT_GC_PERCENT=54.167
PRIMER_RIGHT_GC_PERCENT=63.636
PRIMER_PRODUCT_SIZE=336
loci as predicted by GenomeMasker
The product mapping procedure finds 100 loci (51 sense, 48 antisense and 1 PrimerB-PrimerB product), none of them on a cluster2 scaffold (mainly cluster1, but also 3a and 3b).
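The product mapping step can be illustrated with a toy exact-match in-silico PCR. In this Python sketch, the names and the 3000 bp size limit are assumptions, and real tools such as GenomeMasker also tolerate mismatches. Each primer extends rightwards from every occurrence of its own sequence on the template. It extends leftwards from every occurrence of its reverse complement. Pairing these sites gives A-B products (called sense here), B-A products (antisense), and the single-primer A-A and B-B products, like the PrimerB-PrimerB product above.

```python
from typing import List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def find_all(text: str, word: str) -> List[int]:
    """Start positions of every (possibly overlapping) exact occurrence."""
    hits, i = [], text.find(word)
    while i != -1:
        hits.append(i)
        i = text.find(word, i + 1)
    return hits


def in_silico_pcr(template: str, primer_a: str, primer_b: str,
                  max_len: int = 3000) -> List[Tuple[str, int, int]]:
    """Return (kind, start, length) for every exact-match product."""
    template = template.upper()
    primers = {"A": primer_a.upper(), "B": primer_b.upper()}
    # Sites from which a primer extends to the right ...
    fwd = {n: find_all(template, p) for n, p in primers.items()}
    # ... and sites from which it extends to the left (other strand).
    rev = {n: find_all(template, revcomp(p)) for n, p in primers.items()}
    kinds = {("A", "B"): "sense", ("B", "A"): "antisense",
             ("A", "A"): "A-A", ("B", "B"): "B-B"}
    products = []
    for (x, y), kind in kinds.items():
        for start in fwd[x]:
            for site in rev[y]:
                end = site + len(primers[y])  # exclusive amplicon end
                if site >= start and end - start <= max_len:
                    products.append((kind, start, end - start))
    return sorted(products, key=lambda p: p[1])


template = "GATTCCAAAAGGTTCA" + "TTTT" + "TGAACCCCCCGGAATC"
print(in_silico_pcr(template, "GATTCC", "TGAACC"))
# [('sense', 0, 16), ('A-A', 0, 36), ('antisense', 20, 16)]
```

In the toy template, the forward site of primer A and the reverse site at the far end also form a 36 bp single-primer product. This is the kind of extra band that only shows up as a predicted PrimerX-PrimerX locus.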

PCR Result

Image:Wet lab confirmation.cluster1.scaffold 1657.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

The result supports the clustering hypothesis, i.e. the TE-containing scaffold_1657 truly is part of the Physcomitrella genome. The PrimerB-PrimerB product predicted by GenomeMasker seems to show a length polymorphism between Vx and Gransden. As feared from the amplicon mapping procedure, the PCR yields multiple smeared bands.

Cluster1: scaffold_8782

Image:Scaffold 8782.overview.png

Another TE-containing scaffold (scaffold_8782_orf_1 626 1096 471), but small and without gap regions.

Primers

PRIMER_SEQUENCE_ID=scaffold_8782
TARGET=
PRIMER_LEFT_SEQUENCE=ATCGCGCGCTCGTGGAGAAG
PRIMER_RIGHT_SEQUENCE=CTGATCAGCGCCTCCGCCTG
PRIMER_LEFT=328,20
PRIMER_RIGHT=713,20
PRIMER_LEFT_TM=59.912
PRIMER_RIGHT_TM=59.908
PRIMER_LEFT_GC_PERCENT=65.000
PRIMER_RIGHT_GC_PERCENT=70.000
PRIMER_PRODUCT_SIZE=386
loci as predicted by GenomeMasker
This primer pair has a unique locus.

PCR Result

Image:Wet lab confirmation.cluster1.scaffold 8782.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c). Scaffolds 92, 290 and 8782 were run on the same gel; for clarity, the picture above was cropped from the original and lacks the marker lane.

The result supports the clustering hypothesis, i.e. the TE-containing scaffold_8782 truly is part of the Physcomitrella genome. Interestingly, it yields a smaller secondary band in the contaminated gDNA used for sequencing.

Intron-less gene from cluster1: scaffold_290:288828-293077

Primers

PRODUCT SIZE: 301
FORWARD PRIMER  290600   20  59.97  50.00  ATCTGGTTTTGGTGCCTGAC
REVERSE PRIMER  290881   20  59.96  55.00  GATCGGCTACCACCATCTGT
loci as predicted by GenomeMasker
This primer pair has a unique locus.

PCR Result

Image:Wet lab confirmation.cluster1.scaffold 290.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c). Scaffolds 92, 290 and 8782 were run on the same gel; for clarity, the picture above was cropped from the original and lacks the marker lane.

The result supports the clustering hypothesis, i.e. scaffold_290, which contains an intron-less gene, truly is part of the Physcomitrella genome.

Intron-less gene from cluster1: scaffold_92:138677-141901

Primers

PRODUCT SIZE: 300
FORWARD PRIMER  139078   20  60.05  55.00  TGTTCCTCTCCAGGACCATC
REVERSE PRIMER  139358   20  60.05  50.00  TTAACTTCGTGGCTGCTGTG
loci as predicted by GenomeMasker
This primer pair has a unique locus.

PCR Result

Image:Wet lab confirmation.cluster1.scaffold 92.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

The result supports the clustering hypothesis, i.e. scaffold_92, which contains an intron-less gene, truly is part of the Physcomitrella genome.

Cluster2 - Bacterial scaffolds

Cluster2: scaffold_11072

Primers

PRIMER_SEQUENCE_ID=scaffold_11072
TARGET=
PRIMER_LEFT_SEQUENCE=ACCGGTCACCGCTTCCGAGA
PRIMER_RIGHT_SEQUENCE=TGAACACGCCAGCCAGCGAC
PRIMER_LEFT=148,20
PRIMER_RIGHT=623,20
PRIMER_LEFT_TM=59.901
PRIMER_RIGHT_TM=60.248
PRIMER_LEFT_GC_PERCENT=65.000
PRIMER_RIGHT_GC_PERCENT=65.000
PRIMER_PRODUCT_SIZE=476
loci as predicted by GenomeMasker
The primer pair is predicted to amplify 2 loci on scaffold_11072 (476 bp and 170 bp long).

PCR result

Image:Wet lab confirmation.cluster2.scaffold 11072.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

Surprisingly, the potential contaminant region can be amplified from both a and c. Taken together with the findings for scaffold_3591 and scaffold_5765, Pierre-François traced this back to a contamination of their Villersexel strain from last summer, which still persists (cefotaxime-treated). In light of these findings, the result indicates that scaffold_11072 is part of the contamination introduced with the gDNA used for sequencing.

Cluster2: scaffold_4354

Primers

PRIMER_SEQUENCE_ID=scaffold_4354
TARGET=
PRIMER_LEFT_SEQUENCE=CCGCACCAGGCGGTTAAGCA
PRIMER_RIGHT_SEQUENCE=AGTTACGTGGGCGTGGCGTG
PRIMER_LEFT=46,20
PRIMER_RIGHT=404,20
PRIMER_LEFT_TM=59.973
PRIMER_RIGHT_TM=59.976
PRIMER_LEFT_GC_PERCENT=65.000
PRIMER_RIGHT_GC_PERCENT=65.000
PRIMER_PRODUCT_SIZE=359
loci as predicted by GenomeMasker
1 unique locus (sense orientation)

PCR result

Image:Wet lab confirmation.cluster2.scaffold 4354.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

The PCR is inconclusive. Two possible explanations:

  1. The primers do not work, or the PCR reaction failed.
  2. The scaffold is part of an additional contamination (see Contaminations/What_is_the_closest_sequenced_taxon), possibly introduced in the JGI production pipeline (plate switch).

Here, we should design additional primers and check again.

Cluster2: scaffold_5427

Primers

PRIMER_SEQUENCE_ID=scaffold_5427
TARGET=
PRIMER_LEFT_SEQUENCE=CCTCCAGCGATAAACCCACCTGC
PRIMER_RIGHT_SEQUENCE=GGTACTCGCCAGGCGTCGTG
PRIMER_LEFT=105,23
PRIMER_RIGHT=539,20
PRIMER_LEFT_TM=59.568
PRIMER_RIGHT_TM=59.844
PRIMER_LEFT_GC_PERCENT=60.870
PRIMER_RIGHT_GC_PERCENT=70.000
PRIMER_PRODUCT_SIZE=435
loci as predicted by GenomeMasker
1 unique locus (sense orientation)

PCR result

Image:Wet lab confirmation.cluster2.scaffold 5427.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

The result supports the clustering hypothesis, i.e. scaffold_5427 is part of the contamination present in the gDNA used for sequencing.

Cluster2: scaffold_6805

Primers

PRIMER_SEQUENCE_ID=scaffold_6805
TARGET=
PRIMER_LEFT_SEQUENCE=CGGAGCGCGCGGTAGATCAG
PRIMER_RIGHT_SEQUENCE=CACGACCGCCTGGTTGTCCC
PRIMER_LEFT=811,20
PRIMER_RIGHT=1185,20
PRIMER_LEFT_TM=60.047
PRIMER_RIGHT_TM=59.975
PRIMER_LEFT_GC_PERCENT=70.000
PRIMER_RIGHT_GC_PERCENT=70.000
PRIMER_PRODUCT_SIZE=375
loci as predicted by GenomeMasker
1 unique locus (sense orientation)

PCR result

Image:Wet lab confirmation.cluster2.scaffold 6805.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

The result supports the clustering hypothesis, i.e. scaffold_6805 is part of the contamination present in the gDNA used for sequencing.


Cluster2: scaffold_5765

Primers

PRIMER_SEQUENCE_ID=scaffold_5765
TARGET=
PRIMER_LEFT_SEQUENCE=CCGGCAGGAGCCGGTCAAAG
PRIMER_RIGHT_SEQUENCE=TCCTCCAGCCTTCCCTCGGC
PRIMER_LEFT=76,20
PRIMER_RIGHT=405,20
PRIMER_LEFT_TM=60.043
PRIMER_RIGHT_TM=59.967
PRIMER_LEFT_GC_PERCENT=70.000
PRIMER_RIGHT_GC_PERCENT=70.000
PRIMER_PRODUCT_SIZE=330
loci as predicted by GenomeMasker
1 unique locus (sense orientation)

PCR result

Image:Wet lab confirmation.cluster2.scaffold 5765.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

Surprisingly, the potential contaminant region can be amplified from both a and c. Taken together with the findings for scaffold_11072 and scaffold_3591, Pierre-François traced this back to a contamination of their Villersexel strain from last summer, which still persists (cefotaxime-treated). Thus, this result also concurs with the clustering hypothesis, and scaffold_5765 is a true member of cluster2.

Cluster3a - Grey area with some EST evidence scaffolds

Cluster3a: scaffold_449

Primers

PRIMER_SEQUENCE_ID=scaffold_449
TARGET=
PRIMER_LEFT_SEQUENCE=TGCCCTTCCTCCTCCCTCGC
PRIMER_RIGHT_SEQUENCE=TTCTGGGCAAGACGGCTGCG
PRIMER_LEFT=149750,20
PRIMER_RIGHT=150187,20
PRIMER_LEFT_TM=59.967
PRIMER_RIGHT_TM=59.974
PRIMER_LEFT_GC_PERCENT=70.000
PRIMER_RIGHT_GC_PERCENT=65.000
PRIMER_PRODUCT_SIZE=438
loci as predicted by GenomeMasker
1 unique locus

PCR Result

Image:Wet lab confirmation.cluster3a.scaffold 449.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

The PCR result shows that scaffold_449 is part of the Physcomitrella genome, and suggests that it represents the (hopefully large) fraction of cluster3a scaffolds that belong to the genome.

Cluster3b - Grey area without EST evidence scaffolds

Cluster 2 or 3b: scaffold_3591

Primers

PRIMER_SEQUENCE_ID=scaffold_3591
TARGET=
PRIMER_LEFT_SEQUENCE=AGTATGCCCGGACGCCTGGT
PRIMER_RIGHT_SEQUENCE=GCTGCCGCTCGCAGCATAGA
PRIMER_LEFT=7834,20
PRIMER_RIGHT=8326,20
PRIMER_LEFT_TM=59.967
PRIMER_RIGHT_TM=59.908
PRIMER_LEFT_GC_PERCENT=65.000
PRIMER_RIGHT_GC_PERCENT=65.000
PRIMER_PRODUCT_SIZE=493
loci as predicted by GenomeMasker
1 unique locus

PCR result

Image:Wet lab confirmation.cluster3b.scaffold 3591.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

Surprisingly, the potential contaminant region can be amplified from both a and c. Taken together with the findings for scaffold_11072 and scaffold_5765, Pierre-François traced this back to a contamination of their Villersexel strain from last summer, which still persists (cefotaxime-treated). In light of these findings, the result indicates that scaffold_3591 is part of the contamination introduced with the gDNA used for sequencing. It also strengthens the assumption that cluster3b contains genomic regions belonging to cluster2 whose ORFs yielded hits slightly below the filtering threshold used for the BLASTP search against GenPept. This is another clue that more than one organism may be present in cluster2.

Cluster 3b: scaffold_4415

Primers

PRIMER_SEQUENCE_ID=scaffold_4415
TARGET=
PRIMER_LEFT_SEQUENCE=CGCGACTCCTGCCACCTTGG
PRIMER_RIGHT_SEQUENCE=GGTTCCGCAGGGGTTGACGG
PRIMER_LEFT=218,20
PRIMER_RIGHT=595,20
PRIMER_LEFT_TM=60.044
PRIMER_RIGHT_TM=59.973
PRIMER_LEFT_GC_PERCENT=70.000
PRIMER_RIGHT_GC_PERCENT=70.000
PRIMER_PRODUCT_SIZE=378
loci as predicted by GenomeMasker
1 unique locus

PCR result

Image:Wet lab confirmation.cluster3b.scaffold 4415.png

L: left primer for the given scaffold; R: right primer for the given scaffold; a: fresh Gransden gDNA; b: gDNA used to build the sequencing libraries; c: fresh Villersexel gDNA.

On each gel, the single-primer amplifications (L, R) use the same template as the next lane (a, b or c).

Like the PCR for scaffold_3591, this result supports the assumption that cluster3b contains genomic regions belonging to cluster2 whose ORFs yielded hits slightly below the filtering threshold used for the BLASTP search against GenPept.
