In my opinion, we have collected ample evidence that a large fraction of the 8x main_genome_scaffolds represent a eubacterial contamination.

It would certainly be good to check some more examples from the cluster fringes in the wet lab.

In addition, we cannot yet rule out the possibility that the contaminant is endophytic. It is not present in our (Freiburg) WT strain of P. patens. However, some PCRs should perhaps be repeated with the actual Gransden 2004 strain that went into sequencing. I would also suggest trying these PCRs on the cDNA libraries that were sequenced at the JGI.

In any case, I believe we should remove the bacterial cluster from the genome as presented in the genome browser. Otherwise we might end up with data derived from contaminants being mentioned in the genome publication.

Your opinions, please!

--Rensing 09:55, 27 October 2006 (CEST)

Our current understanding is that cluster 2 represents a eubacterial contamination that was present in the DNA from which the sequencing libraries were created.

As we have detected only a single case of a chimeric scaffold, we don't believe that we have to deal with an assembly problem and thus can remove whole scaffolds that represent contaminations.

We agreed in a phone conference that we will provide a list of contaminated scaffolds to be excluded from the main_genome track prior to release of the v1 genome.
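
Removing whole scaffolds by list could be sketched roughly as follows. This is only an illustration: it assumes the assembly is available as FASTA text and that the exclusion list holds scaffold identifiers matching the first word of each header line. The function name `filter_scaffolds` and the scaffold names in the example are hypothetical and are not taken from the actual release.

```python
from typing import Iterable, Iterator, Set


def filter_scaffolds(fasta_lines: Iterable[str], excluded_ids: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, dropping every record whose ID is in excluded_ids.

    The record ID is assumed to be the first whitespace-delimited word
    after '>' in the header line.
    """
    keep = True  # lines before the first header are passed through unchanged
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            scaffold_id = fields[0] if fields else ""
            keep = scaffold_id not in excluded_ids
        if keep:
            yield line


if __name__ == "__main__":
    # Hypothetical scaffold names and sequences, for illustration only.
    example = [
        ">scaffold_1 plant\n",
        "ACGTACGT\n",
        ">scaffold_2 contaminant\n",
        "GGGGCCCC\n",
        ">scaffold_3 plant\n",
        "TTTTAAAA\n",
    ]
    for out_line in filter_scaffolds(example, {"scaffold_2"}):
        print(out_line, end="")
```

Because only a single chimeric scaffold was found, a filter at the level of whole records like this one would be sufficient; a chimeric scaffold would have to be handled separately.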

Cluster 1 represents real P.p. genomic regions that contain protein-coding genes. Cluster 2 represents the contamination mentioned above.

Clusters 3a/b are problematic: they contain P.p. genomic DNA scaffolds that in most cases lack normal protein-coding genes but contain RNA genes and transposable elements. Some of these scaffolds are almost certainly from Eubacteria that were sequenced by the JGI at the same time as P.p. Apparently, the ORFs of these organisms cannot be predicted accurately using a Bacillus model. We want to detect all those scaffolds using a megaBLAST approach, for which we need the help of the JGI (as not all data are publicly available yet).

--Rensing 09:37, 24 November 2006 (CET)

