The assembly group's contamination analysis

During the first Jamboree, [Contaminations] were an important topic that we said we would look into.

The analyses of the assembly group already indicated that there are contaminants from *Bacillus genera: Several prokaryotic contaminants were seen repeatedly. The most prevalent genera were (Brevi/Paeni/Geo)bacillus, Thermus, and Pseudomonas. These scaffolds were excluded from the main_genome and put into the prokaryotic bin. In the process, they also found two peaks in the GC distribution among scaffolds:

Image:JGI GC distribution scaffolds.png
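
A GC distribution like the one in the figure can be approximated by computing the GC fraction of each scaffold and counting scaffolds per bin. The following is only a minimal sketch, not the assembly group's actual procedure: the function names, the choice of 20 bins, and the decision to ignore N and other ambiguity codes are all assumptions made for illustration.

```python
from collections import Counter


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """GC fraction over A/C/G/T only (assumption: N and other codes are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_histogram(records, n_bins=20):
    """Count scaffolds per GC bin; bin i covers [i/n_bins, (i+1)/n_bins)."""
    counts = Counter()
    for _name, seq in records:
        # Clamp so that a GC fraction of exactly 1.0 falls into the last bin.
        bin_index = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        counts[bin_index] += 1
    return dict(sorted(counts.items()))
```

Applied to the scaffold FASTA file, for example `gc_histogram(parse_fasta(open(path)))` with `path` pointing at the scaffolds, two well-separated groups of populated bins would correspond to the two peaks described above.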

The filtered scaffolds (i.e. excluding short and redundant scaffolds, but keeping organelle scaffolds) from the 11/15/2005 assembly were megablasted against the NCBI nt database, using the following command-line options: -p 90 -e 1e-10 -z 1e9 -F "m D" -b 100 -v 100. Organelle scaffolds were identified by BLAT-aligning the filtered scaffolds to the draft mitochondrion sequence.
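
To make the search reproducible, the quoted options can be assembled into a full command line. This is a sketch only: the file names `filtered_scaffolds.fasta` and `hits.txt` are hypothetical, and the `-i`, `-d` and `-o` flags (query, database and output in the legacy NCBI megablast program) are an assumption, since the text above lists only the tuning options.

```python
import shlex

# Options exactly as quoted in the text; "m D" must remain a single argument.
MEGABLAST_OPTIONS = [
    "-p", "90",
    "-e", "1e-10",
    "-z", "1e9",
    "-F", "m D",
    "-b", "100",
    "-v", "100",
]


def build_megablast_command(query_fasta, database="nt", output="hits.txt"):
    """Return the argument list for one megablast run.

    Assumption: -i, -d and -o are the legacy megablast flags for the query
    file, the database and the output file; they are not given in the text.
    """
    return ["megablast", "-i", query_fasta, "-d", database, "-o", output,
            *MEGABLAST_OPTIONS]


# Hypothetical input file name, for illustration only.
cmd = build_megablast_command("filtered_scaffolds.fasta")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string means that the filter argument `m D` survives as one argument when the list is passed to `subprocess.run(cmd)`.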

Although these results were used in the scaffold partitioning process, we think that some bacterial scaffolds still remain in the main_genome.

