Masking of transposons

From PhyscomeProjectWiki



Gene models encoded by transposons are present in the Filtered Models set.

--Rensing 14:55, 27 October 2006 (CEST)


Mail excerpt from S. Rensing to JGI:

As far as I know, Harris used the LTR retrotransposon sequences that we sent to mask the traces prior to assembly. The sequences are also present in the genome browser as a repeat track.

If you look at some models, e.g.

http://shake.jgi-psf.org/cgi-bin/browserLoad?db=Phypa1&position=scaffold_1:2096421-2104439
http://shake.jgi-psf.org/cgi-bin/browserLoad?db=Phypa1&position=scaffold_1:2060395-2087403

you will see that these models overlap with predicted LTR retrotransposons. Not surprisingly, these models end up in huge clusters when paralog detection is run.

So my question is: what has your experience been with this in other genomes? Would it make sense to discard such models, to mask these regions prior to annotation, or to flag them with a warning?
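One way to make the "flag a warning" option concrete is to compute, for each gene model, the fraction of its span covered by repeat-track features and flag models above a threshold. The sketch below assumes gene models and LTR retrotransposon annotations are available as 1-based, inclusive scaffold coordinates (as in the browser positions above); the `Interval` class, the function names and the 0.5 threshold are hypothetical illustrations, not part of the JGI annotation pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Interval:
    # Hypothetical record for a gene model or a repeat feature.
    name: str
    scaffold: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def merge(intervals):
    """Merge overlapping or adjacent (start, end) pairs on one scaffold."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def repeat_overlap(models, repeats, min_fraction=0.5):
    """Return {model name: fraction of its bases covered by repeats}
    for every model whose covered fraction is >= min_fraction."""
    by_scaffold = defaultdict(list)
    for r in repeats:
        by_scaffold[r.scaffold].append((r.start, r.end))
    # Merging first avoids double-counting bases hit by nested repeats.
    merged = {s: merge(iv) for s, iv in by_scaffold.items()}

    flags = {}
    for m in models:
        covered = 0
        for start, end in merged.get(m.scaffold, []):
            lo, hi = max(start, m.start), min(end, m.end)
            if lo <= hi:
                covered += hi - lo + 1
        fraction = covered / (m.end - m.start + 1)
        if fraction >= min_fraction:
            flags[m.name] = round(fraction, 3)
    return flags


# Toy coordinates, not real Physcomitrella annotations.
models = [
    Interval("gm1", "scaffold_1", 100, 199),
    Interval("gm2", "scaffold_1", 300, 399),
]
repeats = [
    Interval("LTR_a", "scaffold_1", 150, 250),
    Interval("LTR_b", "scaffold_1", 180, 220),
]
print(repeat_overlap(models, repeats))  # {'gm1': 0.5}
```

Flagging keeps the models available for inspection; discarding would amount to dropping every flagged name from the model set, and the threshold decides how aggressive either policy is.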


Answer by H. Shapiro:

To answer the first part of the question: we did not end up masking out the repeat sequences prior to assembly. It is an open question whether it is better to screen such sequences out before or after assembly. The main advantage of screening first is that the assembler might be less confused if the repeats were excluded, and so would not mis-join some regions. The main disadvantage is that excluding small repeat regions, which the assembler would otherwise be able to work through, could add lots of small gaps to the assembly. My suspicion is that the "correct" answer will vary by genome.
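The small-gap concern can be illustrated with hard-masking, where repeat bases are replaced by `N`: even short repeats inside a read break it into separate unmasked stretches that an assembler may not be able to bridge. This is a minimal sketch assuming 0-based, half-open repeat coordinates on a single read; the function names and the toy read are hypothetical and do not describe the screening procedure JGI or its assembler would actually use.

```python
def hard_mask(seq, intervals):
    """Replace bases in 0-based, half-open [start, end) intervals with 'N'."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


def unmasked_segments(masked, min_len=1):
    """Return (start, end) of maximal runs without 'N' that are >= min_len long."""
    segments, start = [], None
    for i, base in enumerate(masked):
        if base != "N" and start is None:
            start = i  # a clean run begins
        elif base == "N" and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run is interrupted by masked sequence
    if start is not None and len(masked) - start >= min_len:
        segments.append((start, len(masked)))
    return segments


read = "ACGTACGTAC" * 3               # toy 30 bp read
repeats = [(5, 8), (20, 23)]          # two short, hypothetical repeat hits
masked = hard_mask(read, repeats)
print(unmasked_segments(masked))      # [(0, 5), (8, 20), (23, 30)]
```

Two 3 bp repeats turn one contiguous read into three pieces; across millions of traces, this kind of fragmentation is what could show up as many small gaps in the assembly.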

Unfortunately, given the schedule constraints we were operating under at the time, we weren't going to be able to re-run the assembly if the masking turned out to cause problems. Since we don't have enough experience with such pre-screening to guess whether it would be a net positive or negative, we opted for the strategy of not pre-screening.
