Filtered models

From PhyscomeProjectWiki

Related article: Masking_of_transposons

Filtered models V1.1 - FM3

Filtered models V1.0 - FM1

Poly-N transcripts in Filtered Models

--Rensing 14:54, 27 October 2006 (CEST)


mail excerpt from S. Rensing to JGI:

When using the Phypa1 FM1 CDS sequences for BLASTing, we noted that for a total of 42 sequences, calculation of Karlin-Altschul parameters failed. As this is usually due to sequence composition, I had a look at several of the sequences. As it turns out (see example pasted below), the transcripts consist entirely of Ns (and the corresponding proteins of Xs). I am a bit worried as to how such transcripts can come into being - is this to be expected? Apparently, all of the affected gene models are fgenesh1_pm predictions.

[blastall] WARNING: jgi|Phypa1|58445|fgenesh1_pm.scaffold_105000012: SetUpBlastSearch failed.
[blastall] ERROR: jgi|Phypa1|58445|fgenesh1_pm.scaffold_105000012: BLASTSetUpSearch: Unable to calculate Karlin-Altschul params, check query sequence

>jgi|Phypa1|58445|fgenesh1_pm.scaffold_105000012
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNN
>jgi|Phypa1|58445|fgenesh1_pm.scaffold_105000012
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X

mail excerpt from T. Nishiyama to JGI:

I happened to find a gene model that is predicted on a gap. I then checked transcripts.Phypa1_FM1.fasta.gz for transcripts that contain only Ns and found 41 transcripts, which are listed below. All of them are predictions by fgenesh. I am not sure why such transcripts are predicted, but I am afraid this might be an indication of some error in the process.

Do you have any idea what happened, whether other ab initio predictions are sound, and whether we should go on just discarding those transcripts from the gene catalog?

jgi|Phypa1|79055|fgenesh1_pg.scaffold_75000148
jgi|Phypa1|83333|fgenesh1_pg.scaffold_112000059
jgi|Phypa1|83248|fgenesh1_pg.scaffold_111000063
jgi|Phypa1|78705|fgenesh1_pg.scaffold_73000085
jgi|Phypa1|81925|fgenesh1_pg.scaffold_99000028
jgi|Phypa1|81926|fgenesh1_pg.scaffold_99000029
jgi|Phypa1|81946|fgenesh1_pg.scaffold_99000049
jgi|Phypa1|79220|fgenesh1_pg.scaffold_77000051
jgi|Phypa1|79263|fgenesh1_pg.scaffold_77000094
jgi|Phypa1|78405|fgenesh1_pg.scaffold_71000051
jgi|Phypa1|78406|fgenesh1_pg.scaffold_71000052
jgi|Phypa1|83055|fgenesh1_pg.scaffold_109000036
jgi|Phypa1|83059|fgenesh1_pg.scaffold_109000040
jgi|Phypa1|83060|fgenesh1_pg.scaffold_109000041
jgi|Phypa1|83061|fgenesh1_pg.scaffold_109000042
jgi|Phypa1|83062|fgenesh1_pg.scaffold_109000043
jgi|Phypa1|83063|fgenesh1_pg.scaffold_109000044
jgi|Phypa1|83064|fgenesh1_pg.scaffold_109000045
jgi|Phypa1|83065|fgenesh1_pg.scaffold_109000046
jgi|Phypa1|83086|fgenesh1_pg.scaffold_109000067
jgi|Phypa1|83165|fgenesh1_pg.scaffold_110000047
jgi|Phypa1|83166|fgenesh1_pg.scaffold_110000048
jgi|Phypa1|86894|fgenesh1_pg.scaffold_151000043
jgi|Phypa1|82997|fgenesh1_pg.scaffold_108000050
jgi|Phypa1|83000|fgenesh1_pg.scaffold_108000053
jgi|Phypa1|83002|fgenesh1_pg.scaffold_108000055
jgi|Phypa1|83638|fgenesh1_pg.scaffold_115000055
jgi|Phypa1|85033|fgenesh1_pg.scaffold_130000042
jgi|Phypa1|79907|fgenesh1_pg.scaffold_82000074
jgi|Phypa1|86712|fgenesh1_pg.scaffold_149000049
jgi|Phypa1|86737|fgenesh1_pg.scaffold_149000074
jgi|Phypa1|86738|fgenesh1_pg.scaffold_149000075
jgi|Phypa1|86739|fgenesh1_pg.scaffold_149000076
jgi|Phypa1|86766|fgenesh1_pg.scaffold_149000103
jgi|Phypa1|86767|fgenesh1_pg.scaffold_149000104
jgi|Phypa1|58445|fgenesh1_pm.scaffold_105000012
jgi|Phypa1|86638|fgenesh1_pg.scaffold_148000045
jgi|Phypa1|85246|fgenesh1_pg.scaffold_132000033
jgi|Phypa1|85257|fgenesh1_pg.scaffold_132000044
jgi|Phypa1|85262|fgenesh1_pg.scaffold_132000049
jgi|Phypa1|85268|fgenesh1_pg.scaffold_132000055
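
A check like the one described in the mail above (scanning a FASTA file for records that consist only of Ns) could be sketched as follows. This is a minimal illustration, not the script that was actually used. The function names read_fasta, is_fully_masked and find_masked_records are hypothetical, and treating X as a mask character alongside N is an assumption made so that the same check also covers the protein file.

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_fully_masked(sequence, mask_chars="NX"):
    """True if the sequence is non-empty and contains only mask characters.

    Assumption: N (nucleotide) and X (protein) are the only mask characters.
    """
    return bool(sequence) and set(sequence.upper()) <= set(mask_chars)


def find_masked_records(path):
    """Return headers of fully masked records in a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [h for h, seq in read_fasta(handle) if is_fully_masked(seq)]


if __name__ == "__main__":
    # Small in-memory demo; the second record is a made-up (hypothetical) entry.
    demo = io.StringIO(
        ">jgi|Phypa1|58445|fgenesh1_pm.scaffold_105000012\nNNNNNNNN\nNNN\n"
        ">hypothetical_ok_record\nATGNNNCGT\n"
    )
    for header, seq in read_fasta(demo):
        print(header, is_fully_masked(seq))
```

Calling find_masked_records on a transcript file such as transcripts.Phypa1_FM1.fasta.gz would return the headers of the affected models, which could then be excluded before BLASTing or reported back for rebuilding.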

Answer by I. Grigoriev:

As was reported earlier by Stefan, a number of ab initio models have stretches of N or X characters. Astrid looked into this problem and found that ab initio gene models on a few scaffolds were corrupted. She is rebuilding these models and they will replace the corrupted gene models in GeneCatalog. We'll keep you posted on this update. Meanwhile, public release of the Portal will be delayed until this problem is resolved. I do apologize for the inconvenience.
