Engineering Physics and Mathematics Division, Oak Ridge
National Laboratory, TN 37831-6364, USA.
A new version of the GRAIL system (Uberbacher and Mural, 1991;
Mural et al., 1992; Uberbacher et al., 1993), called GRAIL II, has
recently been developed (Xu et al., 1994). GRAIL II is a hybrid AI
system that supports a number of DNA sequence analysis tools
including protein-coding region recognition, poly(A) site and
transcription promoter recognition, gene model construction,
translation to protein, and DNA/protein database searching
capabilities. This paper presents the core of GRAIL II, the coding
exon recognition and gene model construction algorithms. The exon
recognition algorithm recognizes coding exons by combining coding
feature analysis and edge signal (acceptor/donor/translation-start
sites) detection. Unlike the original GRAIL system (Uberbacher and
Mural, 1991; Mural et al., 1992), this algorithm uses
variable-length windows tailored to each potential exon candidate,
making its performance nearly independent of exon length. In this
algorithm, the recognition process is divided into four steps.
Initially a large number of possible coding exon candidates are
generated. Then a rule-based prescreening algorithm eliminates the
majority of the improbable candidates. As the kernel of the
recognition algorithm, three neural networks are trained to
evaluate the remaining candidates. The outputs of the neural
networks are then divided into clusters of candidates,
corresponding to presumed exons. The algorithm makes its final
prediction by picking the best candidate from each cluster. The
gene construction algorithm (Xu, Mural and Uberbacher, 1994) uses
a dynamic programming approach to build gene models by using as
input the clusters predicted by the exon recognition algorithm.
Extensive testing has been done on these two algorithms.
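
The last two steps of exon recognition (clustering scored candidates and picking the best one per cluster) and the dynamic programming idea behind gene model construction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from GRAIL II: the `Candidate` record, the use of interval overlap as the clustering rule, the use of a single score as a stand-in for the neural network outputs, and the objective of maximizing the summed score over non-overlapping candidates. The abstract does not specify the actual scoring or the constraints used when assembling gene models.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical exon candidate; coordinates are inclusive."""
    start: int
    end: int
    score: float  # stand-in for a neural-network evaluation


def cluster_candidates(cands: List[Candidate]) -> List[List[Candidate]]:
    """Group candidates whose intervals overlap (assumed clustering rule)."""
    clusters: List[List[Candidate]] = []
    cluster_end = -1
    for c in sorted(cands, key=lambda c: (c.start, c.end)):
        if clusters and c.start <= cluster_end:
            clusters[-1].append(c)
            cluster_end = max(cluster_end, c.end)
        else:
            clusters.append([c])
            cluster_end = c.end
    return clusters


def best_per_cluster(clusters: List[List[Candidate]]) -> List[Candidate]:
    """Pick the highest-scoring candidate from each cluster."""
    return [max(cl, key=lambda c: c.score) for cl in clusters]


def best_chain(cands: List[Candidate]) -> List[Candidate]:
    """Dynamic programming: choose an ordered set of non-overlapping
    candidates that maximizes the summed score (assumed objective)."""
    ordered = sorted(cands, key=lambda c: c.end)
    best = []  # best[i] = (total score, chain) for chains ending at ordered[i]
    for i, c in enumerate(ordered):
        total, chain = c.score, [c]
        for j in range(i):
            if ordered[j].end < c.start and best[j][0] + c.score > total:
                total, chain = best[j][0] + c.score, best[j][1] + [c]
        best.append((total, chain))
    return max(best, key=lambda t: t[0])[1] if best else []


candidates = [
    Candidate(10, 50, 0.6),
    Candidate(12, 55, 0.9),
    Candidate(100, 150, 0.7),
    Candidate(140, 180, 0.4),
]
clusters = cluster_candidates(candidates)
picked = best_per_cluster(clusters)
model = best_chain(candidates)
```

With these made-up candidates, the overlap rule yields two clusters, and both the per-cluster pick and the chained model select the candidates at 12-55 and 100-150. A realistic gene model builder would also have to enforce biological constraints that this sketch ignores.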
PMID: 7584416, UI: 96039043