Article Catalog (Journal Indexes)

Title:
Variations on probabilistic suffix trees: statistical modeling and prediction of protein families
Authors:
Bejerano, G; Yona, G;
Addresses:
Hebrew Univ Jerusalem, Sch Engn & Comp Sci, IL-91904 Jerusalem, Israel; Stanford Univ, Dept Biol Struct, Stanford, CA 94305 USA
Journal Title:
BIOINFORMATICS
issue: 1, volume: 17, year: 2001,
pages: 23 - 43
SICI:
1367-4803(200101)17:1<23:VOPSTS>2.0.ZU;2-Z
Source:
ISI
Language:
ENG
Subject:
ACID SUBSTITUTION MATRICES; DISTANTLY RELATED PROTEINS; CIRCULAR PERMUTATION; SEQUENCE MOTIFS; DATABASE; AUTOMATA; PATTERNS; CLASSIFICATION; INFORMATION; ALIGNMENTS;
Document type:
Article
Nature:
Periodical
Subject Area:
Life Sciences
Citations:
46
Review:
Reprint addresses:
Address: Yona, G, Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Citation:
G. Bejerano and G. Yona, "Variations on probabilistic suffix trees: statistical modeling and prediction of protein families", BIOINFORMAT, 17(1), 2001, pp. 23-43

Abstract

Motivation: We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences do not need to be aligned, nor is delineation of domain boundaries required. The method is automatic, and can be applied, without assuming any preliminary biological information, with surprising success. Basic biological considerations such as amino acid background probabilities and amino acid substitution probabilities can be incorporated to improve performance.

Results: The PST can serve as a predictive tool for protein sequence classification, and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on the Pfam database of protein families with more than satisfactory performance. Exhaustive evaluations show that the PST model detects many more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model that is trained from a multiple alignment of the input sequences, while being much faster.
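
As a rough illustration of the idea in the abstract (predicting each residue from the longest significant pattern that precedes it, with no alignment of the inputs), the following is a minimal Python sketch of a variable-length context model. It is not the authors' algorithm: the function names, the `max_depth` and `min_count` thresholds, and the simple pseudocount smoothing are all assumptions made for this example, and the dictionary of contexts merely stands in for a suffix tree.

```python
import math
from collections import defaultdict


def train_contexts(sequences, max_depth=3, min_count=2):
    """Count next-symbol frequencies for every context of length 0..max_depth.

    Hypothetical helper: a flat dict of contexts stands in for the tree.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(min(max_depth, i) + 1):
                counts[seq[i - k:i]][symbol] += 1
    # Keep a non-empty context only if it was observed often enough.
    model = {
        ctx: dict(nxt)
        for ctx, nxt in counts.items()
        if ctx == "" or sum(nxt.values()) >= min_count
    }
    model.setdefault("", {})  # the empty context is always available
    return model


def next_symbol_prob(model, history, symbol, alphabet_size=20, pseudo=1.0):
    """Probability of `symbol` given the longest stored suffix of `history`."""
    max_len = max(len(ctx) for ctx in model)
    for k in range(min(max_len, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in model:
            nxt = model[ctx]
            total = sum(nxt.values())
            # Pseudocount smoothing (an assumption of this sketch).
            return (nxt.get(symbol, 0) + pseudo) / (total + pseudo * alphabet_size)
    raise RuntimeError("unreachable: the empty context is always stored")


def log_likelihood(model, seq, alphabet_size=20, pseudo=1.0):
    """Sum of log-probabilities of each symbol given its preceding history."""
    return sum(
        math.log(next_symbol_prob(model, seq[:i], seq[i], alphabet_size, pseudo))
        for i in range(len(seq))
    )
```

In this sketch, a query sequence would be scored with `log_likelihood` against a model trained on one family's unaligned sequences, and a higher score would suggest a closer match to the patterns seen in training. How the published method selects contexts, smooths probabilities, and sets classification thresholds is not specified in this record.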

ASDD (Area Sistemi Dipartimentali e Documentali), Università di Bologna, Catalog of journals and other periodicals
Document generated on 27/09/20 at 17:57:18