Article Catalogue (Journal Indexing)


Clustering protein sequences-structure prediction by transitive homology
Bolten, E; Schliep, A; Schneckener, S; Schomburg, D; Schrader, R;
Univ Cologne, ZPR, ZAIK, D-50937 Cologne, Germany; Univ Cologne, Inst Biochem, D-50937 Cologne, Germany; Sci Factory, Cologne, Germany
Journal title:
issue: 10, volume: 17, year: 2001,
pages: 935 - 941
Document type:
Subject area:
Life Sciences
Reprint addresses:
Address: Schliep, A, Univ Cologne, ZPR, ZAIK, Weyertal 80, D-50937 Cologne, Germany
E. Bolten et al., "Clustering protein sequences-structure prediction by transitive homology", BIOINFORMAT, 17(10), 2001, pp. 935-941


Motivation: It is widely believed that for two proteins A and B a sequence identity above some threshold implies structural similarity due to a common evolutionary ancestor. Since this is only a sufficient, but not a necessary condition for structural similarity, the question remains what other criteria can be used to identify remote homologues. Transitivity refers to the concept of deducing a structural similarity between proteins A and C from the existence of a third protein B, such that A and B as well as B and C are homologues, as ascertained if the sequence identity between A and B as well as that between B and C is above the aforementioned threshold. It is not fully understood if transitivity always holds and whether transitivity can be extended ad infinitum.
Results: We developed a graph-based clustering approach, where transitivity plays a crucial role. We determined all pair-wise similarities for the sequences in the SwissProt database using the Smith-Waterman local alignment algorithm. This data was transformed into a directed graph, where protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity, scaled with respect to the self-similarity of A, above a fixed threshold. Transitivity was important in the clustering process, as intermediate sequences were used, limited though by the requirement of having directed paths in both directions between proteins linked over such sequences. The length dependency (implied by the self-similarity) of the scaling of the alignment scores appears to be an effective criterion to avoid clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed an efficient library. Methods include the novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. Structural Classification of Proteins (SCOP) was used as an evaluation data set for our method, yielding a 24% improvement over pair-wise comparisons in terms of detecting remote homologues.
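
The edge rule and the two-way path requirement described in the abstract can be illustrated with a small sketch. This is not the authors' library or their exact algorithm: it assumes that "directed paths in both directions" can be read as mutual reachability in the directed graph, and the function names, the toy scores, and the threshold of 0.5 are all hypothetical. The toy data include a multi-domain protein M that contains the domain of A, to show how scaling by self-similarity keeps M out of A's cluster.

```python
def symmetric_scores(self_scores, pair_scores):
    """Build a score table; local alignment scores are treated as symmetric."""
    scores = {(a, a): s for a, s in self_scores.items()}
    for (a, b), s in pair_scores.items():
        scores[(a, b)] = s
        scores[(b, a)] = s
    return scores


def build_graph(scores, threshold):
    """Draw a directed edge a -> b when score(a, b) / score(a, a) >= threshold."""
    graph = {a: set() for (a, b) in scores if a == b}
    for (a, b), s in scores.items():
        if a != b and s / scores[(a, a)] >= threshold:
            graph[a].add(b)
    return graph


def reachable(graph, start):
    """Return every vertex reachable from start along directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def clusters(graph):
    """Group vertices that reach each other in both directions (assumed reading)."""
    reach = {v: reachable(graph, v) for v in graph}
    assigned = set()
    result = []
    for v in sorted(graph):
        if v in assigned:
            continue
        comp = {u for u in reach[v] if u in reach and v in reach[u]}
        assigned |= comp
        result.append(comp)
    return result


# Hypothetical toy scores: A, B, C are single-domain; M is a long multi-domain protein.
self_scores = {"A": 100, "B": 110, "C": 120, "M": 400}
pair_scores = {("A", "B"): 80, ("B", "C"): 75, ("A", "C"): 30, ("A", "M"): 90}

graph = build_graph(symmetric_scores(self_scores, pair_scores), threshold=0.5)
result = clusters(graph)
print(result)
```

In this toy case A and C are not directly linked (30/100 and 30/120 both fall below 0.5), but they end up together through the intermediate sequence B. The edge A -> M exists (90/100), while M -> A does not (90/400), so M stays in its own cluster. The per-vertex reachability search is quadratic in the worst case; a graph of SwissProt size would need a linear-time strongly connected components algorithm instead.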

ASDD Area Sistemi Dipartimentali e Documentali, Università di Bologna, Catalogue of journals and other periodicals
Document generated on 24/01/21 at 10:23:11