Article Catalogue (Journal Indexing)

Title:
Optimal classification of protein sequences and selection of representative sets from multiple alignments: application to homologous families and lessons for structural genomics
Author:
May, ACW;
Addresses:
Natl Inst Med Res, Div Math Biol, London NW7 1AA, England
Journal Title:
PROTEIN ENGINEERING
issue: 4, volume: 14, year: 2001,
pages: 209 - 217
SICI:
0269-2139(200104)14:4<209:OCOPSA>2.0.ZU;2-4
Source:
ISI
Language:
ENG
Subject:
HIERARCHICAL-CLASSIFICATION; FOLDING NUCLEUS; RESIDUES; DATABASE; WEIGHTS; COMMON; IDENTIFICATION; EXPRESSION; EVOLUTION; PATTERNS;
Keywords:
cluster analysis; multiple sequence alignment; protein homologous family; representative set; sequence weighting;
Document type:
Article
Nature:
Periodical
Subject Area:
Life Sciences
Citations:
51
Review:
Reprint addresses:
Address: May, ACW, Natl Inst Med Res, Div Math Biol, Mill Hill, London NW7 1AA, England
Citation:
A.C.W. May, "Optimal classification of protein sequences and selection of representative sets from multiple alignments: application to homologous families and lessons for structural genomics", PROTEIN ENG, 14(4), 2001, pp. 209-217

Abstract

Hierarchical classification is probably the most popular approach to group related proteins. However, there are a number of problems associated with its use for this purpose. One is that the resulting tree showing a nested sequence of groups may not be the most suitable representation of the data. Another is that visual inspection is the most common method to decide the most appropriate number of subsets from a tree. In fact, classification of proteins in general is bedevilled with the need for subjective thresholds to define group membership (e.g. 'significant' sequence identity for homologous families). Such arbitrariness is not only intellectually unsatisfying but also has important practical consequences. For instance, it hinders meaningful identification of protein targets for structural genomics. I describe an alternative approach to cluster related proteins without the need for an a priori threshold. The approach uses dynamic programming, which is guaranteed to produce globally optimal solutions at all levels of partition granularity. Grouping proteins according to weights assigned to their aligned sequences makes it possible to delineate dynamically a 'core-periphery' structure within families. The 'core' of a protein family comprises the most typical sequences while the 'periphery' consists of the atypical ones. Further, a new sequence weighting scheme that combines the information in all the multiply aligned positions of an alignment in a novel way is put forward. Instead of averaging over all positions, this procedure takes into account directly the distribution of sequence variability along an alignment. The relationships between sequence weights and sequence identity are investigated for 168 families taken from HOMSTRAD, a database of protein structure alignments for homologous families. An exact solution is presented for the problem of how to select the most representative pair of sequences for a protein family. Extension of this approach by a greedy algorithm allows automatic identification of a minimal set of aligned sequences. The results of this analysis are available on the Web at http://mathbio.nimr.mrc.ac.uk/~amay.
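
Illustrative note (not part of the catalogue record): the abstract states that sequences are grouped by their weights using dynamic programming, but it gives neither the objective function nor the recurrence. The sketch below is therefore only a generic example of globally optimal one-dimensional partitioning, assuming the objective is the total within-group sum of squared deviations. The function name `optimal_partition` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def optimal_partition(weights: Sequence[float], k: int) -> Tuple[float, List[List[float]]]:
    """Split the sorted weights into k contiguous groups so that the total
    within-group sum of squared deviations is minimal.

    The objective is an assumption made for illustration; the abstract
    does not specify the one used in the paper.
    """
    xs = sorted(weights)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of weights")

    # Prefix sums let us evaluate the cost of any contiguous group in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[g][j]: minimal cost of splitting xs[:j] into g groups.
    # cut[g][j]: start index of the last group in that optimal split.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j] = c
                    cut[g][j] = i

    # Walk the stored cut points backwards to recover the groups.
    groups: List[List[float]] = []
    j = n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    groups.reverse()
    return best[k][n], groups
```

Because the table `best` is filled for every group count up to `k`, one run yields an optimal split at each coarser level of granularity as well, which is the property the abstract highlights. In a 'core-periphery' reading, the groups at one end of the sorted weights would hold the most typical sequences and those at the other end the atypical ones, though which end is which depends on the weighting scheme, which is not reproduced here.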

ASDD Area Sistemi Dipartimentali e Documentali, Università di Bologna, Catalogue of journals and other periodicals
Document generated on 25/01/20 at 03:26:33