Article Catalog (Journal Indexes)
Titolo: Accelerating EM for large databases
Author: Thiesson, B; Meek, C; Heckerman, D
 Addresses:
 Microsoft Corp, Research, Redmond, WA 98052 USA
 Journal Title:
 MACHINE LEARNING
issue: 3,
volume: 45,
year: 2001,
pages: 279-299
 SICI:
 0885-6125(2001)45:3<279:AEFLD>2.0.ZU;2-X
 Source:
 ISI
 Language:
 ENG
 Subject:
 LIKELIHOOD ESTIMATION; MAXIMUM LIKELIHOOD; ALGORITHM;
 Keywords:
 Expectation Maximization algorithm; incremental EM; lazy EM; online EM; data blocking; mixture models; clustering;
 Document type:
 Article
 Nature:
 Periodical
 Disciplinary Sector:
 Engineering, Computing & Technology
 Citations:
 22
 Review:
 Address for reprints:
 Address: Thiesson, B, Microsoft Corp, Research, 1 Microsoft Way, Redmond, WA 98052 USA

 Citation:
 B. Thiesson et al., "Accelerating EM for large databases", MACH LEARN, 45(3), 2001, pp. 279-299
Abstract
The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (In Jordan, M. (Ed.), Learning in Graphical Models, pp. 355-371. The Netherlands: Kluwer Academic Publishers) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM algorithm, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically affects the efficiency of the algorithm. We provide a method for selecting a near-optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the significance of each data case and then proceed for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases.
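The blocked incremental EM idea summarized in the abstract (partial E-steps over one block of cases at a time, in the style of Neal and Hinton, each followed by an M-step over running sufficient statistics) can be sketched as follows. This is a minimal illustration for a 1-D Gaussian mixture, not the authors' implementation; the function name `incremental_em`, the quantile-based initialization, and all parameters are assumptions made for the example.

```python
import numpy as np

def incremental_em(X, k, block_size, n_sweeps=5):
    """Sketch of blocked incremental EM for a 1-D Gaussian mixture.

    Per-block sufficient statistics are cached; each partial E-step
    refreshes one block's statistics, and the M-step then uses the
    running totals over all blocks (Neal-Hinton style).
    """
    n = len(X)
    w = np.full(k, 1.0 / k)                       # mixture weights
    mu = np.quantile(X, (np.arange(k) + 0.5) / k) # spread initial means
    var = np.full(k, X.var())                     # broad initial variances

    blocks = [slice(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    # Sufficient statistics per block: responsibility mass (s0),
    # weighted sums (s1), weighted sums of squares (s2).
    s0 = np.zeros((len(blocks), k))
    s1 = np.zeros((len(blocks), k))
    s2 = np.zeros((len(blocks), k))

    def resp(x):
        # Responsibilities p(component | x) under the current parameters.
        p = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        return p / p.sum(axis=1, keepdims=True)

    for _ in range(n_sweeps):
        for b, sl in enumerate(blocks):
            r = resp(X[sl])                        # partial E-step on one block
            s0[b] = r.sum(axis=0)
            s1[b] = r.T @ X[sl]
            s2[b] = r.T @ (X[sl] ** 2)
            # M-step from the running totals over all blocks.
            t0, t1, t2 = s0.sum(0), s1.sum(0), s2.sum(0)
            w = t0 / t0.sum()
            mu = t1 / t0
            var = np.maximum(t2 / t0 - mu ** 2, 1e-6)
    return w, mu, var
```

Because each block's old statistics are simply replaced, a sweep performs one M-step per block rather than one per pass over the full data, which is where the speedup over batch EM comes from; the abstract notes that the block size strongly affects this efficiency, and blocks should be representative (e.g. shuffled) samples of the data.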
ASDD Area Sistemi Dipartimentali e Documentali, Università di Bologna, catalog of journals and other periodicals
Document generated on 06/04/20 at 01:43:47