Article Catalog (Journal Indexing)
Title: PoLAPACK: Parallel factorization routines with algorithmic blocking
Author: Choi, JY
 Addresses:
 Soongsil Univ, Sch Comp, Seoul 156743, South Korea
 Journal Title:
 CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
issue: 12,
volume: 13,
year: 2001,
pages: 1033-1047
 SICI:
 1532-0626(200110)13:12<1033:PPFRWA>2.0.ZU;2-O
 Source:
 ISI
 Language:
 ENG
 Subject:
 MATRIX-MULTIPLICATION ALGORITHM; MEMORY CONCURRENT COMPUTERS
 Document type:
 Article
 Nature:
 Periodical
 Subject Area:
 Engineering, Computing & Technology
 Citations:
 20
 Review:
 Reprint addresses:
 Address: Choi, JY, Soongsil Univ, Sch Comp, 11 Sangdo Dong, Seoul 156743, South Korea



 Citation:
 J.Y. Choi, "PoLAPACK: Parallel factorization routines with algorithmic blocking", CONCURR COM, 13(12), 2001, pp. 1033-1047
Abstract
LU, QR, and Cholesky factorizations are the most widely used methods for solving dense linear systems of equations, and have been extensively studied and implemented on vector and parallel computers. Most of these factorization routines are implemented with block-partitioned algorithms in order to perform matrix-matrix operations, that is, to obtain the highest performance by maximizing reuse of data in the upper levels of memory, such as cache. Because parallel computers differ in their ratios of computation to communication performance, the computational block size that maximizes an algorithm's performance differs from one machine to another. Therefore, the data matrix should be distributed with the machine-specific optimal block size before the computation. A block size that is too small or too large makes good performance on a machine nearly impossible to achieve. In such a case, getting better performance may require a complete redistribution of the data matrix. In this paper, we present parallel LU, QR, and Cholesky factorization routines with 'algorithmic blocking' on a two-dimensional block cyclic data distribution. With algorithmic blocking, it is possible to obtain near-optimal performance irrespective of the physical block size. The routines are implemented on the Intel Paragon and the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines. Copyright (C) 2001 John Wiley & Sons, Ltd.
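The two-dimensional block cyclic data distribution named in the abstract can be illustrated with a minimal sketch. This is not code from PoLAPACK or ScaLAPACK: the function name `block_cyclic_owner`, the zero-based indexing, the square block size `nb`, and the choice that the first block lives on process (0, 0) are all assumptions made for illustration. It shows only how the physical block size fixes which process holds each matrix entry, which is why changing that block size would require redistributing the data.

```python
def block_cyclic_owner(i, j, nb, p_rows, p_cols):
    """Map global entry (i, j) to ((process row, process col), (local row, local col)).

    Hypothetical helper: assumes zero-based indices, square nb x nb blocks,
    and a p_rows x p_cols process grid with the first block on process (0, 0).
    """
    def map_1d(g, nb, p):
        block = g // nb            # index of the global block containing g
        owner = block % p          # blocks are dealt round-robin to processes
        local_block = block // p   # earlier blocks already held by that process
        return owner, local_block * nb + g % nb

    proc_row, local_row = map_1d(i, nb, p_rows)
    proc_col, local_col = map_1d(j, nb, p_cols)
    return (proc_row, proc_col), (local_row, local_col)


if __name__ == "__main__":
    # Entry (5, 2) with 2 x 2 blocks on a 2 x 3 process grid.
    print(block_cyclic_owner(5, 2, nb=2, p_rows=2, p_cols=3))
```

Calling the function with a different `nb` for the same entry generally yields a different owner, so the mapping depends on the block size chosen at distribution time.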
ASDD Area Sistemi Dipartimentali e Documentali (Departmental and Documentary Systems Area), Università di Bologna, Catalog of journals and other periodicals