Article Catalog (Journal Indexing)

Title:
Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures
Authors:
Lopez, D; Llosa, J; Valero, M; Ayguade, E;
Addresses:
Tech Univ Catalunya UPC, Comp Architecture Dept, Barcelona, Spain
Journal Title:
IEEE TRANSACTIONS ON COMPUTERS
issue: 10, volume: 50, year: 2001,
pages: 1033 - 1051
SICI:
0018-9340(200110)50:10<1033:CSTIPO>2.0.ZU;2-G
Source:
ISI
Language:
ENG
Subject:
REGISTER PRESSURE; MICROPROCESSOR;
Keywords:
VLIW processors; instruction level parallelism; software pipelining; numerical applications; performance/cost trade-off;
Document type:
Article
Nature:
Periodical
Subject Area:
Engineering, Computing & Technology
Citations:
38
Review:
Reprint addresses:
Address: Lopez, D Tech Univ Catalunya UPC, Comp Architecture Dept, Campus Nord, Modul D6, Jordi Girona 1-3, Barcelona, Spain
Citation:
D. Lopez et al., "Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures", IEEE COMPUT, 50(10), 2001, pp. 1033-1051

Abstract

Loops are the main time-consuming part of numerical applications. The performance of the loops is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units. However, the high cost in terms of area and cycle time of this technique precludes the use of high degrees of replication. High values for the cycle time may clearly offset any gain in terms of number of execution cycles. High values for the area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (widening technique), which has also been used in some recent designs in which the width of the resources is increased (i.e., a single operation is performed over multiple data). Moreover, several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units (fusion technique), which reduces the latency of the combined operation and the number of resources used. In this paper, we evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee the possible implementable alternatives. From this study, we conclude that if the cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying only replication. Also, we confirm that multiply-add fused units will have a significant impact in raising the performance of future processor architectures with a reasonable increase in cost.
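
The trade-off described in the abstract (fewer execution cycles versus a longer cycle time) can be illustrated with a minimal sketch. It is not the authors' evaluation model: the function names, the loop body, the unit counts, and the cycle times below are all hypothetical and chosen only for illustration. The sketch assumes the usual software-pipelining lower bounds (a resource bound and a recurrence bound on cycles per iteration) and a simplified view of widening in which one wide operation covers several consecutive iterations.

```python
from math import ceil

def res_bound(ops, units):
    # Resource bound: the most heavily used resource class dictates the
    # minimum number of cycles needed to issue one loop body.
    return max(ceil(ops[kind] / units[kind]) for kind in ops)

def rec_bound(recurrences):
    # Recurrence bound: a dependence cycle with total latency `lat` that
    # spans `dist` iterations needs at least lat / dist cycles per iteration.
    return max((lat / dist for lat, dist in recurrences), default=0.0)

def cycles_per_iteration(ops, units, width=1, recurrences=()):
    # Simplifying assumption (not from the paper): a resource of width
    # `width` handles `width` consecutive iterations with one operation,
    # so the resource bound is amortized over `width` iterations.
    # Recurrences are assumed not to benefit from widening.
    return max(res_bound(ops, units) / width, rec_bound(recurrences))

def time_per_iteration(cycles, cycle_time_ns):
    # Execution time is the product of cycle count and cycle time.
    return cycles * cycle_time_ns

# Hypothetical loop body: 4 memory and 4 floating-point operations.
ops = {"mem": 4, "fpu": 4}

# Hypothetical configurations with the same peak throughput:
# 4 narrow units of each kind versus 2 units of width 2.
replicated = cycles_per_iteration(ops, {"mem": 4, "fpu": 4}, width=1)
mixed = cycles_per_iteration(ops, {"mem": 2, "fpu": 2}, width=2)

# Hypothetical cycle times (ns): heavier replication is assumed slower.
print("replicated:", time_per_iteration(replicated, 1.3), "ns/iteration")
print("mixed:     ", time_per_iteration(mixed, 1.0), "ns/iteration")

# A loop with a recurrence (latency 6 over distance 2) stays limited by
# the recurrence no matter how many units are added.
limited = cycles_per_iteration(ops, {"mem": 4, "fpu": 4}, recurrences=[(6, 2)])
print("recurrence-limited cycles per iteration:", limited)
```

Under these made-up inputs both configurations need the same number of cycles per iteration, so the one with the shorter cycle time finishes first. Whether that holds for a real design depends on the area and cycle-time figures, which this sketch does not model.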

ASDD Area Sistemi Dipartimentali e Documentali, Università di Bologna, Catalog of journals and other periodicals
Document generated on 13/07/20 at 03:10:29