Article Catalogue (Journal Indexing)

Title:
Improved spill code generation for software pipelined loops
Authors:
Zalamea, J; Llosa, J; Ayguade, E; Valero, M
Addresses:
Univ Politecn Catalunya, Dept Arquitectura Computadors, ES-08034 Barcelona, Spain
Journal Title:
ACM SIGPLAN NOTICES
Issue: 5, Volume: 35, Year: 2000
Pages: 134-144
SICI:
1523-2867(200005)35:5<134:ISCGFS>2.0.ZU;2-B
Source:
ISI
Language:
English
Keywords:
instruction-level parallelism; software pipelining; register allocation; spill code
Document type:
Article
Publication type:
Periodical
Subject area:
Engineering, Computing & Technology
Citations:
29
Review:
Reprint address:
Address: Zalamea, J, Univ Politecn Catalunya, Dept Arquitectura Computadors, Cr Jordi Girona 1-3, Modul D6, Campus Nord, ES-08034 Barcelona, Spain
Citation:
J. Zalamea et al., "Improved spill code generation for software pipelined loops", ACM SIGPLAN Notices, 35(5), 2000, pp. 134-144

Abstract

Software pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Because iterations overlap, the resulting schedules have high register requirements during execution. A schedule is valid only if it requires no more registers than the target architecture provides. If it does not, its register requirements must be reduced, either by decreasing the overlap between iterations or by spilling registers to memory. In this paper we describe a set of heuristics to increase the quality of register-constrained modulo schedules. The heuristics choose between these two alternatives and define criteria for selecting effective spill candidates. The heuristics proposed for reducing register pressure can be applied to any software pipelining technique. The proposals are evaluated using a register-conscious software pipeliner on a workbench composed of a large set of loops from the Perfect Club benchmark suite and a set of processor configurations, and they are compared against a previous proposal described in the literature. For one of these processor configurations and the set of loops that do not fit in the available registers (32), a speed-up of 1.68 and a reduction of memory traffic by a factor of 0.57 are achieved with an affordable increase in compilation time. Over all the loops, this represents a speed-up of 1.38 and a reduction of memory traffic by a factor of 0.7.
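The validity test and the two remedies mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that register requirements are measured with the common MaxLive bound (the maximum number of values live in any kernel row of the modulo schedule) and that each value is described by a hypothetical `Lifetime` with a definition cycle and a last-use cycle in the flat schedule. The names `Lifetime`, `register_pressure`, `is_valid`, and `pick_spill_candidates` are illustrative only, and the longest-lifetime-first spill rule is a simple stand-in, not the heuristics proposed by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lifetime:
    """A loop-variant value in the flat (single-iteration) schedule (hypothetical model)."""
    name: str
    start: int  # cycle in which the value is defined
    end: int    # cycle of its last use (exclusive end of the live range)


def register_pressure(lifetimes, ii):
    """MaxLive of a modulo schedule with initiation interval `ii`.

    A new iteration starts every `ii` cycles, so a value live in cycle c of
    one iteration occupies a register in kernel row c % ii. The maximum row
    count is MaxLive, a common lower bound on the registers needed.
    """
    rows = [0] * ii
    for lt in lifetimes:
        for c in range(lt.start, lt.end):
            rows[c % ii] += 1
    return max(rows) if rows else 0


def is_valid(lifetimes, ii, available_regs):
    """A schedule is valid if its register requirement fits the register file."""
    return register_pressure(lifetimes, ii) <= available_regs


def pick_spill_candidates(lifetimes, ii, available_regs):
    """Greedy stand-in heuristic: spill the longest lifetimes first until pressure fits.

    Simplifying assumption: a spilled value occupies no register at all; the
    short reload lifetimes and the extra loads/stores are ignored here.
    """
    remaining = sorted(lifetimes, key=lambda lt: lt.end - lt.start, reverse=True)
    spilled = []
    while remaining and register_pressure(remaining, ii) > available_regs:
        spilled.append(remaining.pop(0))
    return spilled, remaining


if __name__ == "__main__":
    # Hypothetical lifetimes, not taken from the paper.
    loop = [Lifetime("a", 0, 10), Lifetime("b", 2, 5), Lifetime("c", 4, 6)]
    print(register_pressure(loop, 2))   # 8: too many for 4 registers
    print(register_pressure(loop, 5))   # 4: fits by reducing iteration overlap
    spilled, kept = pick_spill_candidates(loop, 2, 4)
    print([lt.name for lt in spilled])  # ['a']: fits by spilling instead
```

With 4 available registers, the example loop is invalid at II = 2 (MaxLive = 8). Raising II to 5 reduces overlap and brings MaxLive down to 4, while keeping II = 2 and spilling the long lifetime `a` leaves a pressure of 3. The sketch assumes the same flat schedule remains legal at the larger II; a real scheduler would have to recheck resource and dependence constraints and account for the memory operations added by spilling, which is the trade-off the paper's heuristics are meant to resolve.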

ASDD Area Sistemi Dipartimentali e Documentali, Università di Bologna, Catalogue of journals and other periodicals
Document generated on 11/07/20 at 20:43:26