Catalogo Articoli (Spogli Riviste)

OPAC HELP

Titolo:
A theoretical and experimental study on the construction of suffix arrays in external memory
Autore:
Crauser, A; Ferragina, P;
Indirizzi:
Max Planck Inst Informat, Saarbrucken, Germany Max Planck Inst Informat Saarbrucken Germany rmat, Saarbrucken, Germany Univ Pisa, Dipartimento Informat, Pisa, Italy Univ Pisa Pisa ItalyUniv Pisa, Dipartimento Informat, Pisa, Italy
Titolo Testata:
ALGORITHMICA
fascicolo: 1, volume: 32, anno: 2002,
pagine: 1 - 35
SICI:
0178-4617(200201)32:1<1:ATAESO>2.0.ZU;2-Q
Fonte:
ISI
Lingua:
ENG
Soggetto:
ALGORITHM; TREES;
Keywords:
suffix array; large data sets; external-memory model; text indexing; full-text and word-based models;
Tipo documento:
Article
Natura:
Periodico
Settore Disciplinare:
Engineering, Computing & Technology
Citazioni:
33
Recensione:
Indirizzi per estratti:
Indirizzo: Crauser, A Max Planck Inst Informat, Saarbrucken, Germany Max Planck Inst Informat Saarbrucken Germany rucken, Germany
Citazione:
A. Crauser e P. Ferragina, "A theoretical and experimental study on the construction of suffix arrays in external memory", ALGORITHMIC, 32(1), 2002, pp. 1-35

Abstract

The construction of full-text indexes on very large text collections is nowadays a hot problem. The suffix array [32] is one of the most attractive full-text indexing data structures due to its simplicity, space efficiency and powerful/fast search operations supported. In this paper we analyze, both theoretically and experimentally, the I/O complexity and the working space of six algorithms for constructing large suffix arrays. Three of them arestate-of-the-art, the other three algorithms are our new proposals. We perform a set of experiments based on three different data sets (English texts, amino-acid sequences and random texts) and give a precise hierarchy of these algorithms according to their working-space versus construction-time tradeoff. Given the current trends in model design [12], [32] and disk technology [29], [30], we pose particular attention to differentiate between "random' and "contiguous" disk accesses, in order to explain reasonably some practical I/O phenomena which are related to the experimental behavior of these algorithms and that would otherwise be meaningless in the light of othersimpler external-memory models. We also address two other issues. The former is concerned with the problem of building word indexes; we show that ourresults can be successfully applied to this case too, without any loss in efficiency and without compromising the simplicity of programming to achieve a uniform, simple and efficient approach to both the two indexing models. The latter issue is related to the intriguing and apparently counterintuitive "contradiction" between the effective practical performance of the well-known Baeza-Yates Gonnet-Snider algorithm [17], verified in our experiments, and its unappealing worst-case behavior. We devise a new external-memoryalgorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, thus resulting in a novel approach which combines good worst-case bounds with efficient practical performance.

ASDD Area Sistemi Dipartimentali e Documentali, Università di Bologna, Catalogo delle riviste ed altri periodici
Documento generato il 20/01/20 alle ore 10:32:31