Article Catalog (Journal Indexing)

Title:
Listening to two simultaneous speeches
Authors:
Okuno, HG; Nakatani, T; Kawabata, T
Addresses:
Nippon Telegraph & Tel Publ Corp, NTT Basic Res Labs, Kanagawa 2430198, Japan
Japan Sci & Technol Corp, ERATO, Kitano Symbiot Syst Project, Tokyo 150001, Japan
NTT, Multimedia Business Dev Ctr, Tokyo, Japan
NTT, Cyber Sci Labs, Kanagawa, Japan
Journal title:
SPEECH COMMUNICATION
issue: 3-4, volume: 27, year: 1999
pages: 299-310
SICI:
0167-6393(199904)27:3-4<299:LTTSS>2.0.ZU;2-I
Source:
ISI
Language:
ENG
Keywords:
speech stream segregation; simultaneous speakers; auditory scene analysis
Document type:
Article
Publication type:
Periodical
Subject area:
Engineering, Computing & Technology
Citations:
38
Review:
Reprint addresses:
Address: Okuno, HG, Nippon Telegraph & Tel Publ Corp, NTT Basic Res Labs, 3-1 Morinosato Wakamiya, Kanagawa 2430198, Japan
Citation:
H.G. Okuno et al., "Listening to two simultaneous speeches", SPEECH COMM, 27(3-4), 1999, pp. 299-310

Abstract

Speech stream segregation is presented as a new speech enhancement for automatic speech recognition. Two issues are addressed: speech stream segregation from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting non-harmonic residue for non-harmonic parts of a group. The main problem in interfacing speech stream segregation with hidden Markov model (HMM)-based speech recognition is how to improve the degradation of recognition performance due to spectral distortion of segregated sounds, which is caused mainly by transfer function of a binaural input. Our solution is to re-train the parameters of HMM with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of an isolated word showed that the error reduction rate of the 1-best/10-best word recognition of each woman's utterance is, on average, 64% and 75%, respectively. (C) 1999 Elsevier Science B.V. All rights reserved.

ASDD Area Sistemi Dipartimentali e Documentali, University of Bologna, Catalog of journals and other periodicals
Document generated on 21/09/20 at 07:06:25