Article Catalogue (Journal Indexing)


Title:
PHYSIOLOGICALLY-BASED SPEECH SYNTHESIS USING NEURAL NETWORKS
Authors:
HIRAYAMA M; VATIKIOTISBATESON E; KAWATO M;
Addresses:
ATR HUMAN INFORMAT PROC RES LABS KYOTO 61902 JAPAN
Journal Title:
IEICE transactions on fundamentals of electronics, communications and computer science
Issue: 11, Volume: E76A, Year: 1993
Pages: 1898 - 1910
SICI:
0916-8508(1993)E76A:11<1898:PSSUNN>2.0.ZU;2-6
Source:
ISI
Language:
ENG
Keywords:
ARTICULATORY SPEECH SYNTHESIS; NEURAL NETWORKS; SPEECH MOTOR CONTROL; ARTICULATORY MOVEMENTS; EMG (ELECTROMYOGRAPHY)
Document type:
Article
Publication type:
Periodical
Subject area:
Science Citation Index Expanded
Citations:
NO
Review:
Reprint address:
Citation:
M. Hirayama et al., "PHYSIOLOGICALLY-BASED SPEECH SYNTHESIS USING NEURAL NETWORKS", IEICE transactions on fundamentals of electronics, communications and computer science, E76A(11), 1993, pp. 1898-1910

Abstract

This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that effect transforms between cognitive intentions to speak, their physiological effects on vocal tract structures, and their subsequent realization as acoustic signals. The first area concerns the biomechanical transform between motor commands to muscles and the ensuing articulator behavior. Using physiological data of muscle EMG (electromyography) and articulator movements during natural English speech utterances, three articulator-specific neural networks learn the forward dynamics that relate motor commands to the muscles and motion of the tongue, jaw, and lips. Compared to a fully-connected network that maps muscle EMG and motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue, whose complex structure is very difficult to examine empirically. The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients, which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as inputs to the network. As a result, the model now follows the fast changes in the coefficients for consonants, even though these are generated by relatively slow articulatory movements during natural English utterances. Although much work still needs to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.
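The abstract names two signal-processing steps without giving implementation details: adding articulator velocities to the network input, and synthesizing speech from PARCOR coefficients. What follows is a minimal sketch of both, under stated assumptions. Velocities are estimated here by finite differences (the paper does not say how they were obtained), and synthesis uses the standard all-pole lattice filter driven by reflection (PARCOR) coefficients, which is one common way such coefficients are turned into a waveform. The function names `add_velocity_features` and `parcor_synthesize`, the sampling interval `dt`, and the use of fixed (non-time-varying) coefficients are hypothetical illustrations, not the authors' code; the neural networks themselves are not shown.

```python
from typing import List, Sequence


def add_velocity_features(positions: Sequence[Sequence[float]], dt: float) -> List[List[float]]:
    """Hypothetical helper: append a finite-difference velocity to each frame.

    positions[n] is one frame of articulator positions (one value per channel),
    dt is the assumed sampling interval. Central differences are used inside
    the sequence and one-sided differences at the two ends. Each output frame
    is [x_1, ..., x_d, v_1, ..., v_d].
    """
    n_frames = len(positions)
    if n_frames < 2:
        raise ValueError("need at least two frames to estimate velocity")
    out = []
    for n in range(n_frames):
        if n == 0:
            prev, nxt, span = positions[0], positions[1], dt
        elif n == n_frames - 1:
            prev, nxt, span = positions[n - 1], positions[n], dt
        else:
            prev, nxt, span = positions[n - 1], positions[n + 1], 2 * dt
        vel = [(b - a) / span for a, b in zip(prev, nxt)]
        out.append(list(positions[n]) + vel)
    return out


def parcor_synthesize(excitation: Sequence[float], k: Sequence[float]) -> List[float]:
    """All-pole lattice synthesis filter driven by PARCOR (reflection) coefficients.

    Assumed sign convention: the analysis polynomial is
    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so for p = 1 the filter is
    y[n] = e[n] - k_1 * y[n-1]. Coefficients are held fixed here; a
    frame-based synthesizer would update them every frame.
    """
    p = len(k)
    if any(abs(km) >= 1.0 for km in k):
        raise ValueError("PARCOR coefficients must satisfy |k| < 1 for stability")
    b_delay = [0.0] * p  # b_delay[m] holds the backward signal b_m[n-1], m = 0..p-1
    y = []
    for e in excitation:
        f = e  # forward signal f_p[n] is the excitation
        b_new = [0.0] * p
        for m in range(p, 0, -1):
            f = f - k[m - 1] * b_delay[m - 1]  # f_{m-1}[n]
            if m < p:
                b_new[m] = k[m - 1] * f + b_delay[m - 1]  # b_m[n]
        b_new[0] = f  # b_0[n] = f_0[n]
        b_delay = b_new
        y.append(f)  # output y[n] = f_0[n]
    return y
```

For example, `parcor_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])` returns `[1.0, -0.5, 0.25, -0.125]`, the impulse response of 1/(1 + 0.5 z^-1). The lattice form is a convenient target for network-predicted coefficients because stability reduces to checking |k_m| < 1 for each coefficient, rather than checking the roots of a direct-form polynomial.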

ASDD (Area Sistemi Dipartimentali e Documentali), Università di Bologna, Catalogue of journals and other periodicals
Document generated on 15/07/20 at 21:28:03