[Nov. 4, a.m.] Research Institute of Information Technology (RIIT) Academic Lecture Series

Prosody-Dependent Speech Recognition

Mark Hasegawa-Johnson, UIUC

with Jennifer Cole, Chilin Shih, Ken Chen, Aaron Cohen, Sandra Chavarria, Heejin Kim, Taejin Yoon, Sarah Borys, and Jeung-Yoon Choi

Time: Thursday, November 4, 2004, 10:00-11:00 a.m.

Venue: Room 1-415, Information Science and Technology Building, Tsinghua University

ABSTRACT

A "prosody-dependent speech recognizer" is a speech recognizer that conditions acoustic phoneme models on prosodic context, in the same way that standard speech recognizers condition triphone models on left and right phone context. In a prosody-dependent speech recognizer, the prosody of the utterance is recognized simultaneously with the word string. This talk will demonstrate that, by correctly modeling the dependence of phoneme acoustics on prosody, and of prosody on syntax, it is possible to reduce the word error rate (WER) of a speech recognizer. Word error rate is improved mainly because the observed prosody is linguistically unlikely to co-occur with any incorrect word string. Additional improvements, in both perplexity and WER, can be obtained using a semi-factored language model, in which the relationship between prosody and the word sequence is at least partly mediated by syntactic tags. Careful analysis of the relationship between prosody and syntax indicates that syntactic phrase boundaries are the most important cue for prosodic phrase boundary recognition, while part of speech is the most important cue for locating pitch accents, but that neither of these cues is entirely sufficient for either classification task. Experiments to port this system from Radio News speech to Conversational Telephone Speech are currently under way.
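The joint recognition described above can be written as a decision rule. The following is only a sketch under assumed notation, not necessarily the formulation used in the talk: O stands for the acoustic observations, W for the word string, \Pi for the prosodic label sequence (pitch accents and phrase boundaries), and S for the syntactic tag sequence. None of these symbols appear in the abstract.

```latex
% Joint decoding of words and prosody (assumed notation):
\[
  (\hat{W}, \hat{\Pi}) \;=\; \arg\max_{W,\,\Pi}\; p(O \mid W, \Pi)\, P(W, \Pi)
\]
% Chain-rule expansion of the language model over syntactic tags S
% (an identity; the modeling choice lies in how each factor is approximated):
\[
  P(W, \Pi) \;=\; \sum_{S} P(W)\, P(S \mid W)\, P(\Pi \mid S, W)
\]
```

In this reading, a prosody-independent recognizer would use only p(O | W) P(W). A word string W that is acoustically plausible but inconsistent with the observed prosody receives a low P(W, \Pi) for every well-matching \Pi, which is one way to see the WER reduction the abstract reports. A fully factored model would replace P(\Pi | S, W) with P(\Pi | S); the "semi-factored" model mentioned above presumably retains some direct dependence on W, but its exact form is not given here.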

ABOUT THE SPEAKER

Mark Hasegawa-Johnson received the S.B., S.M., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 1989, 1989, and 1996 respectively. He has held engineering internships in echo cancellation at Motorola Labs, Schaumburg, IL, and in speech coding at Fujitsu Laboratories Limited, Kawasaki, Japan. From 1996 to 1997, he was the 19th annual ASA F.V. Hunt Post-Doctoral Research Fellow, designing and testing models of articulatory motor planning at the University of California at Los Angeles and at MIT. From 1998 to 1999, he held an NIH individual national research service award at UCLA. In 1999, he became an Assistant Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign, where he co-founded the Illinois Speech and Language Engineering group. His current research focuses on the use of novel machine learning technologies to parameterize phonetic and prosodic knowledge representations for the purpose of improved audio and audiovisual speech recognition. Dr. Hasegawa-Johnson is the author or co-author of four patents, nine refereed journal articles, and more than 60 conference papers. Dr. Hasegawa-Johnson is Associate Editor of the IEEE Signal Processing Letters. He is a Senior Member of the IEEE Signal Processing Society, and a member of the Acoustical Society of America, the Audio Engineering Society, and the International Speech Communication Association. He is also a member of the honor societies Eta Kappa Nu, Tau Beta Pi, Sigma Xi, and Phi Beta Kappa, and has been listed in Marquis Who's Who in Science and Technology.

All are welcome!

For any matters regarding this lecture, please contact Prof. Fang Zheng (郑方)

(Tel: 138-0101-2234)