[Oct. 21, AM] UIUC Chair Professor Academic Lecture

Chair Professor Lecture Series, School of Information Science and Technology

Landmark-Based Speech Recognition: The Marriage of High-Dimensional Machine Learning Techniques with Modern Linguistic Representations

Mark Hasegawa-Johnson, UIUC

with James Baker, Sarah Borys, Ken Chen, Emily Coogan, Steven Greenberg, Amit Juneja, Katrin Kirchhoff, Karen Livescu, Srividya Mohan, Jennifer Muller, Kemal Sonmez, Tianyu Wang

Time: 10:00-11:00 AM, Thursday, October 21, 2004

Venue: Room 1-315, Information Science and Technology Building, Tsinghua University

ABSTRACT

This talk will describe a radically new paradigm for automatic speech recognition, implemented and tested during summer 2004 at the Johns Hopkins Workshop on Language and Speech Processing. This research seeks to bring together new ideas from phonetics and phonology with new ideas from artificial intelligence in order to better match human speech recognition performance. Specifically, we developed three prototype landmark-based speech recognizers and tested them in a second-pass rescoring paradigm. Each prototype recognizer first uses a bank of support vector machines to learn the mapping from an input auditory feature space (2000 dimensions) to an intermediate phonetic space of landmark-dependent probabilities (40-70 dimensions). The landmark-dependent probabilities are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes a canonical pronunciation of each word, a dynamic Bayesian network that models the asynchrony and independent reduction of separate vocal tract articulators, or a discriminative pronunciation model trained using the methods of maximum entropy classification. These three prototype systems were applied to the problem of rescoring the word-lattice output of a highly refined standard system (a two-pass combination of neural-network/HMM hybrids developed at SRI, using triphone acoustic models with several different acoustic feature streams). None of the new systems produced significantly fewer errors than the baseline; one system (the dynamic Bayesian network) produced a pattern of errors quite similar to the baseline, while another (the discriminative pronunciation model) produced a different and possibly complementary pattern of errors. Ongoing experiments will determine whether the new systems can be combined with the baseline to produce a net reduction in word error rate.
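For a concrete picture of the two-stage architecture described above, here is a minimal Python sketch of the idea: a bank of probabilistic SVM landmark detectors followed by a dynamic-programming match of a canonical pronunciation against the frame-level landmark posteriors (the first of the three pronunciation models). This is hypothetical illustration code, not the workshop systems; every landmark name, dimension, and data value is a toy placeholder.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy sizes; the workshop systems used ~2000 input dimensions
# and 40-70 landmark classes.
N_DIM = 20
LANDMARKS = ["stop_burst", "vowel_onset", "nasal_closure"]

def train_svm_bank(n_train=200):
    """Train one probabilistic SVM per landmark class on synthetic frames."""
    bank = {}
    for i, name in enumerate(LANDMARKS):
        X = rng.normal(size=(n_train, N_DIM))
        y = rng.integers(0, 2, size=n_train)
        X[y == 1, i] += 2.0  # make the positive class separable on dim i
        bank[name] = SVC(probability=True).fit(X, y)
    return bank

def landmark_posteriors(bank, frames):
    """Map each acoustic frame to a vector of landmark probabilities."""
    return np.column_stack(
        [bank[name].predict_proba(frames)[:, 1] for name in LANDMARKS])

def dp_word_score(post, canonical):
    """Score the best monotonic alignment of a canonical landmark
    sequence against the frame-level posteriors (log domain)."""
    T, L = post.shape[0], len(canonical)
    logp = np.log(np.clip(post, 1e-9, None))
    D = np.full((T + 1, L + 1), -np.inf)  # D[t, j]: best score after
    D[0, 0] = 0.0                         # t frames, j landmarks matched
    for t in range(1, T + 1):
        for j in range(L + 1):
            best = D[t - 1, j]            # frame t consumed by no landmark
            if j > 0:                     # frame t realizes landmark j
                k = LANDMARKS.index(canonical[j - 1])
                best = max(best, D[t - 1, j - 1] + logp[t - 1, k])
            D[t, j] = best
    return D[T, L]

bank = train_svm_bank()
frames = rng.normal(size=(30, N_DIM))     # a fake 30-frame utterance
post = landmark_posteriors(bank, frames)
# Rescore two hypothetical word hypotheses from a first-pass lattice:
for word, pron in [("pan", ["stop_burst", "vowel_onset", "nasal_closure"]),
                   ("an", ["vowel_onset", "nasal_closure"])]:
    print(word, dp_word_score(post, pron))

The dynamic Bayesian network and maximum-entropy pronunciation models discussed in the talk replace the dp_word_score step with richer integration of the same landmark posteriors.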

ABOUT THE SPEAKER

Mark Hasegawa-Johnson received the S.B., S.M., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 1989, 1989, and 1996, respectively. He has held engineering internships in echo cancellation at Motorola Labs, Schaumburg, IL, and in speech coding at Fujitsu Laboratories Limited, Kawasaki, Japan. From 1996 to 1997, he was the 19th annual ASA F.V. Hunt Post-Doctoral Research Fellow, designing and testing models of articulatory motor planning at the University of California, Los Angeles (UCLA) and at MIT. From 1998 to 1999, he held an NIH Individual National Research Service Award at UCLA. In 1999, he became an Assistant Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign, where he co-founded the Illinois Speech and Language Engineering group. His current research focuses on using novel machine learning technologies to parameterize phonetic and prosodic knowledge representations for improved audio and audiovisual speech recognition. Dr. Hasegawa-Johnson is the author or co-author of four patents, nine refereed journal articles, and more than 60 conference papers, and is an Associate Editor of IEEE Signal Processing Letters. He is a Senior Member of the IEEE Signal Processing Society and a member of the Acoustical Society of America, the Audio Engineering Society, and the International Speech Communication Association. He is also a member of the honor societies Eta Kappa Nu, Tau Beta Pi, Sigma Xi, and Phi Beta Kappa, and has been listed in Marquis Who's Who in Science and Engineering.

All are welcome!

For any questions regarding this lecture, please contact Prof. Fang Zheng (郑方)

(Tel: 138-0101-2234)