Lib4U

"Behind every stack of books there is a flood of knowledge."

Ensemble Feature Selection for Automatic Speech Recognition

David Gelbart

This is my ICSI page and for the most part it has not been updated since 2008. From 2000 to 2008 I was a member of the ICSI Speech group studying automatic speech recognition (ASR), or in other words, making computers turn speech into text. Although I have left ICSI, I will try to respond to queries about my work, so please feel free to contact me:

Email: firstname dot lastname at gmail dot com

Phone: 778-997-6098 (Time Zone: Pacific Standard Time)

 

My research

 

Ph.D. Thesis: Ensemble Feature Selection for Multi-Stream Speech Recognition

Multi-stream automatic speech recognition systems which use the combined decisions of an ensemble of classifiers, each with its own feature vector, are popular in the research literature. Past published work on feature selection for such systems has dealt with features in blocks. In my thesis, I tried feature selection at the level of individual features, using Ho’s random subspace method and Tsymbal et al.’s hill-climbing method. The thesis can be downloaded here. For a shorter overview of my work, see the INTERSPEECH 2009 paper I co-authored, which can be downloaded from the ICSI Publications page. The paper also includes results from a tech report I wrote after the thesis.
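As a rough illustration of feature selection at the level of individual features, here is a minimal sketch rather than the thesis implementation. It draws one random feature subset per ensemble member, in the spirit of the random subspace method, and then refines each subset with a simple greedy hill-climbing loop. The function names, the `score` callable, and the toy scoring rule are all assumptions made for this example; in particular, the fitness function used in the published hill-climbing method is not reproduced here.

```python
import random

def random_subspaces(n_features, n_streams, subset_size, seed=0):
    """Draw one random feature subset per ensemble member (stream)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_streams)]

def hill_climb(subset, n_features, score, max_passes=5):
    """Greedy refinement: toggle one feature at a time, keep improvements.

    `score` is a hypothetical callable that maps a set of feature indices
    to a number (for example, held-out accuracy of a classifier trained
    on those features).
    """
    current = set(subset)
    best = score(current)
    for _ in range(max_passes):
        improved = False
        for f in range(n_features):
            candidate = current ^ {f}  # add f if absent, remove it if present
            if not candidate:
                continue  # never evaluate an empty feature set
            s = score(candidate)
            if s > best:
                current, best, improved = candidate, s, True
        if not improved:
            break  # a full pass changed nothing, so stop
    return sorted(current), best

# Toy stand-in for classifier accuracy: reward "useful" features,
# penalize the rest. Purely illustrative.
USEFUL = {0, 2, 5}

def toy_score(features):
    return len(features & USEFUL) - 0.5 * len(features - USEFUL)

subspaces = random_subspaces(n_features=8, n_streams=3, subset_size=4)
refined = [hill_climb(s, 8, toy_score) for s in subspaces]
```

With a real system, `score` would train and evaluate a classifier for each candidate subset, which is far more expensive than this toy rule, so the number of passes and candidates matters in practice.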

During my thesis work, I created versions of the OGI ISOLET and OGI Numbers corpora that are degraded by background noise, using various noises and signal-to-noise ratios. Other researchers can exactly reproduce these noisy corpora given copies of the original corpora from OGI. I have also set up ASR systems for ISOLET and Numbers which are built using open source components and are available to other researchers. See here for my ISOLET resources and here for my Numbers resources.
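To make the idea of degrading a corpus at a chosen signal-to-noise ratio concrete, the following is a minimal sketch and not the tooling used for the noisy ISOLET and Numbers corpora. It assumes SNR is defined from average power over the whole utterance, which is only one of several possible definitions. The function names and the synthetic signals are hypothetical; the fixed random seed is there only to show how a noisy signal can be regenerated exactly.

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Choose gain g so that p_speech / (g**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled

def measured_snr_db(speech, noise):
    """SNR in dB computed from the total energy of two equal-length signals."""
    p_s = sum(x * x for x in speech)
    p_n = sum(x * x for x in noise)
    return 10 * math.log10(p_s / p_n)

rng = random.Random(1234)  # fixed seed so the same noisy signal is produced each run
speech = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]  # stand-in for speech
noise = [rng.uniform(-1.0, 1.0) for _ in range(200)]                # stand-in for noise
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=10.0)
```

Calling `measured_snr_db(speech, scaled_noise)` afterwards is a quick way to check that the mix landed at the requested ratio.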

 

Compensation for Noise and Reverberation

I have experimented with a mean subtraction algorithm for reverberation compensation that was developed as part of Carlos Avendano’s thesis work at OGI. I redesigned the algorithm to produce time-domain output, making it much easier to integrate with existing ASR software. I then evaluated it using data from a corpus of spoken digits recordings collected (not by me) using tabletop microphones at ICSI. We published a paper on this at ASRU 2001. Please also see this page which contains corrections, source code, audio files, and additional results that were not included in the paper. That page also has a bibliography of related publications. We published further results at AVIOS 2002 and ICSLP 2002, and other research groups have published results since then using the time-domain output version of the algorithm.
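The general principle behind mean subtraction can be sketched in a few lines. This is not the algorithm from the paper and it omits the time-domain resynthesis step entirely. It assumes the channel acts as a fixed gain per frequency bin within each analysis frame, so that the channel is multiplicative in the magnitude spectrum and additive in the log domain. The function name and the toy spectra are made up for illustration.

```python
import math

def log_mean_subtract(frames):
    """Subtract the per-bin mean of log magnitudes, taken across all frames."""
    n_bins = len(frames[0])
    logs = [[math.log(m) for m in frame] for frame in frames]
    means = [sum(f[b] for f in logs) / len(logs) for b in range(n_bins)]
    return [[f[b] - means[b] for b in range(n_bins)] for f in logs]

# Toy magnitude spectra: 3 frames, 3 frequency bins each.
clean = [[1.0, 2.0, 4.0], [2.0, 1.0, 8.0], [4.0, 4.0, 2.0]]

# A fixed channel applies one gain per bin, identical in every frame.
channel = [0.5, 3.0, 0.1]
filtered = [[m * h for m, h in zip(frame, channel)] for frame in clean]

a = log_mean_subtract(clean)
b = log_mean_subtract(filtered)

# The channel adds the same constant to every frame's log spectrum,
# so it is removed along with the mean.
max_diff = max(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))
```

Under these assumptions `max_diff` is zero up to floating-point rounding, meaning the normalized features no longer depend on the channel.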

I co-authored a EUROSPEECH 2003 paper with Docio and Morgan in which we compared the performance of different types of tabletop microphones and investigated the performance of noise reduction.

 

Auditory-based Speech Recognition

Human speech recognition accuracy is often much higher than computer accuracy, even in tasks (like nonsense syllables) where semantic understanding does not play a role. This has motivated work that aims to build computer speech recognition systems using signal processing inspired by the human hearing system. I have co-authored papers on this topic with Werner Hemmert and others.

 

Gabor Filters for Speech Recognition

I helped Michael Kleinschmidt with his thesis work on the use of Gabor filters for speech recognition. The work has since been continued by Bernd Meyer and others. This page contains a bibliography, links to source code, and some information about unpublished results.

 

Automatic Speech Recognition in Meetings

Technologically, it is becoming increasingly simple to record and preserve the audio of meetings. The value of such recordings is higher if ASR-based speech indexing and search is possible, much like how preserving old emails is more useful if one can search through them for emails containing particular keywords. There are also potential uses of ASR technology while a meeting is ongoing.

ICSI has been doing quite a bit of work in this area. My main contribution was to extend Transcriber to support multiple-talker transcription. The modified tool was used by a number of people in 2001. Transcriber and other tools have made progress since then, so I am not sure whether my version is still useful.

I also helped integrate noise reduction into ICSI’s ASR system for meetings. The code can be found here. (I didn’t write the core code; I just cleaned it up a bit and made it easier to use with meeting data.)

I have some ideas about adding reverberation to non-reverberant conversational speech training data (such as Switchboard), in order to increase the amount of reverberant training data available for meeting recognition. However, due to other priorities, I’m not working on this at the moment. If you are interested, please feel free to get in touch.
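The text does not say how the reverberation would be added. One common approach, shown here purely as an assumption, is to convolve the clean signal with a room impulse response, whether measured or simulated. The sketch below uses a direct convolution and a made-up four-tap impulse response, so it shows the operation itself and is not a realistic room model.

```python
def convolve(signal, impulse_response):
    """Direct (full) convolution of two sequences of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h  # each input sample launches a scaled copy of the response
    return out

# Toy "dry" signal and a toy impulse response: the direct path
# followed by two weaker, delayed reflections.
dry = [1.0, 0.0, 0.0, 0.5]
rir = [1.0, 0.0, 0.6, 0.3]

wet = convolve(dry, rir)
```

Real impulse responses run to thousands of samples, so an FFT-based convolution would normally replace the double loop.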

 

Other Research

For a full list of my publications, see the ICSI Publications page.

Links

  • Tutorials related to automatic speech recognition
  • A small directory of free (or free for research use) software and data for noise-robust and channel-robust speech processing.
  • Thoughts on computer-aided language learning
  • FocalFilter, a tool I co-wrote to block distracting websites in any browser
  • A useful nutrition site which analyzes your diet and makes recommendations

Some educational resources related to automatic speech recognition (ASR)

Note: The latest version of this page is now at a new location. The copy you are currently viewing is not as up-to-date.

Online resources:

Books:

Here is a short list of books. This is not a complete list of popular or recommended books! The books I list here vary in focus so I’ve mentioned how to find the table of contents online. I also recommend checking out the reviews on amazon.com.

  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. The table of contents can be viewed here.
  • Speech and Audio Signal Processing: Processing and Perception of Speech and Music by Ben Gold and Nelson Morgan. The table of contents can be viewed on amazon.com.
  • Speech Processing — A Dynamic and Optimization-Oriented Approach by Li Deng and D. O’Shaughnessy. The table of contents can be viewed on amazon.com.
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin. The table of contents (and long excerpts) can be viewed here.

Gunnar Evermann’s book recommendations can be found here. I’m not sure if that page will stay up now that he’s leaving the HTK project, so here is a summary:

  • Pattern Classification by Duda, Hart, and Stork (this is one of my favorites too)
  • Introduction to Statistical Pattern Recognition by Fukunaga
  • Automatische Spracherkennung — Grundlagen, statistische Modelle und effiziente Algorithmen by Schukat-Talamazzini
  • The above-mentioned book by Huang, Acero and Hon.
  • Automatic Speech Recognition — The Development of the SPHINX Recognition System by Lee.
  • Statistical Methods for Speech Recognition by Jelinek.
  • Corpus-Based Methods in Language and Speech Processing, edited by Young and Bloothooft

Applications of ASR:

Speech Technology Magazine is a good source for information about applications of speech technology.

Current ASR research:

If you want to know what techniques are currently attracting attention at the cutting edge of ASR research, papers that describe speech recognition systems that were built for official project benchmarks can be a good source of information. These systems tend to use a lot of different, carefully chosen techniques. Many of these papers have a year (the year of the benchmark) and the word “system” in the title, which makes them easy to find. For example, many such papers can be found with a Google Scholar search for:
intitle:2007 intitle:system speech recognition

Some of these papers use the word “recent” in the title instead of the year. So in that case the above search would change to

intitle:recent intitle:system speech recognition

NIST organizes many of these benchmarks and they have information on their web site.

Automatic Speech Recognition Theory and Algorithms

This page is about resources for learning more about the theory and algorithms behind automatic speech recognition (ASR) technology.

Online educational resources:

Book recommendations:

David Gelbart’s book recommendations:

  • Spoken Language Processing: A Guide to Theory, Algorithm and System Development by Xuedong Huang, Alex Acero, Hsiao-Wuen Hon. The table of contents can be viewed here.
  • Speech and Audio Signal Processing: Processing and Perception of Speech and Music by Ben Gold and Nelson Morgan. The table of contents can be viewed on amazon.com.
  • Speech Processing — A Dynamic and Optimization-Oriented Approach by Li Deng and D. O’Shaughnessy. The table of contents can be viewed on amazon.com.
  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin. The table of contents (and long excerpts) can be viewed here.
  • Pattern Classification by Duda, Hart, and Stork. This is about pattern recognition in general, not ASR in particular. The table of contents can be viewed here.
  • A Course in Phonetics by Ladefoged. This is a good place to learn about phonemes (which are used in ASR pronunciation dictionaries), acoustic phonetics (which relates to the design of ASR feature extraction methods such as MFCC), and articulatory phonetics (which is often used in designing decision tree rules for HMM state tying). The audio and video files accompanying the book are here and you might find them interesting even if you don’t own the book.

(The books vary in focus so I’ve mentioned how to find tables of contents online. I also recommend checking out the reviews on amazon.com.)

Gunnar Evermann’s book recommendations (a summary of this page):

  • Pattern Classification by Duda, Hart, and Stork
  • Introduction to Statistical Pattern Recognition by Fukunaga
  • Automatische Spracherkennung — Grundlagen, statistische Modelle und effiziente Algorithmen by Schukat-Talamazzini
  • The above-mentioned book by Huang, Acero and Hon.
  • Automatic Speech Recognition — The Development of the SPHINX Recognition System by Lee.
  • Statistical Methods for Speech Recognition by Jelinek.
  • Corpus-Based Methods in Language and Speech Processing, edited by Young and Bloothooft

Applications of ASR:

Speech Technology Magazine is a good source for information about applications of speech technology. The magazine can be read free online. They also have a blog.

Current ASR research:

If you want to know what techniques are currently attracting attention at the cutting edge of ASR research, papers that describe speech recognition systems that were built for official project benchmarks can be a good source of information. These systems tend to use a lot of different, carefully chosen techniques. Many of these papers have a year (the year of the benchmark) and the word “system” in the title, which makes them easy to find. For example, many such papers can be found with a Google Scholar search for:

intitle:2007 intitle:system speech recognition

Some of these papers use the word “recent” in the title instead of the year. So in that case the above search would change to

intitle:recent intitle:system speech recognition
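For readers who want to generate such searches for several benchmark years, here is a small sketch that builds the query strings above and turns them into search URLs. The helper names are hypothetical, and the Google Scholar URL pattern is an assumption about how the site accepts queries, not something documented on this page.

```python
from urllib.parse import urlencode

def benchmark_system_query(title_word):
    """Build the query text, e.g. title_word='2007' or title_word='recent'."""
    return f"intitle:{title_word} intitle:system speech recognition"

def scholar_url(query):
    """Encode the query into a URL (assumed Google Scholar search endpoint)."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})

queries = [benchmark_system_query(w) for w in ("2007", "recent")]
urls = [scholar_url(q) for q in queries]
```

`urlencode` takes care of escaping the colons and spaces, so the query text can be written exactly as it would be typed into the search box.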

NIST organizes many of these benchmarks and they have information on their web site.

Different types of papers:

Sometimes you will have a choice whether to read about some work in a conference paper, a journal paper, or a master’s or PhD thesis. A journal paper will often have more background and details than a conference paper, and a thesis will often have more than a journal paper.

The delay between when a conference paper is finished and when it is published tends to be shorter than the delay for a journal paper. And the delay for a PhD thesis is often shorter than for a conference paper. Thus it can happen that a conference or journal paper is published at the same time as the author’s PhD thesis, but the thesis contains many more results since it was actually finished later.

Source:

http://www.focalfilter.com/gelbart/icsi/

http://www.focalfilter.com/gelbart/icsi/edu.htm

http://www.dev.voxforge.org/projects/Main/wiki/TheoryAndAlgorithms

https://buffy.eecs.berkeley.edu/PHP/resabs/resabs.php?f_year=2005&f_submit=one&f_absid=100672
