‎"Behind every stack of books there is a flood of knowledge."

Automatic Speech Recognition with Hidden Markov Models (OHSU)


Winter 2011
Mondays/Wednesdays, 11:30 a.m. - 1:00 p.m., Room WCC 403, Wilson Clark Center

John-Paul Hosom
‘hosom’ at cslu.ogi.edu

Course Description
Hidden Markov Models (HMMs) are widely used in today’s speech recognition systems. This course is an introduction to the theory and practice of speech recognition using HMMs. Topics include dynamic time warping, Markov Models and Hidden Markov Models (discrete, semi-continuous, and continuous), vector quantization, Gaussian Mixture Models, the Viterbi search algorithm, the forward-backward training algorithm, language modeling, and speech-specific adaptations of HMMs. The course is focused on understanding these fundamental technologies and developing the main components of speech recognition systems. Students can expect to come away from the course with an ability to write programs for the training and execution of simple HMM systems, and to know how to extend these systems to more complex cases.  Prerequisite: C programming experience.
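
To make the first listed topic concrete, here is a minimal sketch of the dynamic time warping recurrence for two one-dimensional feature sequences, written in C (the course's prerequisite language). It assumes absolute difference as the local distance and a simple step pattern in which each cell may be reached from its left, lower, or diagonal neighbor. Real recognizers compare multi-dimensional feature vectors and often use weighted or constrained step patterns. The function name `dtw_distance` and the `DTW_MAX_LEN` bound are illustrative assumptions, not part of the course's project templates.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

#define DTW_MAX_LEN 64 /* hypothetical fixed bound, avoids dynamic allocation */

/* Minimum accumulated distance aligning x[0..n-1] with y[0..m-1].
 * Local distance: |x[i] - y[j]|. Cell (i, j) may be reached from
 * (i-1, j), (i, j-1) or (i-1, j-1). Returns -1.0 on invalid input.
 * Uses a static table, so it is not reentrant. */
double dtw_distance(const double *x, size_t n, const double *y, size_t m)
{
    static double D[DTW_MAX_LEN][DTW_MAX_LEN];
    if (n == 0 || m == 0 || n > DTW_MAX_LEN || m > DTW_MAX_LEN)
        return -1.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            double cost = fabs(x[i] - y[j]);
            if (i == 0 && j == 0) {
                D[i][j] = cost; /* start of every alignment path */
                continue;
            }
            double best = DBL_MAX;
            if (i > 0 && D[i - 1][j] < best)
                best = D[i - 1][j];
            if (j > 0 && D[i][j - 1] < best)
                best = D[i][j - 1];
            if (i > 0 && j > 0 && D[i - 1][j - 1] < best)
                best = D[i - 1][j - 1];
            D[i][j] = cost + best;
        }
    }
    return D[n - 1][m - 1];
}
```

For example, aligning {1, 2, 3} with {1, 2, 2, 3} gives a distance of 0, because the repeated 2 is absorbed by a step that advances only in the second sequence.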

The course syllabus is given in a PDF document.


There are two recommended (but not required) textbooks:

Fundamentals of Speech Recognition
Lawrence Rabiner and Biing-Hwang Juang
Prentice Hall, New Jersey, 1993 (or later editions)
Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon
Prentice Hall, New Jersey, 2001 (or later editions)

The lecture notes will provide the necessary material, but the textbooks provide valuable supplementary information. Both textbooks should be on reserve at the library.

Grading Policy
Grading is based on three programming assignments, a midterm, and a final. The programming projects provide a template for basic functions such as file I/O and the basic program structure; the student must write the relevant functions. The three projects are worth 15%, 20%, and 25% of the total grade, respectively. The midterm is worth 20%, and the final is worth 20%.

Weekly Schedule

Lecture notes are added as links to PowerPoint files. Files related to programming assignments will also be posted here as ZIP files.


  • Course Overview
  • Why Is Automatic Speech Recognition Difficult?
  • Background: Speech Production, Representations of Speech, Models of Human Speech Recognition
  • General Issues in Developing ASR Systems
  • Induction
  • DTW Motivation / Algorithm / Implementation
  • DTW Examples
  • Assign Project 1 on January 5
  • (Review) Relevant Probability / Statistics Background
  • Markov Models
  • Hidden Markov Models
  • Log-Domain Mathematics
  • HMM Topologies
  • Vector Quantization
  • January 17 is MLK Holiday
  • Gaussian Mixture Models
  • Project 1 is due January 19
  • HMMs for Speech; Start Viterbi Search
  • Assign Project 2 on January 24
  • Continue Viterbi Search
  • Lots of Viterbi Search Examples
  • Speech Features (LPC, PLP, MFCC)
  • Semi-Markov Models
  • Initializing an HMM
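
The Viterbi search and log-domain topics above fit together naturally: working with log probabilities turns products into sums and avoids numerical underflow on long utterances. The following is a minimal sketch of log-domain Viterbi decoding for a discrete-observation HMM. The two-state, two-symbol sizes, the `MAX_T` bound, and the function signature are assumptions for illustration, not the interface used in the course projects.

```c
#include <math.h>

#define N_STATES 2  /* hypothetical model size */
#define N_SYMBOLS 2 /* hypothetical codebook size */
#define MAX_T 32    /* hypothetical maximum number of frames */

/* Log-domain Viterbi for a discrete HMM.
 * log_pi[i]: log initial probability of state i
 * log_a[i][j]: log transition probability i -> j
 * log_b[j][k]: log probability of symbol k in state j
 * Writes the best state sequence to path[0..T-1] and returns its
 * log probability (or -INFINITY if T is out of range). */
double viterbi(const double log_pi[N_STATES],
               double log_a[N_STATES][N_STATES],
               double log_b[N_STATES][N_SYMBOLS],
               const int *obs, int T, int *path)
{
    double delta[MAX_T][N_STATES]; /* best partial-path score */
    int psi[MAX_T][N_STATES];      /* back-pointers */
    if (T <= 0 || T > MAX_T)
        return -INFINITY;

    for (int j = 0; j < N_STATES; j++) {
        delta[0][j] = log_pi[j] + log_b[j][obs[0]];
        psi[0][j] = 0;
    }
    for (int t = 1; t < T; t++) {
        for (int j = 0; j < N_STATES; j++) {
            /* max over predecessors: products become sums in log domain */
            int best_i = 0;
            double best = delta[t - 1][0] + log_a[0][j];
            for (int i = 1; i < N_STATES; i++) {
                double s = delta[t - 1][i] + log_a[i][j];
                if (s > best) {
                    best = s;
                    best_i = i;
                }
            }
            delta[t][j] = best + log_b[j][obs[t]];
            psi[t][j] = best_i;
        }
    }

    /* Termination: pick the best final state, then backtrace. */
    int q = 0;
    for (int j = 1; j < N_STATES; j++)
        if (delta[T - 1][j] > delta[T - 1][q])
            q = j;
    double best_score = delta[T - 1][q];
    for (int t = T - 1; t >= 0; t--) {
        path[t] = q;
        q = psi[t][q];
    }
    return best_score;
}
```

With two states that emit their "own" symbol with probability 0.9, self-loop probability 0.7, uniform initial probabilities, and observations {0, 0, 1, 1}, the backtrace recovers the state path {0, 0, 1, 1}.
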
midterm sample, lecture 10
  • In-Class Midterm on Feb. 7 (material from Lectures 1 through 9)
  • Go over midterm on Feb. 9
  • Forward Procedure
  • Backward Procedure
lecture 11
lecture 12
lecture 13, project 3
  • Project 2 is due February 14
  • Gamma and Xi
  • Baum-Welch or Forward-Backward or EM Algorithm for training HMMs
  • Training on Multiple Files
  • Expectation-Maximization in General
  • Embedded Training
  • Search Algorithms: Two-Level
  • Search Algorithms: One-Pass
  • Go over Viterbi project on February 16
  • Assign Project 3 on February 16
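
Gamma, the state-occupancy probability used in Baum-Welch re-estimation, combines the forward and backward variables: gamma_t(i) = alpha_t(i) beta_t(i) / P(O | model). Xi is built the same way from alpha_t(i), a_ij, b_j(o_{t+1}) and beta_{t+1}(j). Below is a minimal sketch of gamma that uses plain, unscaled probabilities, so it only suits short observation sequences. The sizes and the name `state_occupancy` are hypothetical.

```c
#define N_STATES 2  /* hypothetical model size */
#define N_SYMBOLS 2 /* hypothetical codebook size */
#define MAX_T 16    /* short sequences only: no scaling is applied */

/* Fills gamma[t][i] = alpha_t(i) * beta_t(i) / P(O | model) and returns
 * P(O | model), or 0.0 on invalid input or zero likelihood. */
double state_occupancy(const double pi[N_STATES],
                       double a[N_STATES][N_STATES],
                       double b[N_STATES][N_SYMBOLS],
                       const int *obs, int T,
                       double gamma[][N_STATES])
{
    double alpha[MAX_T][N_STATES], beta[MAX_T][N_STATES];
    if (T <= 0 || T > MAX_T)
        return 0.0;

    /* Forward pass. */
    for (int i = 0; i < N_STATES; i++)
        alpha[0][i] = pi[i] * b[i][obs[0]];
    for (int t = 1; t < T; t++)
        for (int j = 0; j < N_STATES; j++) {
            double s = 0.0;
            for (int i = 0; i < N_STATES; i++)
                s += alpha[t - 1][i] * a[i][j];
            alpha[t][j] = s * b[j][obs[t]];
        }

    /* Backward pass: beta_{T-1}(i) = 1. */
    for (int i = 0; i < N_STATES; i++)
        beta[T - 1][i] = 1.0;
    for (int t = T - 2; t >= 0; t--)
        for (int i = 0; i < N_STATES; i++) {
            double s = 0.0;
            for (int j = 0; j < N_STATES; j++)
                s += a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j];
            beta[t][i] = s;
        }

    double p_obs = 0.0;
    for (int i = 0; i < N_STATES; i++)
        p_obs += alpha[T - 1][i];
    if (p_obs <= 0.0)
        return 0.0;

    for (int t = 0; t < T; t++)
        for (int i = 0; i < N_STATES; i++)
            gamma[t][i] = alpha[t][i] * beta[t][i] / p_obs;
    return p_obs;
}
```

At every frame the gamma values sum to 1 across states, which makes a useful sanity check when implementing the re-estimation formulas.
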
lecture 14
  • Feb. 21 is Presidents’ Day
  • Language Models:
    Incorporating N-Gram LM, Linear Smoothing, Good-Turing Smoothing, Discounting and Back-Off, Cache LM, Class-Based LM, Perplexity
lecture 15
lecture 16
  • Search Strategies I: Beam Search, Grammar/Tree Search, On-Line Processing
  • Tree-Based Search with Language Models
  • Maximum Mutual Information (MMI) training
  • Other Approaches to ASR: HMM/ANN Hybrids, Hidden Dynamic Models (HDM)
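
Beam search keeps a Viterbi search tractable by discarding unlikely hypotheses at every frame. Below is a minimal sketch of that pruning step, assuming log-domain scores and a fixed beam width relative to the best score in the frame. The name `prune_beam` is hypothetical, and real decoders often combine this with other pruning rules, such as capping the number of active states.

```c
#include <math.h>

/* Beam pruning for one frame of a log-domain search: any state scoring
 * more than `beam` below the frame's best score is deactivated by setting
 * it to -INFINITY. Returns the number of states still active. */
int prune_beam(double *log_score, int n_states, double beam)
{
    double best = -INFINITY;
    for (int i = 0; i < n_states; i++)
        if (log_score[i] > best)
            best = log_score[i];

    int active = 0;
    for (int i = 0; i < n_states; i++) {
        if (log_score[i] < best - beam)
            log_score[i] = -INFINITY; /* pruned: not extended next frame */
        else if (log_score[i] > -INFINITY)
            active++;
    }
    return active;
}
```

With scores {-1, -5, -12, -3} and a beam of 5, the threshold is -6, so only the third state is pruned.
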
lecture 17
lecture 17½
lecture 18
  • Search Strategies II: Balancing Insertions and Deletions, Grammar-Based Search, N-Best Output, Weighted Finite State Transducers (WFST)
  • Speaker Adaptation: Vocal Tract Length Normalization (VTLN), Maximum a Posteriori (MAP) adaptation, and Maximum Likelihood Linear Regression (MLLR)
  • Acoustic-Model Strategies: Semi-Continuous HMMs, State Tying, State Clustering, Cloning, Pause Models
  • Course Summary
  • Project 3 is due March 9
  • Take-Home Final Exam Due Friday March 18 by midnight
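
For the speaker-adaptation topic, the MAP update of a Gaussian mean reduces to a short formula: mu_MAP = (tau mu_0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t). Here mu_0 is the speaker-independent mean, gamma_t is the occupancy of that Gaussian at frame t, and tau > 0 is a prior weight. Below is a minimal one-dimensional sketch. The name `map_adapt_mean` is hypothetical.

```c
#include <stddef.h>

/* MAP re-estimate of one dimension of a Gaussian mean:
 *   mu_map = (tau * mu_prior + sum_t gamma[t] * x[t]) / (tau + sum_t gamma[t])
 * gamma[t] is this Gaussian's occupancy at frame t (e.g. from the
 * forward-backward algorithm); tau must be > 0 when T may be 0. */
double map_adapt_mean(double mu_prior, double tau,
                      const double *x, const double *gamma, size_t T)
{
    double num = tau * mu_prior;
    double den = tau;
    for (size_t t = 0; t < T; t++) {
        num += gamma[t] * x[t];
        den += gamma[t];
    }
    return num / den;
}
```

With little adaptation data, the result stays near the prior mean. As the total occupancy grows, it approaches the maximum-likelihood mean of the new speaker's data. For example, a prior mean of 0 with tau = 10 and two frames at 2 and 4 (occupancy 1 each) gives 0.5.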



This entry was posted on January 10, 2013 by in Digital Signal Process, Pattern Recognition, Science & Technology.