"Behind every stack of books there is a flood of knowledge."
“The aim of computer vision is to overfit to our visual world”
— remark by Antonio Torralba (after his third beer)
Human vision is one of the most remarkable machines that ever existed. From sparse, noisy, hopelessly ambiguous local scene measurements our brain manages to create a coherent global visual experience. But how can this task, while seemingly effortless for humans, remain so excruciatingly difficult for a computer? Part of the answer is that humans rely on years of prior visual experience to make sense of the world, while computers have to start tabula rasa. Clearly, learning is needed to make progress on this severely underconstrained problem. However, attempts at direct application of machine learning tools to raw visual data have been largely unsuccessful.
The goal of this graduate seminar course is to gain a deeper understanding of the computer vision problem in order to better reason about the ways data and learning could be used to tackle it. The central focus will be on the representation of visual data, rather than on fancy learning techniques. We will be looking at all stages of visual processing, from low-level (color, texture, local patches) all the way to high-level (object recognition, general image understanding). We will pay particular attention to mid-level vision (grouping, segmentation, figure/ground, scene layout, image parsing), a crucial glue tying vision together that has been largely neglected. The course will emphasize using large amounts of real data (images, video, textual annotations, other meta-data). We will also discuss the difficult issue of what the right choice of training data is and how it can be acquired.
The course will consist of reading and presenting an eclectic mix of classic and recent papers on a range of topics. All students will be required to submit a written summary for each paper. Additionally, there will be two substantial class projects during the term.
Prerequisite: 16-720 or equivalent graduate Computer Vision course (No exceptions!)
We will meet on Mondays and Wednesdays Noon-1:20pm in Wean 5409.
Check out this list of data sources for some ideas on where to get images to work with.
Each project team will have regular meetings to discuss the progress of their course project.
Meeting times are listed on the project meeting schedule.
Leave your comments about papers on the Class Blog.
The paper list contains papers that will be discussed in class.
|Jan. 12||Alyosha Efros||Introduction, Vision: Measurement vs. Perception
Administrative stuff, overview of the course, datasets
|Jan. 14||Alyosha Efros||Overview lecture on theories of Visual Perception
Cavanagh, P. (1995) Vision is getting easier every day
Optional reading: Nakayama, K. (1998) Vision fin-de-siecle – a reductionistic explanation of perception for the 21st century?
|Jan. 19||MLK Jr. Day: no class|
|Jan. 21||Alyosha Efros||Overview lecture on the physiology of vision
Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision
|Jan. 26||Alyosha Efros||What should be done at the Low level?||Low Level ppt|
|Jan. 28||Varun||Probability of Boundary
D. Martin, C. Fowlkes, and J. Malik. PAMI May 2004.
Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues
M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik. CVPR 2008.
|Global Pb pdf|
|Feb. 2||Varun/Alyosha||Probability of Boundary Continued
When is object/scene recognition just texture recognition?
|Feb. 4||Alyosha Efros||When is object/scene recognition just texture recognition?
Renninger, L.W. & Malik, J. Vision Research 2004. When is scene recognition just texture recognition?
Csurka, G., Bray, C., Dance, C., and Fan, L. ECCV 2004. Visual categorization with bags of keypoints
Winn, J., Criminisi, A. and Minka, T. ICCV 2005. Object Categorization by Learned Universal Visual Dictionary
|Bag of Words ppt|
|Feb. 9||Dan||TextonBoost Day
TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.
J. Shotton, J. Winn, C. Rother, A. Criminisi. In Proc. ECCV 2006.
(optional) Journal version of TextonBoost
|Feb. 11||Dan/Alyosha||Semantic Texton Forests
Semantic Texton Forests for Image Categorization and Segmentation.
J. Shotton, M. Johnson, R. Cipolla. In Proc. IEEE CVPR 2008.
Semantic Texton Forests implementation
Intro to objects: Geometry vs. Appearance
|(link is above)|
|Feb. 16||James Hays||Large Scale Scene Matching for Graphics and Vision|
|Feb. 18||Alyosha||Appearance makes an appearance: Sliding windows, constellation models, pictorial structures, and more.||Objects and Parts ppt|
|Feb. 23||Edward||Parts-Based Object Recognition
A Discriminatively Trained, Multiscale, Deformable Part Model
P. Felzenszwalb, D. McAllester, D. Ramanan, In Proc. IEEE CVPR 2008.
|Feb. 25||Alyosha||Introduction to Context||Context|
|March 2||Michael Tarr||Uncovering the Fundamental Principles of Visual Cortex|
|March 4||Brian||Object Recognition by Scene Alignment
B. C. Russell, A. Torralba, C. Liu, R. Fergus, W. T. Freeman In NIPS, 2007.
SIFT flow: dense correspondence across different scenes
|Stealing Objects with Computer Vision|
|March 16||Ekaterina||Contextual priming for object detection
A. Torralba. IJCV, Vol. 53(2), 169-191, 2003.
Object detection and localization using local and global features
|Context Challenge slides|
|March 18||Alyosha / Utsav||Introduction to Segmentation
Objects in Context
Context Based Object Categorization: A Critical Survey
|Friday, March 20||Utsav / Alyosha||Context Continued…
Object Categorization using Co-Occurrence, Location and Appearance
Carolina Galleguillos, Andrew Rabinovich and Serge Belongie. CVPR 2008.
|Objects in Context|
|March 23||Pyry||Learning a Classification Model for Segmentation.
Xiaofeng Ren and Jitendra Malik. In ICCV 2003.
Image Segmentation by Data-Driven Markov Chain Monte Carlo.
|Segmentation Through Optimization|
On the semantics of a glance at a scene. Biederman, I. 1981
Recovering Surface Layout from an Image. D. Hoiem, A.A. Efros, and M. Hebert. IJCV, Vol. 75, No. 1, October 2007.
See also classic papers: Yakimovsky and Feldman (1973), Ohta, Kanade, Sakai (1978), Barrow and Tenenbaum (1978).
|It’s a 3D world, after all!|
|March 30||Alyosha||Occlusion and Figure/Ground Reasoning
Figure/Ground Assignment in Natural Images.
|April 1st||Jiyan||Depth estimation from image structure
A. Torralba, A. Oliva. PAMI Vol. 24(9): 1226-1238. 2002.
Depth Information by Stage Classification.
Learning Depth from Single Monocular Images
|April 6||Mark||Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
Chum, O., Philbin, J., Sivic, J., Isard, M. and Zisserman, A. In ICCV 2007.
|Content Based Image Search|
Principles of Categorization. Eleanor Rosch
Big Book of Concepts, Chapter 3. Gregory L. Murphy.
(just focus on “Exemplar View” section)
|Concepts: from Instances to Meaning|
|April 10: 3:30pm NSH 1305||Derek Hoiem||Inferring Object Attributes|
|April 13||Yuandong||Sharing visual features for multiclass and multiview object detection
A. Torralba, K. P. Murphy and W. T. Freeman PAMI. vol. 29, no. 5, pp. 854-869, May, 2007.
|April 15||Zhaoyin||Learning compositional models for object categories from small sample sets
J. Porway, B. Yao, and S.C. Zhu Book Chapter in Sven Dickinson et al (eds.)
Object Categorization: Computer and Human Vision Perspectives, Cambridge University Press. 2009
A Stochastic Grammar of Images
|April 20||Alyosha and Scott||Learning Realistic Human Actions from Movies.
Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld. In Proc. CVPR’08
|April 22||Alyosha||The Unreasonable Effectiveness of Data and the Wisdom of Crowds||data|
|April 27||Alyosha + everyone||How do we know that we have solved vision?||Solving Vision|
|April 29||Project Presentations (1-4)|
|April 30, 6-8pm in NSH 3002||Project Presentations (5-10)|
This course was inspired by those offered by several of my colleagues. Here is a partial list: