"Behind every stack of books there is a flood of knowledge."
The presented system gradually retrieves more information about the scene and the camera setup. Images contain a huge amount of information (e.g. color pixels), but much of it is redundant (which explains the success of image compression algorithms). Structure recovery approaches require correspondences between the different images (i.e. image points originating from the same scene point). Due to the combinatorial nature of this problem, it is almost impossible to work on the raw data. The first step therefore consists of extracting features. The features of the different images are then compared using similarity measures, and lists of potential matches are established. Based on these, the relations between the views are computed. Since wrong correspondences can be present, robust algorithms are used.

Once consecutive views have been related to each other, the structure of the features and the motion of the camera are computed. An initial reconstruction is made from the first two images of the sequence. For each subsequent image the camera pose is estimated in the frame defined by the first two cameras. For every additional image that is processed at this stage, the features corresponding to points in previous images are reconstructed, refined or corrected. It is therefore not necessary that the initial points stay visible throughout the entire sequence. The result of this step is a reconstruction of typically a few hundred feature points.

When uncalibrated cameras are used, the structure of the scene and the motion of the camera are only determined up to an arbitrary projective transformation. The next step consists of restricting this ambiguity to metric (i.e. Euclidean up to an arbitrary scale factor) through self-calibration. In a projective reconstruction not only the scene but also the camera is distorted. Since the algorithm deals with unknown scenes, it has no way of identifying this distortion in the reconstruction.
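The comparison of features through similarity measures mentioned above can be sketched in a few lines. The example below uses a nearest-neighbour ratio test on synthetic descriptors; the descriptor dimension, the ratio threshold and the data are illustrative assumptions, not part of the described system:

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Match descriptors from two images with a nearest-neighbour
    ratio test: a candidate match is kept only when the best match
    is clearly better than the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Synthetic 8-dimensional descriptors: image B contains slightly
# perturbed copies of the first three descriptors of image A.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 8))
desc_b = desc_a[:3] + 0.01 * rng.normal(size=(3, 8))
print(match_features(desc_a, desc_b))
```

Ambiguous descriptors are filtered out by the ratio test; the remaining wrong matches are what the robust estimation step must handle.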
Although the camera is also assumed to be unknown, some constraints on the intrinsic camera parameters (e.g. rectangular or square pixels, constant aspect ratio, principal point in the middle of the image, …) can often still be assumed. A distortion of the camera mostly results in the violation of one or more of these constraints. A metric reconstruction/calibration is obtained by transforming the projective reconstruction until all the constraints on the camera's intrinsic parameters are satisfied.

At this point enough information is available to go back to the images and look for correspondences for all the other image points. This search is facilitated by the fact that the line of sight corresponding to an image point can be projected onto the other images, restricting the search range to one dimension. By pre-warping the images (a process called rectification), standard stereo matching algorithms can be used. This step yields correspondences for most of the pixels in the images. From these correspondences the distance from each point to the camera center can be obtained through triangulation. These results are refined and completed by combining the correspondences from multiple images.

Finally, all results are integrated into a textured 3D surface reconstruction of the scene under consideration. The model is obtained by approximating the depth map with a triangular wire frame. The texture is obtained from the images and mapped onto the surface. An overview of the system is given in Figure 1.7.
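After rectification the correspondence search is a one-dimensional scan along an image row, and depth follows directly from the matched pixel offset (the disparity). A minimal sketch of this depth recovery; the focal length, baseline and disparity values are illustrative assumptions, not taken from the described sequence:

```python
import numpy as np

# For a rectified stereo pair, depth follows from similar triangles:
#     Z = f * B / d
# f: focal length in pixels, B: baseline between the camera centres,
# d: disparity in pixels.  All numbers below are illustrative.
f = 800.0                                  # focal length [pixels]
B = 0.12                                   # baseline [metres]
disparity = np.array([40.0, 20.0, 8.0])    # matched pixel offsets

depth = f * B / disparity                  # nearer points -> larger disparity
print(depth)                               # [ 2.4  4.8 12. ]
```

This inverse relation between disparity and depth is also why depth estimates become less precise for distant points.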
Throughout the rest of the text the different steps of the method will be explained in more detail. An image sequence of the Arenberg castle in Leuven will be used for illustration. Some of the images of this sequence can be seen in Figure 1.8. The full sequence consists of 24 images recorded with a video camera.
In this section we will try to formulate an answer to the following questions: What do images tell us about a 3D scene? How can we get 3D information from these images? What do we need to know beforehand? A few problems and difficulties will also be presented.
An image like the one in Figure 1.1 tells us a lot about the observed scene. However, it does not contain enough information to reconstruct the 3D scene (at least not without making a significant number of assumptions about its structure). This is due to the nature of the image formation process, which consists of a projection from a three-dimensional scene onto a two-dimensional image. During this process the depth information is lost.
Figure 1.2 illustrates this. The three-dimensional point corresponding to a specific image point is constrained to lie on the associated line of sight. From a single image it is not possible to determine which point on this line corresponds to the image point.
If two (or more) images are available, then, as can be seen from Figure 1.3, the three-dimensional point can be obtained as the intersection of the two lines of sight. This process is called triangulation. Note, however, that a number of things are needed for this: corresponding image points in both views, the relative pose of the cameras, and the relation between each image point and its line of sight.
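The triangulation step can be sketched as follows. In practice the two lines of sight rarely intersect exactly, so a common choice (used here as an illustration, not necessarily the method of this text) is the midpoint of the shortest segment between the two lines:

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Intersect two (possibly skew) lines of sight.
    ci: camera centre, di: unit viewing direction.
    Returns the midpoint of the shortest segment joining the lines."""
    # Find the scalars s, t minimising |(c1 + s*d1) - (c2 + t*d2)|.
    A = np.stack([d1, -d2], axis=1)            # 3x2 system matrix
    b = c2 - c1
    st, *_ = np.linalg.lstsq(A, b, rcond=None)
    p1 = c1 + st[0] * d1                       # closest point on line 1
    p2 = c2 + st[1] * d2                       # closest point on line 2
    return 0.5 * (p1 + p2)

X = np.array([1.0, 2.0, 5.0])                  # ground-truth scene point
c1, c2 = np.zeros(3), np.array([1.0, 0.0, 0.0])
d1 = (X - c1) / np.linalg.norm(X - c1)
d2 = (X - c2) / np.linalg.norm(X - c2)
print(triangulate_midpoint(c1, d1, c2, d2))    # recovers [1. 2. 5.]
```

With noisy rays the same routine returns the point halfway between the two closest points, a simple geometric compromise.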
The relation between an image point and its line of sight is given by the camera model (e.g. the pinhole camera) and the calibration parameters. These parameters are often called the intrinsic camera parameters, while the position and orientation of the camera are generally called the extrinsic parameters.
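For the pinhole model mentioned above, the mapping from a scene point to an image point combines the intrinsic parameters (gathered in a calibration matrix K) with the extrinsic pose (R, t). A minimal sketch; all numeric values are illustrative assumptions:

```python
import numpy as np

# Pinhole projection: x ~ K [R | t] X, with K holding the intrinsics
# (focal lengths fx, fy and principal point cx, cy) and (R, t) the
# extrinsic pose of the camera.  Values here are illustrative.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                      # camera aligned with the world frame
t = np.zeros(3)                    # camera at the world origin
X = np.array([0.1, -0.05, 2.0])    # scene point in metres

x_h = K @ (R @ X + t)              # homogeneous image point
u, v = x_h[:2] / x_h[2]            # divide out depth -> pixel coordinates
print(u, v)                        # 360.0 220.0
```

Inverting this mapping for a known K and pose gives exactly the line of sight used in triangulation: all points X that project to the same (u, v).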
In the following chapters we will learn how all these elements can be retrieved from the images. The key to this is the set of relations between multiple views, which tells us that corresponding sets of points must contain some structure and that this structure is related to the poses and the calibration of the cameras.
Note that different viewpoints are not the only depth cues that are available in images. In Figure 1.4 some other depth cues are illustrated. Although approaches have been presented that can exploit most of these, in this text we will concentrate on the use of multiple views.
In Figure 1.5 a few problems for 3D modeling from images are illustrated. Most of these problems limit the applicability of the presented method; some of them, however, can be tackled by the presented approach.
Another type of problem arises when the imaging process does not satisfy the camera model that is used. In Figure 1.6 two examples are given. In the left image considerable radial distortion is present, which means that the assumption of a pinhole camera is not satisfied. It is, however, possible to extend the model to take this distortion into account. The right image is much harder to use, since a significant part of the scene is out of focus. There is also some blooming in that image (i.e. overflow of a saturated CCD pixel into its whole column). Most of these problems can, however, be avoided under normal imaging circumstances.
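Extending the pinhole model to account for radial distortion is commonly done with a polynomial in the radial distance, applied in normalised image coordinates. A sketch of such an extension; the coefficients are assumed values for illustration, not calibrated ones:

```python
import numpy as np

def distort(xy, k1=-0.2, k2=0.05):
    """Apply polynomial radial distortion in normalised coordinates:
        x_d = x * (1 + k1*r^2 + k2*r^4),   r^2 = x^2 + y^2.
    Negative k1 gives barrel distortion (points pulled inward)."""
    r2 = np.sum(xy**2)
    factor = 1.0 + k1 * r2 + k2 * r2**2
    return xy * factor

xy = np.array([0.5, 0.0])   # point in normalised image coordinates
print(distort(xy))          # moved towards the image centre (barrel)
```

Once k1 and k2 are estimated alongside the other calibration parameters, applying the inverse of this mapping restores the straight-line projection assumed by the pinhole model.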