University of Delaware

CISC 689 - Computer Vision

Homework #5:

Tracking / Structure from Motion

by Benjamin Berger

ID: 4513

feature tracking - condensing


link to class website: http://vision.cis.udel.edu/cv
task formulation: Homework 5 description (PDF)
paper for this assignment: Condensation - conditional density propagation for visual tracking (PDF)



Part 1 - Particle Filtering

Source Code:


Results:

State estimates of dot location for each frame of dot.avi (frame: x, y):
1: 85, 71
2: 77, 86
3: 77, 88
4: 80, 86
5: 80, 88
dot_tracked.avi: movie with particles and state estimates plotted on the original sequence
dot_states_plot.png: path history of dots, plotted on last image of sequence
dot_state_estimates.txt: text file listing the frame numbers and state estimates for every frame


Comments:

As the movie file shows, feature tracking with random-walk dynamics works very well for simple tracking tasks. In this case the deterministic part of the dynamics function is the identity transform, and the random part is Gaussian noise with sigma = 8. With this value the particle distribution is still wide enough to catch the fastest movement of the dot in the sequence, yet narrow enough to achieve good precision.
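
The resample / predict / measure cycle described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the source code submitted for this homework. The function name condensation_step, the likelihood callback and the particle layout (one x, y row per particle) are assumptions made for the example, and the noise is assumed to be zero-mean. Only the identity dynamics and sigma = 8 come from the text.

```python
import numpy as np

def condensation_step(particles, weights, likelihood, sigma=8.0, rng=None):
    """One resample / predict / measure cycle with random-walk dynamics.

    particles : (N, 2) array of x, y hypotheses (assumed layout)
    weights   : (N,) array of normalized weights from the previous frame
    likelihood: hypothetical callback mapping (N, 2) positions to (N,) scores
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(particles)

    # Resample: draw particle indices in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)

    # Predict: identity transform (deterministic part) plus zero-mean
    # Gaussian noise with standard deviation sigma (random part).
    predicted = particles[idx] + rng.normal(0.0, sigma, size=particles.shape)

    # Measure: re-weight every particle by how well it explains the frame.
    w = np.asarray(likelihood(predicted), dtype=float)
    total = w.sum()
    # Fall back to uniform weights if no particle explains the frame.
    w = w / total if total > 0 else np.full(n, 1.0 / n)

    # State estimate: weighted mean of the particle positions.
    estimate = w @ predicted
    return predicted, w, estimate
```

In use, the function would be called once per frame, with a likelihood that scores each particle against the image (for example, by the brightness around the hypothesized dot position). A larger sigma lets the particles follow faster motion at the cost of a noisier estimate, which is the trade-off discussed above.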





Part 2 - Factorization for Structure from Motion

Source Code:


Results:

Hotel images 0, 50 and 99 showing the feature locations found by the open-source KLT tracker:



Features from the first image tracked throughout the sequence:


(green crosses: feature location in first frame; red crosses: feature lost)


Normalized 3-D scene structure, plotted from a few different viewpoints:
Red crosses: camera locations for each frame supposing the camera was moving.
(picked depth: d=500)


Comments:

The upper left picture above shows the 3-D scene from a viewpoint similar to the first image of the sequence. The beginning of the red line marks the first estimated camera location, which lies roughly in the middle of the image, as the camera would in this view. Likewise, the upper right picture shows a viewpoint similar to the last image of the sequence; this time the end of the line of red crosses is in the center of the view. The next viewpoint is chosen according to frame 50 of the sequence. These views nicely show the movement of the camera relative to the house as a curve in space.
The camera locations are obtained by transforming the midpoint of each 2-D image (plus an estimated depth) into the 3-D scene using the inverse camera matrix of that frame. Because affine camera matrices are used and affine projection is ambiguous, the scale of the scene is not known, so the distance (z axis) from the camera to the object can only be estimated. In the views above a depth of 500 is chosen, which places the camera a little farther from the house than the house is tall. This is a reasonable estimate for real-sized buildings.
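
The two steps discussed here, factorizing the tracked features and placing a camera in the scene, can be sketched as follows. This is an illustrative reading in Python with NumPy, not the submitted source code. The names factorize and camera_location, the row layout of the measurement matrix (all x rows, then all y rows) and the way the third camera axis is completed with a cross product are assumptions. The metric normalization step is omitted, so the result is defined only up to an affine transformation.

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2F x P measurement matrix W.

    Assumed layout: rows 0..F-1 hold the x coordinates of the P features
    in each frame, rows F..2F-1 hold the y coordinates.
    Returns motion M (2F x 3), shape S (3 x P) and the row centroids t.
    """
    # Subtract the per-row centroid so the translation drops out.
    t = W.mean(axis=1)
    W_centered = W - t[:, None]

    # Keep the three largest singular values and split them evenly
    # between the motion and the shape factor.
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root
    S = root[:, None] * Vt[:3]
    return M, S, t

def camera_location(M, t, f, center, d=500.0):
    """Back-project the image midpoint of frame f at an assumed depth d."""
    F = M.shape[0] // 2
    i_f, j_f = M[f], M[F + f]

    # Assumption: complete the 2 x 3 affine camera to an invertible 3 x 3
    # matrix with a unit viewing direction orthogonal to both image axes.
    k_f = np.cross(i_f, j_f)
    k_f = k_f / np.linalg.norm(k_f)
    A = np.vstack([i_f, j_f, k_f])

    # Midpoint relative to the feature centroid, plus the chosen depth.
    p = np.array([center[0] - t[f], center[1] - t[F + f], d])
    return np.linalg.solve(A, p)
```

Calling camera_location for every frame with the same d gives one 3-D point per frame, which is the kind of curve the red crosses trace in the plots. Because the scale is unknown, changing d moves the whole curve toward or away from the structure without changing its shape much.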



created 05/09/03 by Benjamin Berger