Problem   Topic                                               Max. Points   Graded Points   Remarks
2.1       Set up a Static Scene                               0.5
2.2       Capture a 4D Light Field                            0.5
2.3       Acquiring the Data                                  0.5
2.4.1     Template and Window                                 0.5
2.4.2     Normalized Cross Correlation                        0.5
2.4.3     Retrieving the Pixel Shifts                         0.5
2.5       Synthesizing an Image with Synthetic Aperture       1.0
2.6       Repeating the Experiment for Different Templates    1.0
3.1       Deriving the Blur Kernel Width                      3.0
3.2       Blur Kernel Shape                                   1.0
3.3       Blur and Scene Depth                                0.5
3.4       Blur and Focal Length                               0.5
Total                                                         10.0
1 Motivation
At the top level, this problem set enables you to turn your cell phone into a 4D light field camera.
Shallow depth of field, i.e., having only a small area in focus, is a desirable aesthetic quality in a photograph. Unfortunately, this effect requires a large aperture, i.e., the lens is going to be big and bulky! But what if it were possible to turn your cell phone into a camera with a large aperture? What if we could selectively focus on objects in post-processing?
The goal of this homework is to synthesize images with a smaller depth of field, making them appear to have been taken with an expensive camera with a larger aperture [2, 4]. Figures 1a and 1b show a scene image and the corresponding synthetic-aperture image with a lower depth of field.
Figure 1: Turning a cell phone into a light field camera. (a) An all-in-focus image taken with a cell phone camera. (b) A light field stack post-processed to blur out the background. Notice how the helmet stands out from the background.
2 Experimental Component
We will capture a video by moving the camera in a zig-zag path in front of the static scene, as shown in Figure 2. Please use Python for all code. Fill in each box below for credit.
Please note:
1. The algorithm being implemented does not take camera tilt into account. Avoid tilting and rotating the camera as much as possible.
2. The instructions use a planar zig-zag path for camera motion, as in Figure 2. However, you are allowed to try different paths, such as circular or polyline paths.
3. The number of frames in the video captured will determine the time required to compute the output. Make sure the video is not too long.
Figure 2: A zig-zag planar motion of the camera in front of the static scene to capture a video.
2.1 Set up a Static Scene (0.5 points)
Set up a static scene similar to the one shown in Figure 1a. Try to have objects at different depths.
For credit, place your image in the box below (replace our helmet scene with your own scene).
Figure 3: Insert an ordinary photograph of the scene (replace our example).
2.2 Capture a 4D Light Field (0.5 points)
Take a video by waving your camera in front of the scene, following a specific planar motion. The more of the plane you cover, the better your results will be. Ensure that all objects are in focus in your video. For credit, place three frames of the video in the box below (replace our example). These frames should differ in their parallax, i.e., the apparent positions of objects change with the viewpoint.
Figure 4: Insert any three frames of your video here (replace our example). Make sure there is sufficient parallax in the images.
2.3 Acquiring the Data (0.5 points)
Write a function to read your video file and convert the video into a sequence of frames. Since the video was captured with a cell phone, each frame is in RGB color. Write a script to convert each frame to grayscale. For credit, place the grayscale image of the first frame of your video in the box below (replace our example).
Figure 5: Insert the grayscale image of the first frame (replace our example).
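A minimal sketch of this step is shown below, using OpenCV (installable via pip as opencv-python); the filename lightfield.mp4 is a placeholder for your own video file, so adapt paths and names to your setup.

```python
# Sketch: read a video with OpenCV and convert every frame to grayscale.
# "lightfield.mp4" is a placeholder filename for your own captured video.
import cv2

def load_grayscale_frames(video_path):
    """Return a list of grayscale frames (2D uint8 arrays) from a video file."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()          # frame is BGR with shape (H, W, 3)
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames

frames = load_grayscale_frames("lightfield.mp4")
cv2.imwrite("frame0_gray.png", frames[0])   # first frame in grayscale, for Figure 5
```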
2.4 Registering the Frames (1.5 points)
2.4.1 Template and Window (0.5 points)
From the first frame of your video, select an object as a template. We will register all other frames of the video with respect to this template. Once a template has been selected in the first frame, we search for it in the subsequent frames. The location of the template in a target frame gives us the shift (in pixels) of the camera. Since we do not need to search for the template over the entire target frame, we select a window in which to perform this operation. Note, however, that selecting a window is optional; it is done only to reduce the computation time. For credit, place the image of the first frame of your video in the box below with the template and the window marked (replace our example).
Figure 6: Insert your image with template object and search window marked (replace our example).
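One way to set this up in code is sketched below, reusing the frames list from the previous sketch; all crop coordinates are arbitrary placeholders and should be chosen to bracket the object you picked in your own first frame.

```python
# Sketch: crop a template and a larger search window from the first grayscale frame.
# All pixel coordinates below are placeholders; adjust them to your own scene.
frame0 = frames[0]

# Template: a patch tightly enclosing the chosen object (row/column bounds are assumptions).
t_top, t_left, t_h, t_w = 200, 300, 80, 80
template = frame0[t_top:t_top + t_h, t_left:t_left + t_w]

# Search window: a larger region around the template, reused for every later frame.
w_top, w_left, w_h, w_w = 100, 200, 300, 300

def extract_window(frame):
    """Crop the fixed search window from a frame."""
    return frame[w_top:w_top + w_h, w_left:w_left + w_w]
```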
2.4.2 Normalized Cross Correlation (0.5 points)
Perform a normalized cross correlation of the template with the extracted search window.
Let A[i, j] be the normalized cross-correlation coefficient. If t[n, m] is our template image and w[n, m] is our window, then from [3] we have:
\[
A[i,j] = \frac{\sum_{n,m}\bigl[w(n,m) - \bar{w}_{i,j}\bigr]\bigl[t(n-i,\, m-j) - \bar{t}\,\bigr]}{\Bigl\{\sum_{n,m}\bigl[w(n,m) - \bar{w}_{i,j}\bigr]^{2}\,\sum_{n,m}\bigl[t(n-i,\, m-j) - \bar{t}\,\bigr]^{2}\Bigr\}^{0.5}}, \qquad (1)
\]
where \bar{t} is the mean of the template and \bar{w}_{i,j} is the mean of the window w[n, m] in the region under the template. Plot the cross-correlation coefficient matrix A[i, j] for one of the frames. For credit, place the plot in the box below (replace our example).
Figure 7: Insert the plot of the correlation coefficient matrix (replace our example).
[Hint: Use the scikit-image function match_template with pad_input=True to perform the 2D cross-correlation. scikit-image, an image processing library, can be installed via pip.]
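A minimal sketch of this step, assuming the template and extract_window helpers from the previous sketches, might look like the following (the frame index 10 is arbitrary):

```python
# Sketch: normalized cross-correlation of the template with a search window,
# using scikit-image's match_template with pad_input=True.
import matplotlib.pyplot as plt
from skimage.feature import match_template

window = extract_window(frames[10])                    # any later frame; index is arbitrary
A = match_template(window, template, pad_input=True)   # A[i, j] = NCC coefficient

plt.imshow(A, cmap="viridis")
plt.colorbar(label="normalized cross-correlation")
plt.title("Correlation coefficient matrix A[i, j]")
plt.savefig("ncc_matrix.png")
```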
2.4.3 Retrieving the Pixel Shifts (0.5 points)
The location that yields the maximum value of the coefficient A[i, j] is used to compute the shift [1]. The shift in pixels for each frame is therefore
\[
[s_x, s_y] = \arg\max_{i,j}\, A[i, j]. \qquad (2)
\]
For credit, please place the plot of s_x vs. s_y in the box below (replace our example).
Figure 8: Insert the plot of X pixel shift vs. Y pixel shift (replace our example).
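A sketch of one reasonable implementation is shown below; measuring each frame's peak location relative to the peak in the first frame is an assumed convention, not something fixed by Equation (2).

```python
# Sketch: locate the correlation peak in every frame and convert it into a pixel
# shift (s_x, s_y) relative to the first frame.
import numpy as np
import matplotlib.pyplot as plt
from skimage.feature import match_template

def peak_location(frame):
    """Row/column of the maximum NCC coefficient within the search window."""
    A = match_template(extract_window(frame), template, pad_input=True)
    return np.unravel_index(np.argmax(A), A.shape)

ref_row, ref_col = peak_location(frames[0])
shifts = [(col - ref_col, row - ref_row)              # (s_x, s_y) per frame
          for row, col in (peak_location(f) for f in frames)]

sx, sy = zip(*shifts)
plt.plot(sx, sy, ".-")
plt.xlabel("X pixel shift $s_x$")
plt.ylabel("Y pixel shift $s_y$")
plt.savefig("pixel_shifts.png")
```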
2.5 Synthesizing an Image with Synthetic Aperture (1.0 points)
Once you have the pixel shifts for each frame, you can synthesize a refocused image by shifting each frame in the opposite direction and then summing all the frames. (Note: in Section 3, you will need to explain why this operation works. Start thinking about this now!)
Suppose the pixel shift vector for frame image I_i[n, m] is [s_{x_i}, s_{y_i}]. Then the output image with synthetic aperture, P[n, m], is obtained as:
\[
P[n, m] = \sum_i I_i[n - s_{x_i},\, m - s_{y_i}]. \qquad (3)
\]
For credit, place your synthetically "defocused" image in the box below (replace our example).
Figure 9: Insert an image with an object in synthetic focus (replace our example).
[Hint: Use the OpenCV function warpAffine with a transformation matrix corresponding to a translation to perform the shifting operation.]
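Here is a minimal sketch of the shift-and-add step; it reuses frames and shifts from the earlier sketches, and it averages rather than sums (a scaled version of Equation (3)) so the result stays in the 8-bit range.

```python
# Sketch: shift each frame by the negative of its pixel shift with cv2.warpAffine,
# then average the shifted frames (a scaled version of the sum in Eq. 3).
import cv2
import numpy as np

h, w = frames[0].shape
acc = np.zeros((h, w), dtype=np.float32)

for frame, (s_x, s_y) in zip(frames, shifts):
    M = np.float32([[1, 0, -s_x],       # translation by (-s_x, -s_y)
                    [0, 1, -s_y]])
    acc += cv2.warpAffine(frame.astype(np.float32), M, (w, h))

P = acc / len(frames)
cv2.imwrite("synthetic_aperture.png", P.astype(np.uint8))
```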
2.6 Repeating the Experiment for Different Templates (1.0 points)
Now, we will exploit the fact that we can synthetically focus at different depths. To do this, select a new object as your template and repeat all the steps to generate an image that is focused on this new object. Here, we have selected the cup as our new object. For credit, place a defocused image with a different template object in focus in the box below (replace our example).
Figure 10: Insert an image with an object in synthetic focus. This object should be different from the previous box (replace our example).
3 Assessment
3.1 Deriving the Blur Kernel Width (3.0 points)
The goal is to understand how much blur is synthetically added, using a model of pinhole cameras. Consider the coordinate diagram shown in Figure 11. Here, [X_1, Z_1] is a scene point of an object in the template, [X_2, Z_2] is a scene point of an object in the background, and C(i) for i = 1, ..., k are the positions of the camera apertures at which the scene is captured. The maximum camera translation is D, and f is the focal length of the cameras (all are assumed to be the same).
Figure 11: Example coordinate system and notation. In this figure, the dashed plane is the virtual film plane, placed one focal length above the apertures located at C(1), ..., C(k). This is a common shorthand convention so that we do not have to flip the camera images; in reality, the actual film plane would lie one focal length below the aperture. This coordinate system is provided as a guide; you are welcome to modify it as needed.
We will use the shift-and-add method for light field imaging such that X_1 is the point in focus (i.e., X_1 serves as the "template" that we "shift and add"). Derive a mathematical expression for the full width at half maximum (FWHM) of the blur kernel (W) applied to X_2. Credit will be assessed both for technical correctness and for the presentation of the derivation. You should not need figures, but you are welcome to include them. Insert your derivation in the box below.
[Hint: Our solution to derive W was about a half page.]
[Hint: To check your solution, if Z_1 = Z_2, the width of the blur kernel should be zero.]
3.2 Blur Kernel Shape (1.0 points)
Now that you have derived the FWHM of the blur kernel, please write the functional expression for the blur kernel. For example, is it a Gaussian blur?
3.3 Blur and Scene Depth (0.5 points)
Plot the width of the blur kernel, W, as a function of the difference in depth planes, |Z_2 - Z_1|. Insert your plot in the box below. Comment on the relationship between these variables.
3.4 Blur and Focal Length (0.5 points)
Plot the width of the blur kernel, W, as a function of the focal length of the camera, f. Insert your plot in the box below. Comment on the relationship between these variables.
4 Submission
Your submission will consist of a single tarball, "UID.tar.gz", where UID is the university ID of the submitter. It will be submitted on CCLE. Your tarball will consist of several files, listed below. Please respect the filenames and formatting exactly. Your tarball should include:
README: a .txt file
– Line 1: Full name, UID, email address of first group member (comma separated)
– Line 2: Full name, UID, email address of second group member (comma separated) if any, empty otherwise
– Use the rest of the file to list the sources you used to complete this project
code/: a directory containing all the .py files that you used for your project. There are no requirements as to how the code files are formatted.
HW2: a PDF file with the answers to the questions in the homework.
Note: Your tarball should not contain the video used for the experimental component.
References
[1] Todor Georgiev and Chintan Intwala. Light field camera design for integral view photography.
[2] Marc Levoy, Billy Chen, Vaibhav Vaish, Mark Horowitz, Ian McDowall, and Mark Bolas. Synthetic aperture confocal imaging. ACM Trans. Graph., 23(3), August 2004.
[3] J. P. Lewis. Fast normalized cross-correlation, 1995.
[4] Andrew Lumsdaine and Todor Georgiev. The focused plenoptic camera. In Proc. IEEE ICCP, pages 1–8, 2009.