Questions may continue on the back. Type your answers; you may write math formulas by hand. What we cannot read, we will not grade.
You may talk about this assignment with others, but do not write down anything while you talk. After that, do the assignment alone. What you hand in must be your work only.
IMPORTANT: Implement all the functions in this assignment from scratch. That is, do not use anyone else’s code, unless otherwise instructed. Email your solution as a single PDF file to Ben at the beginning of class on the due date. Remember that class will not meet on November 20.
This assignment introduces several new concepts, so be patient and careful as you read. You will hopefully learn something new. You will need the file data.mat and a MATLAB function, all available in a zip file on the homework page.
Image classification and regression problems take as inputs descriptors of the image or of parts of it. These descriptors are called features. As not all parts of an image are interesting, it is useful to develop an operator that selects image regions that are worth describing. This selection is difficult in general, because what is “interesting” is a semantic notion that depends on the application. In this assignment, we will develop a syntactic “interest operator” instead. This operator is given a window (a small square of contiguous pixels) out of a gray-level image and returns a measure of the local distinctiveness of the image within the window.
In this context, a window W is “locally distinctive” if nearby windows are sufficiently different from W, in a sense to be made more precise later. To formalize this notion, we first need to define what it means for two image windows to be similar or dissimilar from each other. Let
$$D(x, y) = \sum_{z \in \mathbb{Z}^2} w(z)\, [I(z + y) - I(z + x)]^2 \quad \text{for } x, y \in \mathbb{R}^2 \qquad (1)$$
be a measure of the dissimilarity between two windows centered at x and y in image I. In this expression, Z is the set of integers, the summation ranges over the vector z = (z_1, z_2)^T of two integer variables (the superscript T denotes transposition), and w(z) is a discrete, truncated, normalized Gaussian function with parameter σ:
$$w(z) = \begin{cases} c\, e^{-\frac{z_1^2 + z_2^2}{2\sigma^2}} & \text{for } -h \le z_1 \le h \text{ and } -h \le z_2 \le h \\ 0 & \text{elsewhere.} \end{cases}$$
Here,
$$h = \lceil 2.5\sigma \rceil$$
is the smallest integer that is no smaller than 2.5σ, and c is a real-valued constant such that
$$\sum_{z \in \mathbb{Z}^2} w(z) = 1\,.$$
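Equivalently, although the text above does not spell it out, the normalization condition determines c as the reciprocal of the sum of the unnormalized Gaussian samples over the window:

$$c = \left[ \sum_{z_1 = -h}^{h} \sum_{z_2 = -h}^{h} e^{-\frac{z_1^2 + z_2^2}{2\sigma^2}} \right]^{-1}.$$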
The “window” is then an n × n square with n = 2h + 1. When x or y is not integer-valued, image values between pixels can be computed by bilinear interpolation. However, we do not worry about the values of D(x, y) for real-valued arguments in this exercise, other than in a conceptual sense.
The meaning of D(x, y) is straightforward. The truncated Gaussian w(z) selects small areas of the image around x and y. Discrepancies between corresponding pixel values in those areas are measured by the square of their difference, and weighted (through multiplication by w(z)) according to how close the pixels are to the center of each window. The weighted discrepancies are then added together to produce a single number D(x, y). If the image values in the two small areas were identical, this number would be zero, and it increases as the contents of the two windows grow more different from each other.
1. In this problem we get some intuition as to what the function D(x, y) looks like when x is kept fixed and y is varied around x. To do this, we set
y = x + d
so that d is the displacement between a window centered at x and another one centered at y. We then pick three qualitatively different windows centered at x(1) , x(2) , x(3) in a test image, and plot D(x(i) , x(i) + d) for the 81 integer displacement vectors
$$d \in R = \{d \mid d \in \mathbb{Z}^2 \text{ and } d = (d_1, d_2)^T \text{ with } -4 \le d_1 \le 4 \text{ and } -4 \le d_2 \le 4\}$$
(the letter ‘R’ stands for the range of displacements). The set R is a square in “displacement space” and D(x(i) , x(i) + d) for fixed x(i)
is a function from the integer grid in this space into the non-negative reals.
(a) Write a MATLAB function with header
function [w1, u] = gauss(sigma)
that computes a discrete, truncated, one-dimensional n × 1 Gaussian kernel w1 with parameter σ, such that
w = w1 * w1’;
is the two-dimensional Gaussian kernel w(z) defined above. Note that if w1 is normalized to sum to 1, then w sums to 1 as well. [You may want to check this.]
The second output argument u is the vector -h:h where h is the half-width of the window (same as h in the problem statement above).
Hand in your code for gauss and show a plot of w1 versus u for σ = 3. Label the axes appropriately.
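For reference, a minimal sketch along these lines is possible; this simply transcribes the definitions of h, w(z), and the normalization given above, and is an illustration rather than the required solution:

function [w1, u] = gauss(sigma)
% Discrete, truncated, normalized 1-D Gaussian kernel with parameter sigma.
h = ceil(2.5 * sigma);              % half-width of the window, h = ceil(2.5 sigma)
u = -h:h;                           % sample positions; the window has n = 2h + 1 pixels
w1 = exp(-(u(:).^2) / (2 * sigma^2));   % n-by-1 column vector
w1 = w1 / sum(w1);                  % normalize so that w = w1 * w1' sums to 1

To verify the normalization claim above, you can check that sum(w1) and sum(sum(w1 * w1')) are both numerically equal to 1.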
(b) Write a MATLAB function with header
function D = dissimilarity(I, x, y, sigma)
that computes the dissimilarity D(x, y). The input parameter I is a gray-level image with pixels of type uint8. The vectors x and y are 2 × 1 vectors with integer coordinates (it is fine to use variables of type double for x and y, but the values must be integer). The positive scalar sigma is the parameter σ used in the definition of D. The output D is a scalar of type double. Use your function gauss to generate the weight mask w(z).
The first coordinate of x, y, d is a row index, and the second coordinate is a column index.
Your code should abort with an error message if either x or y is too close to the image boundary and parts of the corresponding window would fall outside the image. It is OK to use for loops to scan the range R. Make sure you cast either the whole image or the relevant parts of it to double before doing any calculations with pixel values.
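As a rough guide, a direct transcription of definition (1) might look like the sketch below; the bound checks and variable names are one choice among many, and this is not a substitute for your own implementation:

function D = dissimilarity(I, x, y, sigma)
% Dissimilarity D(x, y) between the windows centered at x and y in the
% uint8 image I. x and y are 2x1 integer-valued vectors (row index first).
[w1, u] = gauss(sigma);
h = max(u);                          % half-width of the window
[nr, nc] = size(I);
if x(1) <= h || x(1) > nr - h || x(2) <= h || x(2) > nc - h || ...
   y(1) <= h || y(1) > nr - h || y(2) <= h || y(2) > nc - h
    error('dissimilarity: window falls outside the image');
end
w = w1 * w1';                        % 2-D weight mask w(z)
Wx = double(I(x(1)-h : x(1)+h, x(2)-h : x(2)+h));   % cast to double before arithmetic
Wy = double(I(y(1)-h : y(1)+h, y(2)-h : y(2)+h));
D = sum(sum(w .* (Wy - Wx).^2));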
Hand in your code for the function dissimilarity, as well as three mesh plots of D(x(i) , x(i) +d) for the image shadow.jpg provided with this assignment and for i = 1, 2, 3 and for d in the range R defined earlier. For x(i) use the three columns of the matrix X in the data.mat file provided with this assignment (use load data to load X into your workspace). Set σ = 3 for all plots.
To display a mesh plot use the MATLAB function mesh (use the MATLAB help to figure out how mesh works). Before printing the mesh plot, set its viewing direction (by hand or with the view function) so that the plot is clearly visible. What MATLAB calls the y axis should be marked with label d_1 and the x axis should be marked with label d_2. The tick marks on the axes should be labeled from −4 to 4 on each axis. Do not hand in the code used to make the plots.
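If helpful, the surface for one window can be assembled with two nested loops over R, along the lines of the illustrative sketch below (the variable names are mine, and shadow.jpg is assumed to load as a single-channel gray-level image):

I = imread('shadow.jpg');        % convert to gray level first if it loads as RGB
load data                        % loads the matrix X provided with the assignment
x = X(:, 1);                     % first test window center; use columns 2, 3 for the others
Dsurf = zeros(9, 9);
for d1 = -4:4
    for d2 = -4:4
        Dsurf(d1 + 5, d2 + 5) = dissimilarity(I, x, x + [d1; d2], 3);
    end
end
mesh(-4:4, -4:4, Dsurf)          % rows of Dsurf (d1) go on the y axis, columns (d2) on the x axis
xlabel('d_2'); ylabel('d_1');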
(c) The three windows centered at x(i) for i = 1, 2, 3 in the previous question contain respectively a rather flat part of the image (i = 1), an edge (i = 2), and a corner (i = 3). You may want to visually explore these windows using MATLAB’s display facilities.
In what way do the three functions D(x(i) , x(i) + d) differ from each other qualitatively? More specifically, the function D(x, x + d) is nonnegative and is zero for d = 0, so in some sense it is supposed to look like a bowl (a convex paraboloid) in the vicinity of d = 0. How well-defined is the bottom of the bowl in each of the three different situations? In answering this question, pay attention also to the actual values of D (the values on the vertical axis of your plots), and keep in mind that the image is noisy. Because of this, small fluctuations in the value of D are not significant.
2. We now use the intuition developed with the previous exercise to come up with a definition of “interestingness.” An image window is “interesting” if the bottom of the bowl formed by D(x, x + d) as a function of d is well defined. When this is the case, there is enough going on in the window around x that sliding the window in any direction makes the window look different from its original version. In particular, a flat part of the image is not interesting, nor is a window along a straight edge. A corner is interesting, and so is an image detail with rich textural content (for instance, a polka-dot pattern, gravel, or a fine checkerboard).
To measure this notion of “interestingness,” one could in principle build the bowl for each window in the image and then somehow measure how well defined the bottom of the bowl is. A more efficient way to achieve the same result is to note that D(x, x + d) must look like a paraboloid in an infinitesimally small region around d = 0, because in such a small region one can approximate the function D by its Taylor series truncated after the second-order term. The derivation that follows computes the smallest curvature of this second-order term, and calls this curvature the “distinctiveness” (or “interestingness”) of the window centered at x.
Imagine fixing x and seeing D(x, x + d) as a function of d only. If you take d = 0, the two windows centered at x and x + d are then the same window, and
D(x, x) = 0 .
In addition, since D cannot get any smaller than zero, d = 0 is also a minimum of D(x, x + d) with respect to d, and therefore the gradient there is zero:
$$\left. \frac{dD(x, x + d)}{dd} \right|_{d = 0} = 0\,.$$
If you move away from d = 0 in any direction, the function D cannot decrease. It either stays constant, which means that the window at x + d looks exactly the same as the window at x, or it increases. In other words, the function D(x, x + d) with x fixed looks like the bottom of a bowl in a small neighborhood of d = 0: The gradient at the bottom is zero, and the curvature of the bowl may be different as you move in different directions away from d = 0, but is always nonnegative.
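In formulas (a step left implicit above), the second-order Taylor expansion of D around d = 0 reads

$$D(x, x + d) \approx D(x, x) + \left[\left.\frac{dD(x, x + d)}{dd}\right|_{d = 0}\right]^T d + \frac{1}{2}\, d^T H_D\, d = \frac{1}{2}\, d^T H_D\, d\,,$$

where H_D is the Hessian of D with respect to d at d = 0: the first two terms vanish because of the two observations just made, and the nonnegative curvatures of the bowl are encoded in H_D.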
The distinctiveness of the window at x is defined as the curvature of the bowl at d = 0 in the direction in which it curves the least. Specifically, we say that a window centered at x has distinctiveness λmin (x) if
$$\min_{\{u \in \mathbb{R}^2 \,\mid\, \|u\| = 1\}}\; \frac{1}{2} \left. \frac{d^2 D(x, x + au)}{da^2} \right|_{a = 0} = \lambda_{\min}(x)\,. \qquad (2)$$
Let us parse this definition. Imagine starting at the bottom of the bowl at d = 0. Now pick an arbitrary direction away from that point, and keep track of that direction with a unit vector u on the plane. As the positive, real number a increases away from 0, the point
y = x + au
moves away from x in the direction of u. The function D has zero gradient, so to first order it does not change. However, the second derivative may be positive, which means that the window centered at y is different from the starting window centered at x, and D(x, x + au) increases, at least initially. The rate of increase is measured by one half¹ of the second derivative in definition (2). Along some directions u, the second derivative at a = 0 is smaller than along others. Pick the direction u that gives the smallest increase. One half of the second derivative in that direction is the distinctiveness of the window at x. At this point, you may want to go back to your plots of D(x, x + d) and check your intuition.
You will now prove that λmin (x) is the smaller of the two eigenvalues of the 2 × 2 matrix
$$A(x) = \sum_{z \in \mathbb{Z}^2} w(z)\, \nabla I(z + x)\, [\nabla I(z + x)]^T \qquad (3)$$

where
$$\nabla I(x) = \begin{bmatrix} \dfrac{\partial I}{\partial x_1} \\[6pt] \dfrac{\partial I}{\partial x_2} \end{bmatrix}$$
is the gradient of the image I at x = (x_1, x_2)^T. I will take you through the proof, so do not worry: all you need to do is fill in the blanks.
The expression (1) for D contains the argument z + y, which for y = x + d becomes z + x + d. We start by approximating the image at z + x + d by its Taylor series around z + x, truncated to the linear term in d:
$$I(z + x + d) \approx I(z + x) + [\nabla I(z + x)]^T d\,.$$
In this way, D(x, x + d) is approximately a quadratic function of d for small values of d:
$$D(x, x + d) \approx Q(x, x + d) = \sum_{z \in \mathbb{Z}^2} w(z) \left\{ I(z + x) + [\nabla I(z + x)]^T d - I(z + x) \right\}^2 = \sum_{z \in \mathbb{Z}^2} w(z) \left\{ [\nabla I(z + x)]^T d \right\}^2\,.$$
(a) Compute one half of the partial derivatives of Q(x, x + d) with respect to d. Since d has two components (call them d1 and d2 ), your result will be a vector with two derivatives:
$$q(x, d) = \frac{1}{2}\, \frac{\partial Q(x, x + d)}{\partial d} = \frac{1}{2} \begin{bmatrix} \dfrac{\partial Q(x, x + d)}{\partial d_1} \\[8pt] \dfrac{\partial Q(x, x + d)}{\partial d_2} \end{bmatrix}$$
[Hint: If you are familiar with differential calculus applied to vectors, this is a one-line derivation. If not, just compute the two derivatives separately, and then package the result into a 2 × 1 vector.]
(b) Find a matrix A(x) such that your previous answer has the form
$$q(x, d) = A(x)\, d\,.$$
Show all your steps clearly. [Hints: (1) Depending on how you wrote the expression for q(x, d), it may be useful to use the fact that if s is a scalar and f is a vector, then sf = f s. (2) The matrix in this question is not called A(x) by coincidence.]
¹ The factor 1/2 is introduced merely for convenience in later calculations.
(c) The Hessian of Q(x, x + d) is the 2 × 2 matrix of second derivatives of Q with respect to d:
$$H = \frac{\partial^2 Q(x, x + d)}{\partial d\, \partial d^T} = \begin{bmatrix} \dfrac{\partial^2 Q(x, x + d)}{\partial d_1^2} & \dfrac{\partial^2 Q(x, x + d)}{\partial d_1 \partial d_2} \\[8pt] \dfrac{\partial^2 Q(x, x + d)}{\partial d_2 \partial d_1} & \dfrac{\partial^2 Q(x, x + d)}{\partial d_2^2} \end{bmatrix}$$
and can be found by differentiating the two entries of q(x, d) each with respect to d1 and d2 . That is, if the two entries of q are q1
and q2 , we have
$$\frac{1}{2}\, H = \begin{bmatrix} \dfrac{\partial q_1}{\partial d_1} & \dfrac{\partial q_1}{\partial d_2} \\[8pt] \dfrac{\partial q_2}{\partial d_1} & \dfrac{\partial q_2}{\partial d_2} \end{bmatrix}\,.$$
Use your answer to the previous question to show that
$$\frac{1}{2}\, H = A(x)\,.$$
[Hint: Again, if you know how to differentiate with respect to vectors, this is an instant derivation. If not, just spell out the product
A(x)d in terms of the entries of A(x) and d and compute the four derivatives separately.]
Since D ≈ Q, one half of the Hessian of D is approximately A(x). It is not too hard to show that one half of the second derivative in equation (2) is then

$$\frac{1}{2} \left. \frac{d^2 D(x, x + au)}{da^2} \right|_{a = 0} \approx u^T A(x)\, u$$

(the equality would be exact if one half of the Hessian of D were exactly A(x)). Also, the expression u^T A(x) u is minimized over all unit vectors
u when

$$u = v_{\min}\,,$$

the eigenvector of A(x) that corresponds to the smaller of its two real eigenvalues, λmin(x), and

$$v_{\min}^T\, A(x)\, v_{\min} = \lambda_{\min}(x)\,.$$
You need not prove any of these statements. Together, they show that the distinctiveness λmin(x) of a window at x is the smaller eigenvalue of the matrix A(x) defined in equation (3). This matrix, which is only approximately one half of the Hessian of the dissimilarity D, is called the Gramian of D weighted by w(z). You may recognize this expression as the image structure tensor defined in the textbook in section 13.2.2. The selection criterion in equation (13.15) of the book is slightly different from λmin(x), but the idea is very similar.
We can now compute λmin (x) everywhere in an image, and say that a window is interesting (or distinctive) if λmin (x) is greater than
some predefined threshold τ .
3. The function grad provided with this assignment computes the gradient of an image. If you say
g = grad(I);
then g is a cell array with two entries: g{1} is an image that contains the component of the gradient in the vertical direction at every pixel and g{2} contains the component in the horizontal direction. The function grad also takes an optional argument for the parameter σ of the kernel, but we will use the default value in this exercise.
(a) Use the function grad and the MATLAB builtin function conv2 with the 'same' option to write a function with header
function lambdaMin = smallEigenvalue(I, sigma)
that computes an image with the smallest eigenvalue λmin (x) of the Gramian A(x) defined in equation (3) at every pixel in image I. The second argument sigma is the parameter σ used in the definition of w(z). There is no need to cast I to double, because grad already does that for its own computations. Use your function gauss to compute the weighting mask w1 such that w = w1 * w1’.
If you are surprised that conv2 shows up here, note that expression (3) for A(x) is a convolution. Specifically, the product
$$P(x) = \nabla I(x)\, [\nabla I(x)]^T = \begin{bmatrix} a(x) & d(x) \\ d(x) & b(x) \end{bmatrix}$$
is a 2 × 2 symmetric matrix. So your code will first call grad to compute the two components of ∇I everywhere in the image, then use matrix operations to compute the three images a(x), b(x), d(x) in P (x). The three distinct entries of
$$A(x) = \begin{bmatrix} p(x) & r(x) \\ r(x) & q(x) \end{bmatrix}$$
are the convolutions of the three distinct entries of P (x) with the Gaussian kernel w. Make sure you use the separability of the
Gaussian to make these convolutions efficient.
Finally, you can compute the image lambdaMin by recalling that the smaller eigenvalue of A(x) is
$$\lambda_{\min}(x) = \frac{p(x) + q(x) - \sqrt{[p(x) - q(x)]^2 + 4\, r^2(x)}}{2}\,.$$
Use matrix operations to do this computation at once over the entire image, without explicit for loops.
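Putting these pieces together, one possible skeleton is sketched below; treat it as an illustration of the recipe above rather than a finished implementation:

function lambdaMin = smallEigenvalue(I, sigma)
% Smaller eigenvalue of the weighted Gramian A(x) at every pixel of I.
g = grad(I);                       % g{1}: vertical gradient component, g{2}: horizontal
[w1, ~] = gauss(sigma);            % 1-D Gaussian kernel; w = w1 * w1' by separability
a = g{1} .^ 2;                     % entries of P(x) = grad(I) * grad(I)' at each pixel
b = g{2} .^ 2;
d = g{1} .* g{2};
p = conv2(w1, w1', a, 'same');     % entries of A(x): separable convolutions with w
q = conv2(w1, w1', b, 'same');
r = conv2(w1, w1', d, 'same');
lambdaMin = (p + q - sqrt((p - q).^2 + 4 * r.^2)) / 2;

The three calls to conv2(w1, w1', ., 'same') exploit the separability of the Gaussian: each 2-D convolution with w is computed as a column convolution with w1 followed by a row convolution with w1'.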
Hand in your code and the image lambdaMin obtained on the image shadow.jpg provided with this assignment. To save ink, invert the image so that large values are mapped to black and small values to white. You can do this as follows:
clf
imagesc(lambdaMin)
colormap gray
cmap = 1 - colormap;
colormap(cmap);
axis image
axis off
(b) Comment on your image lambdaMin. In particular, what do interesting windows in shadow.jpg look like?