1 SLIC Superpixels (35 pts)
Superpixel algorithms group pixels into perceptually meaningful regions while respecting potential object contours, and thereby can replace the rigid pixel-grid structure. Due to the reduced complexity, superpixels are becoming popular for various computer vision applications, e.g., multiclass object segmentation, depth estimation, human pose estimation, and object localization. In this problem, you will implement a simple superpixel algorithm called Simple Linear Iterative Clustering (SLIC) that clusters pixels in the five-dimensional color and pixel-coordinate space (i.e., r, g, b, x, y). The algorithm starts with a collection of K cluster centers initialized at an equally sampled regular grid on the image of N pixels. For each cluster, you define a localized 2S x 2S window centered at the cluster center, where S = sqrt(N/K) is roughly the spacing between the seed cluster centers. Then, you check whether each pixel within the 2S x 2S local window should be assigned to that cluster center (by checking its distance in the 5D space to the cluster center). Once you have looped through all the clusters, you update each cluster center by averaging over its members. Then, repeat the pixel-to-cluster assignment process until convergence or until the maximum number of iterations is reached. For more details, please see SLIC Superpixels Compared to State-of-the-art Superpixel Methods, PAMI 2011.
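The loop described above can be sketched as follows. This is an illustrative Python/NumPy version (the course code is MATLAB); it uses the squared 5D distance d² = d_color² + (m/S)²·d_spatial² from the SLIC paper, where m is a tunable compactness weight.

```python
import numpy as np

def slic(img, K, m=10.0, max_iter=10):
    """img: H x W x 3 float array in [0, 1]; K: number of superpixels;
    m: compactness weight trading off color vs. spatial distance."""
    H, W, _ = img.shape
    S = int(np.sqrt(H * W / K))                    # approximate seed spacing
    yy, xx = np.mgrid[0:H, 0:W]
    # initialize cluster centers (r, g, b, x, y) on a regular grid
    cy, cx = np.mgrid[S // 2:H:S, S // 2:W:S]
    cy, cx = cy.ravel(), cx.ravel()
    centers = np.column_stack([img[cy, cx], cx, cy]).astype(float)
    labels = np.zeros((H, W), dtype=int)
    for _ in range(max_iter):
        dist = np.full((H, W), np.inf)
        for k, (r, g, b, x, y) in enumerate(centers):
            # search only the 2S x 2S window around this center
            y0, y1 = max(int(y) - S, 0), min(int(y) + S + 1, H)
            x0, x1 = max(int(x) - S, 0), min(int(x) + S + 1, W)
            dc2 = ((img[y0:y1, x0:x1] - (r, g, b)) ** 2).sum(-1)
            ds2 = (yy[y0:y1, x0:x1] - y) ** 2 + (xx[y0:y1, x0:x1] - x) ** 2
            d = dc2 + (m / S) ** 2 * ds2           # squared 5D distance
            better = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = k
        # update each center as the mean of its assigned pixels
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                centers[k] = (*img[mask].mean(0), xx[mask].mean(), yy[mask].mean())
    return labels
```

Note how m answers part (a): dividing the spatial distance by S normalizes it to the grid scale, so m directly controls the trade-off asked about in part (b).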
In your write up:
(a) Explain your distance function for measuring the similarity between a pixel and cluster in the 5D
space. (5 pts)
(b) Choose one image, try three different weights on the color and spatial features, and show the three segmentation results. Describe what you observe. (5 pts)
(c) Choose one image, show the error map (1) at initialization and (2) at convergence. (Note: error = distance to the cluster center in the 5D space.) (5 pts)
(d) Choose one image, show three superpixel results with different numbers of clusters K, e.g., 64, 256, 1024, and report the run time for each K (use tic, toc). (5 pts)
(e) Use evalSLIC.m to evaluate your algorithm on the subset of the Berkeley Segmentation Dataset (BSD) and show three figures of performance for different numbers of segments K: (1) boundary recall, (2) under-segmentation error, and (3) averaged run time per image for the BSD. (15 pts)
(f) [Extra credit up to 10 pts] Try to improve your result on (e). You may try different color spaces (e.g., CIELab, HSV), a more complex distance measure (Sec. 4.5 in the paper), richer image features (e.g., gradients), or even other algorithms. Report the accuracy on boundary recall and under-segmentation error with K = 256. Compare the results with (e) and explain why you get better results. Just changing the color space would be worth up to 5 pts.
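For the color-space option in the extra credit, one minimal starting point is to convert to CIELab before clustering. Below is a self-contained sketch of the standard sRGB → CIELab conversion (D65 white point), in Python for illustration; MATLAB users can use makecform('srgb2lab')/applycform instead.

```python
import numpy as np

def rgb2lab(rgb):
    """Minimal sRGB -> CIELab conversion (D65 white), for illustration only.
    rgb: array of shape (..., 3) with values in [0, 1]."""
    # undo the sRGB gamma to get linear RGB
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # linear RGB -> XYZ (sRGB primaries, D65)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ M.T
    # normalize by the D65 reference white, then apply the Lab nonlinearity
    xyz = xyz / np.array([0.9505, 1.0000, 1.0890])
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16.0 / 116.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

Cluster in (L, a, b, x, y) exactly as before; since L spans roughly [0, 100] rather than [0, 1], you will need to re-tune the color/spatial weight.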
2 EM Algorithm: Dealing with Bad Annotations (35 pts)
Dealing with noisy annotations is a common problem in computer vision, especially when using crowdsourcing tools, like Amazon’s Mechanical Turk. For this problem, you’ve collected photo aesthetic ratings for 150 images. Each image is labeled 5 times by a total of 25 annotators (each annotator provided 30 labels). Each label consists of a continuous score from 0 (unattractive) to 10 (attractive). The problem is that some users do not understand the instructions or are trying to get paid without attending to the images. These “bad” annotators assign a label uniformly at random from 0 to 10. Other “good” annotators assign a label to the ith image with mean µi and standard deviation σ (σ is the same for all images). Your goal is to solve for the most likely image scores and to figure out which annotators are trying to cheat you.
In your write-up, use the following notation:
• xij ∈ [0, 10]: the score for the ith image from the jth annotator
• mj ∈ {0, 1}: whether the jth annotator is “good” (mj = 1) or “bad” (mj = 0)
• P(xij | mj = 0) = 1/10: uniform distribution for bad annotators
• P(xij | mj = 1; µi, σ) = (1 / sqrt(2πσ²)) exp(−(xij − µi)² / (2σ²)): normal distribution for good annotators
• P(mj = 1; β) = β: prior probability for being a good annotator
2.1 Derivation of EM Algorithm (20 pts)
Derive the EM algorithm to solve for each µi, each mj, σ, and β. Show the major steps of the derivation and make it clear how to compute each variable in the update step.
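For concreteness, the kind of updates this derivation leads to can be summarized in code. This is a schematic in Python/NumPy rather than MATLAB; the layout X[i, j] (NaN where annotator j did not label image i), the names p_good/beta, and the soft treatment of mj via its posterior are illustrative choices, not the required ones.

```python
import numpy as np

def em_step(X, mu, sigma, beta):
    """One EM iteration. X[i, j] is the score for image i from annotator j,
    NaN where unlabeled. Returns the posterior p_good[j] = P(m_j = 1 | scores)
    and the updated parameters."""
    labeled = ~np.isnan(X)
    # E-step: per-annotator log-joint under the "good" and "bad" models
    log_norm = (-0.5 * ((X - mu[:, None]) / sigma) ** 2
                - np.log(sigma * np.sqrt(2 * np.pi)))
    ll_good = np.where(labeled, log_norm, 0.0).sum(axis=0) + np.log(beta)
    ll_bad = labeled.sum(axis=0) * np.log(1 / 10) + np.log(1 - beta)
    # posterior computed stably in log space (max-subtraction trick)
    top = np.maximum(ll_good, ll_bad)
    p_good = np.exp(ll_good - top) / (np.exp(ll_good - top) + np.exp(ll_bad - top))
    # M-step: weighted re-estimates using the soft "good" assignments
    w = labeled * p_good[None, :]
    mu = np.where(labeled, w * X, 0.0).sum(axis=1) / w.sum(axis=1)
    sigma = np.sqrt(np.where(labeled, w * (X - mu[:, None]) ** 2, 0.0).sum() / w.sum())
    beta = p_good.mean()
    return p_good, mu, sigma, beta
```

Iterating this from the 0.5 initialization drives p_good toward 0 or 1 for each annotator; your write-up should still show how each update falls out of the expected complete-data log-likelihood.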
2.2 Application to Data (15 pts)
I’ve included an “annotation_data.mat” file containing:
• annotator_ids(i) provides the ID of the annotator for the ith annotation entry
• image_ids(i) provides the ID of the image for the ith annotation entry
• annotation_scores(i) provides the ith annotated score
Implement the EM algorithm to solve for all variables. You will need to initialize the probability that each annotator is good – set the initial value to 0.5 for each annotator.
(a) Report the indices corresponding to “bad” annotators (j for which mj is most likely 0).
(b) Report the estimated value of σ.
(c) Plot the estimated means µi of the image scores (e.g., plot(1:150, mean_scores(1:150))).
Tips
• For numerical stability, you should use logsumexp.m (http://www.cs.toronto.edu/pub/psala/Project/KPMtools/logsumexp.m) when computing the likelihood that an annotator is “good”, given the current parameters. The joint probability of each annotator’s scores and “goodness” will be very small (close to zero by numerical precision). Thus, you should compute the log probabilities and use logsumexp to compute the denominator.
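For reference, the same trick in Python (a minimal logsumexp, not the linked KPMtools file): subtracting the maximum before exponentiating keeps exp() from underflowing to an all-zero sum.

```python
import numpy as np

def logsumexp(a):
    """log(sum(exp(a))) computed without underflow/overflow."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

# e.g. the posterior that an annotator is "good" from two log-joints:
log_joint = np.array([-1050.0, -1060.0])   # log P(scores, m=1), log P(scores, m=0)
p_good = np.exp(log_joint[0] - logsumexp(log_joint))
# the naive route, np.exp(-1050) / (np.exp(-1050) + np.exp(-1060)),
# would divide 0.0 by 0.0
```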
• Another useful function is normpdf.
• In each iteration, I like to show the bar plot of probability of “good” for each annotator and to show the plot of the mean score for each image. At the end, you should be very confident which annotators are “good”.
3 Graph-cut Segmentation (30 pts)
Let us apply Graph-cuts for foreground/background segmentation. In the “cat” image, you are given a rough polygon of a foreground cat. Apply Graph-cuts to get a better segmentation.
First, you need an energy function. Your energy function should include a unary term, a data-independent smoothing term, and a contrast-sensitive smoothing term. Your unary terms should be log[P(pixel|foreground) / P(pixel|background)]. Your pairwise term should include the uniform smoothing and the contrast-sensitive term. To construct the unary term, use the provided polygon to obtain an estimate of the foreground and background color likelihoods. You may choose the likelihood distribution (e.g., color histograms or a color mixture of Gaussians; yes, you can use MATLAB GMM functions this time). Apply graph cut code for segmentation. You must define the graph structure and the unary and pairwise terms, and use the provided graph cut code or another package of your choice. Include in your writeup:
(a) Explain your foreground and background likelihood function. (5 pts)
(b) Write the unary and pairwise terms, as well as the whole energy function, in math expressions. (5 pts)
(c) Your foreground and background likelihood map. Display P(foreground|pixel) as an intensity map (bright = confident foreground). (10 pts)
(d) Final segmentation. Create an image for which the background pixels are blue, and the foreground pixels have the color of the input image. (10 pts)
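To make the pairwise term concrete: the weight between neighboring pixels p, q is commonly taken as λ_uniform + λ_contrast · exp(−‖I_p − I_q‖² / (2σ²)), so cutting across a strong color edge is cheap while cutting through a flat region is expensive. A sketch for horizontal neighbor pairs (Python for illustration; the λ values and the mean-based σ heuristic are illustrative choices):

```python
import numpy as np

def pairwise_weights(img, lam_uniform=1.0, lam_contrast=2.0):
    """Edge weights between each pixel and its right neighbor:
    a data-independent uniform term plus a contrast-sensitive term
    that is large where the two pixels look alike and small across
    strong color edges. img: H x W x 3 float array."""
    diff2 = ((img[:, 1:] - img[:, :-1]) ** 2).sum(axis=-1)   # squared color difference
    sigma2 = diff2.mean() + 1e-12                            # common heuristic scale
    return lam_uniform + lam_contrast * np.exp(-diff2 / (2 * sigma2))
```

Vertical (and, if desired, diagonal) neighbors are handled the same way; these weights become the n-link capacities in the graph you hand to the graph cut solver.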