Questions
1. (16 points) Derive the VC dimension of the following classifiers. Note that to prove the classifiers cannot shatter n+1 data points, it is not required to show all the configurations; you only need to show this happens in one reasonable configuration for simplicity.
(a) What is the VC dimension, $d_c$, of a threshold $c$ in $\mathbb{R}$? The target function is specified by $f(x) = +1$ if $x \geq c$ and $f(x) = -1$ if $x < c$. Prove your answer.
(b) What is the VC dimension, $d_I$, of intervals in $\mathbb{R}$? The target function is specified by an interval $[a, b]$, and labels any example positive iff it lies inside that interval. Prove your answer.
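For reference, the standard definitions behind both parts (nothing new here, just restating them): a hypothesis class $\mathcal{H}$ shatters a set of points if every possible labeling of those points is realized by some hypothesis in $\mathcal{H}$, and the VC dimension is the size of the largest set that can be shattered:
\[
\mathcal{H} \text{ shatters } \{x_1, \ldots, x_n\} \iff \forall (y_1, \ldots, y_n) \in \{-1, +1\}^n \;\exists\, h \in \mathcal{H} : h(x_i) = y_i \text{ for all } i,
\]
\[
\mathrm{VCdim}(\mathcal{H}) = \max\{\, n : \text{some set of } n \text{ points is shattered by } \mathcal{H} \,\}.
\]
A proof therefore has two parts: exhibit one set of the claimed size that is shattered, and argue (per the note above, for one reasonable configuration) that no set with one more point can be shattered.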
2. (24 points) Find the Maximum Likelihood Estimation (MLE) for the following pdfs.
In each case, consider a random sample of size n. Show your calculation:
(a) $f(x \mid \theta) = \frac{1}{\theta} e^{-x/\theta}$, $x \geq 0$, $\theta > 0$
(b) $f(x \mid \theta) = 2\theta x^{2\theta - 1}$, $0 < x \leq 1$, $0 < \theta < 1$
(c) $f(x \mid \theta) = \frac{1}{2\theta}$, $0 \leq x \leq 2\theta$ (Hint: You can draw the likelihood function and pick a $\theta$ based on all the data points.)
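As a reminder of the calculation each part asks for (the standard MLE recipe, included here only for reference), given an i.i.d. sample $x_1, \ldots, x_n$:
\[
L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta), \qquad \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta), \qquad \hat{\theta} = \arg\max_{\theta} L(\theta).
\]
When $\ell(\theta)$ is differentiable over the allowed range of $\theta$, set $\partial \ell / \partial \theta = 0$ and verify the solution is a maximum; for part (c), as the hint suggests, it is easier to inspect $L(\theta)$ directly against the support constraint than to differentiate.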
3. (30 points) Let $P(x \mid C)$ denote a Bernoulli density function for a class $C \in \{C_1, C_2\}$ and $P(C)$ denote the prior.
(a) Given the priors $P(C_1)$ and $P(C_2)$, and the Bernoulli densities specified by $p_1 \equiv p(x = 0 \mid C_1)$ and $p_2 \equiv p(x = 0 \mid C_2)$, derive the classification rules for classifying a sample $x$ into $C_1$ and $C_2$ based on the posteriors $P(C_1 \mid x)$ and $P(C_2 \mid x)$. (Hint: give rules for classifying $x = 0$ and $x = 1$.)
(b) Consider $D$-dimensional independent Bernoulli densities specified by $p_{ij} \equiv p(x_j = 0 \mid C_i)$ for $i = 1, 2$ and $j = 1, 2, \ldots, D$. Derive the classification rules for classifying a sample $x$ into $C_1$ and $C_2$. It is sufficient to give your rule as a function of $x$.
Instructor: Rui Kuang (kuang@cs.umn.edu). TAs: Ujval Bangalore Umesh (banga038@umn.edu) and Jungseok Hong (jungseok@umn.edu).
(c) Follow the definition in 3(b) and assume $D = 2$, $p_{11} = 0.6$, $p_{12} = 0.1$, $p_{21} = 0.6$, and $p_{22} = 0.9$. For three different priors ($P(C_1) = 0.2, 0.6, 0.8$ and $P(C_2) = 1 - P(C_1)$), calculate the posterior probabilities $P(C_1 \mid x)$ and $P(C_2 \mid x)$. (Hint: Calculate the probabilities for all possible samples $(x_1, x_2) \in \{(0,0), (0,1), (1,0), (1,1)\}$.)
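To illustrate the mechanics behind parts (a)-(c) (Bayes' rule with independent Bernoulli features), here is a minimal Matlab sketch. The script form, the variable names, and the choice to loop over the four samples for a single prior are my own; the parameter values are the ones given in part (c).

% Posteriors for Question 3(c): D = 2 independent Bernoulli features.
% Illustrative sketch only; rerun with PC1 = 0.6 and PC1 = 0.8 as well.
p1 = [0.6 0.1];                       % p1j = p(x_j = 0 | C1)
p2 = [0.6 0.9];                       % p2j = p(x_j = 0 | C2)
PC1 = 0.2;  PC2 = 1 - PC1;            % one of the three priors
samples = [0 0; 0 1; 1 0; 1 1];
for s = 1:size(samples, 1)
    x = samples(s, :);
    % Class-conditional likelihoods: product over the independent features,
    % using p(x_j = 0 | Ci) when x_j = 0 and 1 - p(x_j = 0 | Ci) when x_j = 1.
    like1 = prod(p1 .^ (1 - x) .* (1 - p1) .^ x);
    like2 = prod(p2 .^ (1 - x) .* (1 - p2) .^ x);
    evidence = like1 * PC1 + like2 * PC2;             % P(x)
    fprintf('x = (%d,%d): P(C1|x) = %.4f, P(C2|x) = %.4f\n', ...
            x(1), x(2), like1 * PC1 / evidence, like2 * PC2 / evidence);
end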
4. (30 points) Using the provided training, validation, and test datasets, write a Matlab program to calculate the maximum likelihood estimation on the training set. Consider a prior function defined with respect to $\sigma$ as
\[
P(C_1 \mid \sigma) = 1 - e^{-\sigma}, \qquad \sigma \geq 0 \qquad (1)
\]
and $P(C_2) = 1 - P(C_1)$. Using the learned Bernoulli distributions and the given prior function, classify the samples in the validation set using your classification rules for $\sigma = 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 2, 3, 4, 5, 6$. Finally, choose the best prior (the one that gives the lowest error rate on the validation set) and use it to classify the samples in the test set. Print to the Matlab terminal (you can use sprintf) a table of the error rate of each prior on the validation set and the error rate using the best prior on the test set. (Hint: if some Bernoulli probabilities are 0, you can replace them with a small probability such as $10^{-10}$ to avoid numerical problems.)
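A rough sketch of the workflow this question describes, not the required Bayes_Learning/Bayes_Testing functions themselves: it assumes training_data and validation_data are already loaded as matrices whose rows are samples, with the features in the leading columns and the class label (1 or 2) in the last column; that layout is an assumption, not something specified here.

sigmas = [1e-5 1e-4 1e-3 1e-2 1e-1 1 2 3 4 5 6];

% MLE of the Bernoulli parameters p(x_j = 0 | Ci) on the training set.
X = training_data(:, 1:end-1);   y = training_data(:, end);
p1 = mean(X(y == 1, :) == 0, 1);
p2 = mean(X(y == 2, :) == 0, 1);
p1 = min(max(p1, 1e-10), 1 - 1e-10);   % guard against zero probabilities,
p2 = min(max(p2, 1e-10), 1 - 1e-10);   % per the hint above

Xv = validation_data(:, 1:end-1);   yv = validation_data(:, end);
for k = 1:numel(sigmas)
    pc1 = 1 - exp(-sigmas(k));   pc2 = 1 - pc1;      % prior from Eq. (1)
    % Log-posteriors up to the shared evidence term, one row per sample.
    g1 = (Xv == 0) * log(p1)' + (Xv == 1) * log(1 - p1)' + log(pc1);
    g2 = (Xv == 0) * log(p2)' + (Xv == 1) * log(1 - p2)' + log(pc2);
    pred = 1 + (g2 > g1);                            % predicted class, 1 or 2
    fprintf('sigma = %g: validation error rate = %.4f\n', ...
            sigmas(k), mean(pred ~= yv));
end

The sigma with the lowest validation error then gives the prior that is reused, with the same decision rule, on the test set.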
Instructions
Programming Questions: All programming questions must be written in Matlab; no other programming languages will be accepted. The code must be able to be executed from the terminal command prompt on the cselabs machines. Each function must take the inputs in the order specified and display the textual output via the terminal. For each part, you can submit additional files/functions (as needed) which will be used by the main functions specified below. Put comments in your code so that one can follow the key parts and steps. Please follow the rules strictly. If we cannot run your code, you will receive no credit.
- Question 4:
Training function in Bayes_Learning.m: Bayes_Learning(training_data, validation_data). The function returns the outputs (p1: learned Bernoulli parameters of the first class, p2: learned Bernoulli parameters of the second class, pc1: best prior of the first class, pc2: best prior of the second class). It must also print to the terminal (sprintf) a table of error rates of all priors.
Testing function in Bayes_Testing.m: Bayes_Testing(test_data, p1: the learned Bernoulli parameters of the first class, p2: the learned Bernoulli parameters of the second class, pc1: the learned prior of the first class, pc2: the learned prior of the second class). The function must print to the terminal (sprintf) the error rate on the test dataset.
- Error rate: Error rate is the number of wrongly classified data points divided by the total number of classified data points.
Report: Solutions to Questions 1, 2 and 3 must be included in a report. The table of error rates on the validation set and the error rate on the test set for Question 4 must also be included in the report.
Things to submit:
hw1_sol.pdf: A document which contains the report with solutions to all questions. Scanned answer sheets need to be clean and 100% legible.
Bayes_Learning.m and Bayes_Testing.m: Code for Question 4.
Any other files, except the data, which are necessary for your code.
Submit: All material must be submitted electronically via Canvas.