Course: Machine Learning (CS405) – Professor: Qi Hao
Question 1
Consider the polynomial function
$$ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j $$
Calculate the coefficients $\mathbf{w} = \{w_i\}$ that minimize its sum-of-squares error function. Here a suffix $i$ or $j$ denotes the index of a component, whereas $(x)^i$ denotes $x$ raised to the power of $i$.
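Although the question asks for a derivation, the resulting closed-form solution can be checked numerically. The sketch below is illustrative only (the data points and the order M are assumptions, not part of the question): it builds the design matrix whose normal equations the derivation leads to, and solves the least-squares problem.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares fit of an M-th order polynomial to targets t."""
    # Design matrix A with entries A[n, j] = x_n ** j for j = 0..M
    A = np.vander(x, M + 1, increasing=True)
    # Minimizing E(w) = (1/2) * sum_n (y(x_n, w) - t_n)^2 leads to the
    # normal equations (A^T A) w = A^T t; lstsq solves them stably.
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w

x = np.array([0.0, 1.0, 2.0, 3.0])
t = 1.0 + 2.0 * x + 3.0 * x**2      # targets from a known quadratic
w = fit_polynomial(x, t, M=2)
print(np.round(w, 6))                # recovers [1. 2. 3.]
```

On noise-free data from an exact quadratic, the fitted coefficients reproduce the generating polynomial, which is a quick way to validate a hand-derived solution.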
Question 2
Suppose that we have three colored boxes r (red), b (blue), and g (green). Box r contains 3 apples, 4 oranges, and 3 limes; box b contains 1 apple, 1 orange, and 0 limes; and box g contains 3 apples, 3 oranges, and 4 limes. If a box is chosen at random with probabilities p(r) = 0.2, p(b) = 0.2, p(g) = 0.6, and a piece of fruit is removed from the box (with equal probability of selecting any of the items in the box), then what is the probability of selecting an apple? If we observe that the selected fruit is in fact an orange, what is the probability that it came from the green box?
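The two rules this question exercises, the sum rule and Bayes' theorem, can be evaluated directly as a sanity check on a hand computation. The sketch below hard-codes the box contents and priors from the statement:

```python
# Box contents and priors, taken directly from the problem statement.
boxes = {
    "r": {"apple": 3, "orange": 4, "lime": 3},
    "b": {"apple": 1, "orange": 1, "lime": 0},
    "g": {"apple": 3, "orange": 3, "lime": 4},
}
prior = {"r": 0.2, "b": 0.2, "g": 0.6}

def p_fruit(fruit):
    # Sum rule: p(fruit) = sum over boxes of p(fruit | box) p(box)
    return sum(prior[b] * boxes[b][fruit] / sum(boxes[b].values())
               for b in boxes)

p_apple = p_fruit("apple")
# Bayes' theorem: p(g | orange) = p(orange | g) p(g) / p(orange)
p_g_given_orange = (prior["g"] * boxes["g"]["orange"]
                    / sum(boxes["g"].values())) / p_fruit("orange")
print(round(p_apple, 2), round(p_g_given_orange, 2))   # 0.34 0.5
```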
Question 3
Given two statistically independent variables x and z, show that the mean and variance of their sum satisfy
$$ E[x + z] = E[x] + E[z] $$
$$ \mathrm{var}[x + z] = \mathrm{var}[x] + \mathrm{var}[z] $$
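Before proving the identities, they can be confirmed empirically. In this Monte Carlo sketch the two distributions are arbitrary choices; what matters is only that the draws are independent:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)   # E[x] = 1, var[x] = 4
z = rng.exponential(scale=3.0, size=1_000_000)       # E[z] = 3, var[z] = 9
s = x + z

print(s.mean())   # close to 4  (= E[x] + E[z])
print(s.var())    # close to 13 (= var[x] + var[z])
```

Note that the variance identity relies on independence (more precisely, on zero covariance); the mean identity holds regardless.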
Machine Learning (CS405) – Homework #1
Question 4
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, when these events occur with a known constant rate and independently of the time since the last event. If X is Poisson distributed, i.e. $X \sim \mathrm{Poisson}(\lambda)$, its probability mass function takes the following form:
$$ P(X \mid \lambda) = \frac{\lambda^X e^{-\lambda}}{X!} $$
It can be shown that $E(X) = \lambda$. Assume now we have $n$ data points from $\mathrm{Poisson}(\lambda)$: $D = \{X_1, X_2, \ldots, X_n\}$. Show that the sample mean $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the maximum likelihood estimate (MLE) of $\lambda$. If $X$ is exponentially distributed with density function $f(x) = \frac{1}{\lambda} e^{-x/\lambda}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$, show that the sample mean $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the maximum likelihood estimate (MLE) of $\lambda$.
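The Poisson half of the claim can be illustrated numerically: evaluate the log-likelihood on a grid of candidate λ values and check that the maximizer sits at the sample mean. The simulated data below are an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.poisson(lam=3.5, size=500)   # simulated Poisson sample

# Up to an additive constant (the -log(X_i!) terms, which do not depend
# on lambda), the log-likelihood is (sum_i X_i) log(lambda) - n * lambda.
grid = np.linspace(0.5, 8.0, 751)
log_lik = data.sum() * np.log(grid) - data.size * grid
lam_hat = grid[np.argmax(log_lik)]

print(abs(lam_hat - data.mean()) < 0.02)   # True: the MLE is the sample mean
```

The grid maximizer agrees with the sample mean up to the grid spacing, matching the stationary-point condition $\sum_i X_i / \lambda - n = 0$ from the derivation.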
Question 5
(a) Write down the probability of correct classification p(correct) and the probability of misclassification p(mistake) according to the following chart.
(b) For multiple target variables described by the vector $\mathbf{t}$, the expected squared loss function is given by
$$ E[L(\mathbf{t}, \mathbf{y}(\mathbf{x}))] = \iint \|\mathbf{y}(\mathbf{x}) - \mathbf{t}\|^2\, p(\mathbf{x}, \mathbf{t})\,d\mathbf{x}\,d\mathbf{t} $$
Show that the function $\mathbf{y}(\mathbf{x})$ for which this expected loss is minimized is given by $\mathbf{y}(\mathbf{x}) = E_{\mathbf{t}}[\mathbf{t} \mid \mathbf{x}]$.
Hints. For a single target variable $t$, the loss is given by
$$ E[L] = \iint \{y(x) - t\}^2\, p(x, t)\,dx\,dt $$
The result is as follows:
$$ y(x) = \frac{\int t\, p(x, t)\,dt}{p(x)} = \int t\, p(t \mid x)\,dt = E_t[t \mid x] $$
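The hint's conclusion can be previewed numerically: among all constant predictions y, the expected squared loss over draws of t is minimized at the mean of t. The distribution below is an arbitrary stand-in for $p(t \mid x)$ at some fixed x.

```python
import numpy as np

rng = np.random.default_rng(2)
# Samples of t for a fixed x; the Gaussian here is an illustrative choice.
t = rng.normal(loc=2.0, scale=1.0, size=100_000)

# Scan candidate predictions y and measure the empirical squared loss.
ys = np.linspace(0.0, 4.0, 401)
losses = [np.mean((y - t) ** 2) for y in ys]
y_best = ys[np.argmin(losses)]

print(abs(y_best - t.mean()) < 0.02)   # True: minimizer is E[t | x]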
Question 6
(a) We defined the entropy of a discrete random variable X as
$$ H[X] = -\sum_i p(x_i) \ln p(x_i) $$
Now consider the case that X is a continuous random variable with probability density function p(x). The entropy is defined as
$$ H[X] = -\int p(x) \ln p(x)\,dx $$
Assume that X follows a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, i.e.
$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$
Please derive its entropy H[X].
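A numerical integration is a useful check on the derived closed form, which should come out to $\frac{1}{2}\ln(2\pi e \sigma^2)$. In this sketch σ is an arbitrary choice:

```python
import numpy as np

mu, sigma = 0.0, 1.5   # arbitrary Gaussian parameters for the check
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
dx = x[1] - x[0]

p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
H_numeric = -np.sum(p * np.log(p)) * dx          # Riemann sum of -p ln p
H_closed = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

print(abs(H_numeric - H_closed) < 1e-4)          # True
```

Note that the result is independent of μ (entropy is translation-invariant) and grows with σ, which is a good qualitative check on the algebra.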
(b) Write down the mutual information $I[x, y]$. Then show the following equation:
$$ I[x, y] = H[x] - H[x \mid y] = H[y] - H[y \mid x] $$
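For a discrete joint distribution the identity can be verified directly; the 2×2 table below is an arbitrary example:

```python
import numpy as np

# Arbitrary 2x2 joint distribution p(x, y) with strictly positive entries.
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

def H(p):
    """Entropy of a probability table (natural log)."""
    return -np.sum(p * np.log(p))

# Mutual information from its definition ...
I = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))
# ... and conditional entropies via the chain rule H[x, y] = H[y] + H[x | y]
H_x_given_y = H(p_xy) - H(p_y)
H_y_given_x = H(p_xy) - H(p_x)

print(np.isclose(I, H(p_x) - H_x_given_y))   # True
print(np.isclose(I, H(p_y) - H_y_given_x))   # True
```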
You should download the HW1_programQuestion.ipynb file first.
Program
(a) Plot the graph with the given code; the result should be the same as the reference figure.
(b) On the basis of the results, you should try a 0th-order polynomial, a 1st-order polynomial, a 3rd-order polynomial, and some other orders; show results that include both fitting and over-fitting.
(c) Plot the graph of the root-mean-square error.
(d) Plot the graph of the predictive distribution resulting from a Bayesian treatment of polynomial curve fitting using an M = 9 polynomial, with the fixed parameters $\alpha = 5 \times 10^{-3}$ and $\beta = 11.1$ (corresponding to the known noise variance).
(e) Change the sample_size to 2, 3, or 10 times its previous value, and explain the resulting change of M.
Hints. You should install matplotlib.pyplot, and read the classes PolynomialFeature, LinearRegression, and BayesianRegression in the file.
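The fitting/over-fitting behaviour asked for in parts (b) and (c) can be previewed without the notebook's classes. The sketch below uses np.polyfit; the noisy sin(2πx) toy data set is an assumption about what the notebook generates, not something specified here.

```python
import numpy as np

# 10 noisy samples of sin(2*pi*x) -- an assumed stand-in for the notebook's data.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.25, size=x.size)

# Training RMS error for several polynomial orders: it is non-increasing in M,
# and an order-9 polynomial can interpolate all 10 points (over-fitting).
rms_by_order = {}
for M in (0, 1, 3, 9):
    w = np.polyfit(x, t, deg=M)          # least-squares fit of order M
    rms_by_order[M] = np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))
    print(M, round(rms_by_order[M], 3))
```

A near-zero training RMS at M = 9 is not evidence of a good model; evaluating the same quantity on a held-out test set is what exposes the over-fitting in part (c).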