1. (25 points) Linear algebra refresher.
(a) (12 points) Let A be a square matrix, and further let $AA^T = I$.
i. (3 points) Construct a $2 \times 2$ example of A and derive the eigenvalues and eigenvectors of this example. Show all work (i.e., do not use a computer’s eigenvalue decomposition capabilities). You may not use a diagonal matrix as your $2 \times 2$ example. What do you notice about the eigenvalues and eigenvectors?
ii. (3 points) Show generally that A has eigenvalues with norm 1.
iii. (3 points) Show generally that the eigenvectors of A corresponding to distinct eigenvalues are orthogonal.
iv. (3 points) In words, describe what may happen to a vector x under the transformation Ax.
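Note: a minimal numerical sanity check for part (a), assuming NumPy in the course's Jupyter environment (this is an illustrative sketch, not part of the assignment and not a substitute for the hand derivations): build a random orthogonal matrix Q via a QR factorization, then inspect the moduli of its eigenvalues and the effect of Q on a vector's length.

import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # Q is orthogonal: Q @ Q.T is (numerically) the identity

eigvals = np.linalg.eigvals(Q)                    # eigenvalues are complex in general
print(np.abs(eigvals))                            # their moduli should all be approximately 1

x = rng.standard_normal(4)
print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # the two lengths should agree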
(b) (8 points) Let A be a matrix.
i. (4 points) What is the relationship between the singular vectors of A and the eigenvectors of $AA^T$? What about $A^TA$?
ii. (4 points) What is the relationship between the singular values of A and the eigenvalues of $AA^T$? What about $A^TA$?
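Note: a small sketch for exploring part (b) numerically, assuming NumPy (the names and shapes below are arbitrary choices, not part of the assignment): compute the SVD of a random A alongside the eigendecompositions of $AA^T$ and $A^TA$, and compare the spectra and factors by inspection.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular vectors and values of A
w_left, U_left = np.linalg.eigh(A @ A.T)          # eigenpairs of A A^T
w_right, V_right = np.linalg.eigh(A.T @ A)        # eigenpairs of A^T A

print(s)        # singular values of A
print(w_left)   # eigenvalues of A A^T (ascending order)
print(w_right)  # eigenvalues of A^T A (ascending order)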
(c) (5 points) True or False. Partial credit on an incorrect solution may be awarded if you justify your answer.
i. Every linear operator in an n-dimensional vector space has n distinct eigenvalues.
ii. A non-zero sum of two eigenvectors of a matrix A is an eigenvector.
iii. If a matrix A has the positive semidefinite property, i.e., $x^T A x \geq 0$ for all x, then its eigenvalues must be non-negative.
iv. The rank of a matrix can exceed the number of non-zero eigenvalues.
v. A non-zero sum of two eigenvectors of a matrix A corresponding to the same eigenvalue is always an eigenvector.
2. (22 points) Probability refresher.
(a) (9 points) A jar of coins is equally populated with two types of coins. One is type "H50" and comes up heads with probability 0.5. Another is type "H60" and comes up heads with probability 0.6.
i. (3 points) You take one coin from the jar and flip it. It lands tails. How likely is the coin to be type H50?
ii. (3 points) You put the coin back, take another, and flip it 4 times. It lands T, H, H, H. How likely is the coin to be type H50?
iii. (3 points) A new jar is now equally populated with coins of type H50, H55, and H60 (with probabilities of coming up heads 0.5, 0.55, and 0.6 respectively). You take one coin and flip it 10 times. It lands heads 9 times. How likely is the coin to be of each possible type?
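Note: if you want to check your part (a) answers numerically, one possible sketch is the Bayes update below, assuming NumPy; the helper name and call signature are illustrative only.

import numpy as np

def posterior(p_heads, flips):
    # p_heads: probability of heads for each coin type; flips: string of 'H'/'T' outcomes
    prior = np.full(len(p_heads), 1.0 / len(p_heads))  # the jar is equally populated
    likelihood = np.array([
        np.prod([p if f == 'H' else 1.0 - p for f in flips]) for p in p_heads
    ])
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()           # posterior over coin types

print(posterior([0.5, 0.6], "T"))     # part i: a single tail
print(posterior([0.5, 0.6], "THHH"))  # part ii: T, H, H, H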
(b) (3 points) Consider a pregnancy test with the following statistics.
If the woman is pregnant, the test returns "positive" (or 1, indicating the woman is pregnant) 99% of the time.
If the woman is not pregnant, the test returns "positive" 10% of the time. At any given point in time, 99% of the female population is not pregnant.
What is the probability that a woman is pregnant given she received a positive test? The answer should make intuitive sense; give an explanation of the result that you find.
(c) (5 points) Let $x_1, x_2, \ldots, x_n$ be identically distributed random variables. A random vector, x, is defined as
$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
What is E (Ax + b) in terms of E(x), given that A and b are deterministic?
(d) (5 points) Let
$\operatorname{cov}(x) = E\left[(x - Ex)(x - Ex)^T\right]$
What is $\operatorname{cov}(Ax + b)$ in terms of $\operatorname{cov}(x)$, given that A and b are deterministic?
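Note: parts (c) and (d) can be sanity checked with a Monte Carlo sketch like the one below (assuming NumPy; the dimensions and distribution are arbitrary illustrations): draw many samples of x, apply the affine map Ax + b, and compare the empirical mean and covariance with whatever closed-form expressions you derive.

import numpy as np

rng = np.random.default_rng(2)
n, m, num_samples = 3, 4, 200_000

A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
X = rng.standard_normal((num_samples, n)) + 1.5  # each row is a sample of x

Y = X @ A.T + b                                  # each row is a sample of Ax + b
print(Y.mean(axis=0))                            # compare with your expression for E(Ax + b)
print(np.cov(Y, rowvar=False))                   # compare with your expression for cov(Ax + b)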
3. (13 points) Multivariate derivatives.
(a) (2 points) Let $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and $A \in \mathbb{R}^{n \times m}$. What is $\nabla_x \, x^T A y$?
(b) (2 points) What is $\nabla_y \, x^T A y$?
(c) (3 points) What is $\nabla_A \, x^T A y$?
(d) (3 points) Let $f = x^T A x + b^T x$. What is $\nabla_x f$?
(e) (3 points) Let $f = \operatorname{tr}(AB)$. What is $\nabla_A f$?
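Note: a generic finite-difference checker, assuming NumPy, can be used to verify any of the gradients in this problem numerically; f maps a vector or matrix to a scalar, and grad_fn is your candidate analytic gradient. The names below are illustrative, and the demo at the end uses a function that is not from this problem set.

import numpy as np

def check_gradient(f, grad_fn, Z, eps=1e-6):
    # Compare an analytic gradient against central differences at the point Z.
    numeric = np.zeros_like(Z, dtype=float)
    it = np.nditer(Z, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        Zp, Zm = Z.copy(), Z.copy()
        Zp[idx] += eps
        Zm[idx] -= eps
        numeric[idx] = (f(Zp) - f(Zm)) / (2 * eps)
    return np.max(np.abs(numeric - grad_fn(Z)))  # should be tiny if grad_fn is correct

# Demo on f(z) = ||z||^2, whose gradient is 2z (not one of the assignment's functions).
rng = np.random.default_rng(3)
z = rng.standard_normal(5)
print(check_gradient(lambda v: float(v @ v), lambda v: 2 * v, z))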
4. (10 points) Deriving least-squares with matrix derivatives.
In least-squares, we seek to estimate some multivariate output y via the model
$\hat{y} = Wx$
In the training set we’re given paired data examples $(x^{(i)}, y^{(i)})$ for $i = 1, \ldots, n$. Least-squares is the following quadratic optimization problem:
$\min_W \; \frac{1}{2} \sum_{i=1}^{n} \left\| y^{(i)} - W x^{(i)} \right\|_2^2$
Derive the optimal W.
Hint: you may find the following derivatives useful:
◦ $\frac{\partial \operatorname{tr}(WA)}{\partial W} = A^T$
◦ $\frac{\partial \operatorname{tr}(WAW^T)}{\partial W} = WA^T + WA$
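Note: once you have a closed-form W, one way to check it is to generate synthetic paired data, fit W with a generic least-squares solver, and compare. The sketch below assumes NumPy; the shapes and noise level are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(4)
n, d_in, d_out = 500, 6, 3
X = rng.standard_normal((n, d_in))                          # row i is x^(i)
W_true = rng.standard_normal((d_out, d_in))
Y = X @ W_true.T + 0.01 * rng.standard_normal((n, d_out))   # row i is y^(i)

# np.linalg.lstsq solves min_M ||X M - Y||_F^2, so M.T plays the role of W here
M, *_ = np.linalg.lstsq(X, Y, rcond=None)
W_numeric = M.T
print(np.max(np.abs(W_numeric - W_true)))                   # small if the fit recovers W_true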
5. (30 points) Hello World in Jupyter.
Complete the Jupyter notebook linear regression.ipynb. Print out the Jupyter notebook and submit it to Gradescope.