Deep Learning Homework #1 Solution


    1. (25 points) Linear algebra refresher.

        (a) (12 points) Let A be a square matrix, and further let AA^T = I.

            i. (3 points) Construct a 2 × 2 example of A and derive the eigenvalues and eigenvectors of this example. Show all work (i.e., do not use a computer’s eigenvalue decomposition capabilities). You may not use a diagonal matrix as your 2 × 2 example. What do you notice about the eigenvalues and eigenvectors?

            ii. (3 points) Show generally that A has eigenvalues with norm 1.

            iii. (3 points) Show generally that the eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

            iv. (3 points) In words, describe what may happen to a vector x under the transformation Ax.
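
A quick numerical sanity check for part (a), assuming a rotation matrix is chosen as the non-diagonal 2 × 2 example (any orthogonal matrix would do), might look like this:

```python
import numpy as np

# Assumed example: a 90-degree rotation, which satisfies A A^T = I and is not diagonal.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

print(np.allclose(A @ A.T, np.eye(2)))        # True: A is orthogonal

# Every eigenvalue should have norm (modulus) 1; here they are +i and -i.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals, np.abs(eigvals))

# Eigenvectors for the two distinct eigenvalues should be orthogonal
# under the complex (conjugate) inner product.
print(np.vdot(eigvecs[:, 0], eigvecs[:, 1]))  # approximately 0
```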

        (b) (8 points) Let A be a matrix.

            i. (4 points) What is the relationship between the singular vectors of A and the eigenvectors of AA^T? What about A^T A?

            ii. (4 points) What is the relationship between the singular values of A and the eigenvalues of AA^T? What about A^T A?
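
A small numerical sketch for part (b), assuming a random rectangular A, that makes the singular-value/eigenvalue relationship concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is sorted in descending order

# Squared singular values of A should match the (nonzero) eigenvalues
# of both A A^T and A^T A, once sorted the same way.
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1][:3]
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s**2, eig_AAt), np.allclose(s**2, eig_AtA))
```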

        (c) (5 points) True or False. Partial credit on an incorrect solution may be awarded if you justify your answer.

            i. Every linear operator in an n-dimensional vector space has n distinct eigenvalues.

            ii. A non-zero sum of two eigenvectors of a matrix A is an eigenvector.

            iii. If a matrix A has the positive semidefinite property, i.e., x^T Ax ≥ 0 for all x, then its eigenvalues must be non-negative.

            iv. The rank of a matrix can exceed the number of non-zero eigenvalues.

            v. A non-zero sum of two eigenvectors of a matrix A corresponding to the same eigenvalue is always an eigenvector.
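
For the true/false items, a tiny concrete matrix is often enough to probe a claim; for example, a sketch relevant to item iv using a strictly upper-triangular matrix:

```python
import numpy as np

# A strictly upper-triangular matrix: all of its eigenvalues are 0,
# yet it is not the zero matrix.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.linalg.matrix_rank(A))   # rank of A
print(np.linalg.eigvals(A))       # its eigenvalues
```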

    2. (22 points) Probability refresher.


        (a) (9 points) A jar of coins is equally populated with two types of coins. One is type "H50" and comes up heads with probability 0.5. Another is type "H60" and comes up heads with probability 0.6.

            i. (3 points) You take one coin from the jar and flip it. It lands tails. How likely is the coin to be type H50?


            ii. (3 points) You put the coin back, take another, and flip it 4 times. It lands T, H, H, H. How likely is the coin to be type H50?

            iii. (3 points) A new jar is now equally populated with coins of type H50, H55, and H60 (with probabilities of coming up heads 0.5, 0.55, and 0.6 respectively). You take one coin and flip it 10 times. It lands heads 9 times. How likely is the coin to be of each possible type?
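
All three coin questions are the same Bayes-rule update with different observations; a generic sketch (the helper function here is hypothetical, not part of the assignment) is:

```python
import numpy as np

def posterior(p_heads, prior, n_heads, n_tails):
    """Posterior over coin types given i.i.d. flips (Bayes' rule)."""
    p_heads = np.asarray(p_heads, dtype=float)
    likelihood = p_heads**n_heads * (1 - p_heads)**n_tails
    unnorm = likelihood * np.asarray(prior, dtype=float)
    return unnorm / unnorm.sum()

# i.   one flip, tails; jar of H50/H60 coins with equal prior
print(posterior([0.5, 0.6], [0.5, 0.5], n_heads=0, n_tails=1))

# ii.  four flips: T, H, H, H (order does not matter for i.i.d. flips)
print(posterior([0.5, 0.6], [0.5, 0.5], n_heads=3, n_tails=1))

# iii. H50/H55/H60 jar, 9 heads out of 10 flips (binomial coefficient cancels
#      in the normalization, so it can be omitted)
print(posterior([0.5, 0.55, 0.6], [1/3, 1/3, 1/3], n_heads=9, n_tails=1))
```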

        (b) (3 points) Consider a pregnancy test with the following statistics.

            If the woman is pregnant, the test returns "positive" (or 1, indicating the woman is pregnant) 99% of the time.

            If the woman is not pregnant, the test returns "positive" 10% of the time.

            At any given point in time, 99% of the female population is not pregnant.

            What is the probability that a woman is pregnant given she received a positive test? The answer should make intuitive sense; give an explanation of the result that you find.
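
            A worked Bayes-rule template for this part, plugging in the numbers from the problem statement (a sketch only):

            P(pregnant | positive) = P(positive | pregnant) P(pregnant) / [P(positive | pregnant) P(pregnant) + P(positive | not pregnant) P(not pregnant)]
                                   = (0.99 × 0.01) / (0.99 × 0.01 + 0.10 × 0.99) = 0.0099 / 0.1089 ≈ 0.09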

        (c) (5 points) Let x1, x2, ..., xn be identically distributed random variables. A random vector, x, is defined as

            x = [x1, x2, ..., xn]^T

            What is E(Ax + b) in terms of E(x), given that A and b are deterministic?
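
            By linearity of expectation (since A and b are constants), a one-line sketch of the identity this part is after: E(Ax + b) = A E(x) + b.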
        (d) (5 points) Let

            cov(x) = E[(x − Ex)(x − Ex)^T]

            What is cov(Ax + b) in terms of cov(x), given that A and b are deterministic?
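
            Using the definition above and the fact that the additive constant b cancels inside the expectation, a sketch of the corresponding identity: cov(Ax + b) = E[(A(x − Ex))(A(x − Ex))^T] = A cov(x) A^T.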



    3. (13 points) Multivariate derivatives.


        (a) (2 points) Let x ∈ R^n, y ∈ R^m, and A ∈ R^(n×m). What is ∇_x x^T Ay?

        (b) (2 points) What is ∇_y x^T Ay?
        (c) (3 points) What is ∇_A x^T Ay?
        (d) (3 points) Let f = x^T Ax + b^T x. What is ∇_x f?
        (e) (3 points) Let f = tr(AB). What is ∇_A f?
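
Any of these matrix-calculus answers can be sanity-checked numerically with finite differences; a sketch for part (a), testing the candidate answer ∇_x x^T Ay = Ay:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, m))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

f = lambda x: x @ A @ y   # scalar function f(x) = x^T A y

# Central finite differences approximate the gradient with respect to x.
eps = 1e-6
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

print(np.allclose(num_grad, A @ y, atol=1e-5))   # candidate answer: A y
```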

    4. (10 points) Deriving least-squares with matrix derivatives.

In least-squares, we seek to estimate some multivariate output y via the model

ŷ = Wx

In the training set we’re given paired data examples (x^(i), y^(i)) for i = 1, ..., n. Least-squares is the following quadratic optimization problem:

    min_W  (1/2) ∑_{i=1}^n ||y^(i) − Wx^(i)||_2^2
Derive the optimal W.

Hint: you may find the following derivatives useful:

    ∂ tr(WA) / ∂W = A^T

    ∂ tr(WAW^T) / ∂W = WA^T + WA
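
A numerical sketch that checks a candidate closed form, assuming the data matrix X = [x^(1), ..., x^(n)] (columns are examples) has full row rank so that XX^T is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 3, 2, 50
X = rng.standard_normal((d_in, n))          # columns are the x^(i)
W_true = rng.standard_normal((d_out, d_in))
Y = W_true @ X                              # noiseless targets y^(i) = W_true x^(i)

# Candidate closed form from setting the gradient to zero: W = Y X^T (X X^T)^(-1)
W_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
print(np.allclose(W_hat, W_true))           # should recover W_true exactly (no noise)
```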


    5. (30 points) Hello World in Jupyter.

Complete the Jupyter notebook linear regression.ipynb. Print out the Jupyter notebook and submit it to Gradescope.