Probability and Statistics
A.1 [2 points] (Bayes Rule, from Murphy exercise 2.4.) After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing negative given that you don't have the disease). The good news is that this is a rare disease, striking only one in 10,000 people. What are the chances that you actually have the disease? (Show your calculations as well as giving the final result.)
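A hand calculation for this kind of problem can be checked numerically. The following is a minimal sketch, assuming the test's accuracy applies symmetrically as stated (sensitivity = specificity = 0.99); the function name `posterior_positive` and its parameter names are illustrative, not part of the assignment.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prior                   # P(+ | D) * P(D)
    false_pos = (1.0 - specificity) * (1.0 - prior)  # P(+ | not D) * P(not D)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Numbers from the problem: 99% accurate test, 1 in 10,000 prevalence.
    print(posterior_positive(prior=1e-4, sensitivity=0.99, specificity=0.99))
```

The denominator is the total probability of a positive test, which is why a very small prior can dominate even a highly accurate test.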
A.2 For any two random variables $X, Y$ the covariance is defined as $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. You may assume $X$ and $Y$ take on discrete values if you find that is easier to work with.
a. [1 points] If $E[Y \mid X = x] = x$ show that $\mathrm{Cov}(X, Y) = E[(X - E[X])^2]$.
b. [1 points] If $X, Y$ are independent show that $\mathrm{Cov}(X, Y) = 0$.
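Both claims can be sanity-checked by simulation before attempting the proofs. The sketch below is only a Monte Carlo illustration under assumed distributions (standard normal $X$, and $Y = X + \text{noise}$ so that $E[Y \mid X = x] = x$ holds by construction); it is not a substitute for the derivation, and the helper `sample_cov` is a hypothetical name.

```python
import numpy as np


def sample_cov(x, y):
    """Plug-in estimate of E[(X - E[X])(Y - E[Y])]."""
    return np.mean((x - x.mean()) * (y - y.mean()))


rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y_indep = rng.uniform(size=n)     # drawn independently of x
y_cond = x + rng.normal(size=n)   # E[Y | X = x] = x by construction

cov_indep = sample_cov(x, y_indep)  # should be near 0 (part b)
cov_cond = sample_cov(x, y_cond)    # should be near Var(X) (part a)
var_x = sample_cov(x, x)
```

With this many samples, `cov_indep` should sit close to zero and `cov_cond` close to `var_x`, up to sampling noise.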
A.3 Let $X$ and $Y$ be independent random variables with PDFs given by $f$ and $g$, respectively. Let $h$ be the PDF of the random variable $Z = X + Y$.
a. [2 points] Show that $h(z) = \int_{-\infty}^{\infty} f(x)\, g(z - x)\, dx$. (If you are more comfortable with discrete probabilities, you can instead derive an analogous expression for the discrete case, and then you should give a one-sentence explanation as to why your expression is analogous to the continuous case.)
b. [1 points] If $X$ and $Y$ are both independent and uniformly distributed on $[0, 1]$ (i.e. $f(x) = g(x) = 1$ for $x \in [0, 1]$ and $0$ otherwise) what is $h$, the PDF of $Z = X + Y$?
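The convolution integral in part a can be approximated on a grid, which gives a way to check a closed-form answer to part b. This is a rough sketch, assuming a Riemann-sum discretization with step `dx`; the helper name `convolve_pdfs` and the grid size are illustrative choices.

```python
import numpy as np


def convolve_pdfs(f_vals, g_vals, dx):
    """Riemann-sum approximation of h(z) = integral of f(x) g(z - x) dx."""
    return np.convolve(f_vals, g_vals) * dx


N = 1000
dx = 1.0 / N
x = np.arange(N) * dx          # left endpoints covering [0, 1)
f_vals = np.ones_like(x)       # Uniform[0, 1] density sampled on the grid
g_vals = np.ones_like(x)

h_vals = convolve_pdfs(f_vals, g_vals, dx)
z = np.arange(h_vals.size) * dx    # grid for z, covering roughly [0, 2)
total_mass = h_vals.sum() * dx     # a valid density should integrate to about 1
```

Plotting `h_vals` against `z` and comparing it with your derived $h$ is a quick way to catch mistakes in the limits of integration.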
A.4 [1 points] A random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ is Gaussian distributed with mean $\mu$ and variance $\sigma^2$. Given that for any $a, b \in \mathbb{R}$, we have that $Y = aX + b$ is also Gaussian, find $a, b$ such that $Y \sim \mathcal{N}(0, 1)$.
A.5 [2 points] For a random variable $Z$, its mean and variance are defined as $E[Z]$ and $E[(Z - E[Z])^2]$, respectively. Let $X_1, \dots, X_n$ be independent and identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. If we define $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, what is the mean and variance of $\sqrt{n}(\hat{\mu}_n - \mu)$?
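A derived mean and variance for $\sqrt{n}(\hat{\mu}_n - \mu)$ can be compared against a simulation. The sketch below assumes, purely for illustration, that the $X_i$ are Exponential with scale 2 (so $\mu = 2$ and $\sigma^2 = 4$); the function name `scaled_mean_errors` is hypothetical.

```python
import numpy as np


def scaled_mean_errors(n, trials, rng):
    """Draw `trials` realizations of sqrt(n) * (mu_hat_n - mu)."""
    scale = 2.0  # Exponential(scale=2): mean 2, variance 4 (assumed example)
    x = rng.exponential(scale=scale, size=(trials, n))
    mu_hat = x.mean(axis=1)
    return np.sqrt(n) * (mu_hat - scale)


rng = np.random.default_rng(0)
t = scaled_mean_errors(n=50, trials=20_000, rng=rng)
emp_mean, emp_var = t.mean(), t.var()
```

Repeating the run with different values of `n` shows whether the empirical variance depends on `n`, which is a useful hint about the form of the answer.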
A.6 If $f(x)$ is a PDF, the cumulative distribution function (CDF) is defined as $F(x) = \int_{-\infty}^{x} f(y)\, dy$. For any function $g : \mathbb{R} \to \mathbb{R}$ and random variable $X$ with PDF $f(x)$, recall that the expected value of $g(X)$ is defined as $E[g(X)] = \int_{-\infty}^{\infty} g(y) f(y)\, dy$. For a boolean event $A$, define $\mathbf{1}\{A\}$ as 1 if $A$ is true, and 0 otherwise. Thus, $\mathbf{1}\{x \le a\}$ is 1 whenever $x \le a$ and 0 whenever $x > a$. Note that $F(x) = E[\mathbf{1}\{X \le x\}]$. Let $X_1, \dots, X_n$ be independent and identically distributed random variables with CDF $F(x)$. Define $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$. Note, for every $x$, that $\hat{F}_n(x)$ is an empirical estimate of $F(x)$. You may use your answers to the previous problem.
a. [1 points] For any $x$, what is $E[\hat{F}_n(x)]$?
b. [1 points] For any $x$, the variance of $\hat{F}_n(x)$ is $E[(\hat{F}_n(x) - F(x))^2]$. Show that $\mathrm{Variance}(\hat{F}_n(x)) = \frac{F(x)(1 - F(x))}{n}$.
c. [1 points] Using your answer to b, show that for all $x \in \mathbb{R}$, we have $E[(\hat{F}_n(x) - F(x))^2] \le \frac{1}{4n}$.
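The empirical CDF defined in this problem translates directly into a few lines of NumPy. The sketch below is one possible implementation, assuming the samples fit in memory; the name `empirical_cdf` is illustrative.

```python
import numpy as np


def empirical_cdf(samples, x):
    """F_n(x) = (1/n) * sum_i 1{X_i <= x}, evaluated at each entry of x."""
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Broadcasting builds an (n, len(x)) table of indicators 1{X_i <= x_j};
    # averaging over the sample axis gives the fraction of samples <= x_j.
    return (samples[:, None] <= x[None, :]).mean(axis=0)
```

For example, `empirical_cdf([1, 2, 3, 4], [0, 2.5, 4])` evaluates the step function below the smallest sample, between the second and third samples, and at the largest sample.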
B.1 [1 points] Let $X_1, \dots, X_n$ be $n$ independent and identically distributed random variables drawn uniformly at random from $[0, 1]$. If $Y = \max\{X_1, \dots, X_n\}$ then find $E[Y]$.
Linear Algebra and Vector Calculus
A.7 (Rank) Let $A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \end{bmatrix}$. For each matrix $A$ and $B$,
a. [2 points] what is its rank?
b. [2 points] what is a (minimal size) basis for its column span?
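A rank computed by hand (for example by row reduction) can be cross-checked with NumPy. In the sketch below, the entries of `A` and `B` are transcribed from the problem statement as best it can be read, so treat them as an assumption and adjust them to match your copy of the assignment.

```python
import numpy as np

# Matrices as read from the problem statement (assumed transcription).
A = np.array([[1, 2, 1],
              [1, 0, 3],
              [1, 1, 2]])
B = np.array([[1, 2, 3],
              [1, 0, 1],
              [1, 1, 2]])

# matrix_rank counts the singular values above a numerical tolerance.
rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)
```

A basis for the column span then needs exactly `rank_A` (or `rank_B`) linearly independent columns.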
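A.8 (Linear equations) Let $A = \begin{bmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{bmatrix}$, $b = \begin{bmatrix} -2 & -2 & -4 \end{bmatrix}^T$, and $c = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^T$.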
a. [1 points] What is Ac?
b. [2 points] What is the solution to the linear system Ax = b? (Show your work).
A.9 (Hyperplanes) Assume $w$ is an $n$-dimensional vector and $b$ is a scalar. A hyperplane in $\mathbb{R}^n$ is the set $\{x : x \in \mathbb{R}^n, \text{ s.t. } w^T x + b = 0\}$.
a. [1 points] ($n = 2$ example) Draw the hyperplane for $w = [-1, 2]^T$, $b = 2$. Label your axes.
b. [1 points] ($n = 3$ example) Draw the hyperplane for $w = [1, 1, 1]^T$, $b = 0$. Label your axes.
c. [2 points] Given some $x_0 \in \mathbb{R}^n$, find the squared distance to the hyperplane defined by $w^T x + b = 0$. In other words, solve the following optimization problem:
$$\min_x \|x_0 - x\|^2 \quad \text{s.t. } w^T x + b = 0$$
(Hint: if $\tilde{x}_0$ is the minimizer of the above problem, note that $\|x_0 - \tilde{x}_0\| = \left| \frac{w^T (x_0 - \tilde{x}_0)}{\|w\|} \right|$. What is $w^T \tilde{x}_0$?)
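A closed-form answer to part c can be checked against a direct numerical solution of the optimization problem. The sketch below solves the problem through its KKT linear system rather than through the hint, and assumes $w \neq 0$; the function name `project_onto_hyperplane` is hypothetical.

```python
import numpy as np


def project_onto_hyperplane(x0, w, b):
    """Solve min_x ||x0 - x||^2 s.t. w^T x + b = 0 via its KKT system."""
    x0 = np.asarray(x0, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x0.size
    # Stationarity: 2 (x - x0) + lam * w = 0;  feasibility: w^T x = -b.
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * np.eye(n)
    kkt[:n, n] = w
    kkt[n, :n] = w
    rhs = np.concatenate([2.0 * x0, [-b]])
    sol = np.linalg.solve(kkt, rhs)
    x_min = sol[:n]  # the last entry of sol is the Lagrange multiplier
    return x_min, float(np.sum((x0 - x_min) ** 2))
```

Calling it with a few choices of `x0`, `w`, and `b` and comparing the returned squared distance with your formula is a quick consistency check.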
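A.10 For possibly non-symmetric $A, B \in \mathbb{R}^{n \times n}$ and $c \in \mathbb{R}$, let $f(x, y) = x^T A x + y^T B x + c$. Define $\nabla_z f(x, y) = \begin{bmatrix} \frac{\partial f(x, y)}{\partial z_1} & \frac{\partial f(x, y)}{\partial z_2} & \dots & \frac{\partial f(x, y)}{\partial z_n} \end{bmatrix}^T$.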
a. [2 points] Explicitly write out the function $f(x, y)$ in terms of the components $A_{i,j}$ and $B_{i,j}$ using appropriate summations over the indices.
b. [2 points] What is $\nabla_x f(x, y)$ in terms of the summations over indices and vector notation?
c. [2 points] What is $\nabla_y f(x, y)$ in terms of the summations over indices and vector notation?
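Derived gradients can be verified with finite differences. The following is a minimal gradient-check sketch using random test matrices (an assumption made only for illustration); `numerical_grad` is a hypothetical helper, and you would compare its output with your own expressions from parts b and c.

```python
import numpy as np


def f(x, y, A, B, c):
    """f(x, y) = x^T A x + y^T B x + c."""
    return x @ A @ x + y @ B @ x + c


def numerical_grad(fun, z, eps=1e-6):
    """Central-difference approximation of the gradient of fun at z."""
    z = np.asarray(z, dtype=float)
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (fun(z + step) - fun(z - step)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
n = 4
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
x, y, c = rng.normal(size=n), rng.normal(size=n), 0.5

grad_x = numerical_grad(lambda v: f(v, y, A, B, c), x)  # compare with part b
grad_y = numerical_grad(lambda v: f(x, v, A, B, c), y)  # compare with part c
```

Using a non-symmetric random `A` matters here: a formula that is only correct for symmetric matrices will fail the comparison.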
B.2 [1 points] The trace of a matrix is the sum of the diagonal entries; $\mathrm{Tr}(A) = \sum_i A_{ii}$. If $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times n}$, show that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.
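Before proving the identity, it can help to see it hold numerically for non-square matrices. This short sketch uses random matrices with assumed sizes $n = 3$, $m = 5$; it illustrates the claim and does not prove it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5
A = rng.normal(size=(n, m))
B = rng.normal(size=(m, n))

trace_AB = np.trace(A @ B)   # trace of an n x n matrix
trace_BA = np.trace(B @ A)   # trace of an m x m matrix
```

Note that `A @ B` and `B @ A` have different shapes, so the equality of traces is not a consequence of the products being equal.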
B.3 [1 points] Let $v_1, \dots, v_n$ be a set of non-zero vectors in $\mathbb{R}^d$. Let $V = [v_1, \dots, v_n]$ be the vectors concatenated.
a. What is the minimum and maximum rank of $\sum_{i=1}^{n} v_i v_i^T$?
b. What is the minimum and maximum rank of $V$?
c. Let $A \in \mathbb{R}^{D \times d}$ for $D > d$. What is the minimum and maximum rank of $\sum_{i=1}^{n} (A v_i)(A v_i)^T$?
d. What is the minimum and maximum rank of $AV$? What if $V$ is rank $d$?
Programming
A.11 For the $A, b, c$ as defined in Problem A.8, use NumPy to compute (take a screenshot of your answer):
a. [2 points] What is $A^{-1}$?
b. [1 points] What is $A^{-1} b$? What is $Ac$?
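One way to set this up in NumPy is sketched below. The entries of `A`, `b`, and `c` are transcribed from Problem A.8 as best it can be read (in particular the signs in `b` are an assumption), so check them against your copy before relying on the output.

```python
import numpy as np

# A, b, c as read from Problem A.8 (assumed transcription).
A = np.array([[0.0, 2.0, 4.0],
              [2.0, 4.0, 2.0],
              [3.0, 3.0, 1.0]])
b = np.array([-2.0, -2.0, -4.0])
c = np.array([1.0, 1.0, 1.0])

A_inv = np.linalg.inv(A)          # part a: the inverse of A
x = A_inv @ b                     # part b: A^{-1} b
x_solve = np.linalg.solve(A, b)   # same vector, without forming the inverse
Ac = A @ c                        # part b: A c

if __name__ == "__main__":
    print(A_inv, x, Ac, sep="\n")
```

`np.linalg.solve` is generally preferred over an explicit inverse for solving linear systems; it is shown here only as a cross-check on `A_inv @ b`.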
A.12 [4 points] Two random variables $X$ and $Y$ have equal distributions if their CDFs, $F_X$ and $F_Y$, respectively, are equal, i.e. for all $x$, $|F_X(x) - F_Y(x)| = 0$. The central limit theorem says that the sum of $k$ independent, zero-mean, variance-$1/k$ random variables converges to a (standard) Normal distribution as $k$ goes off to infinity. We will study this phenomenon empirically (you will use the Python packages NumPy and Matplotlib). Define $Y^{(k)} = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} B_i$ where each $B_i$ is equal to $-1$ and $1$ with equal probability. From your solution to problem A.5, we know that $\frac{1}{\sqrt{k}} B_i$ is zero-mean and has variance $1/k$.
a. For $i = 1, \dots, n$ let $Z_i \sim \mathcal{N}(0, 1)$. If $F(x)$ is the true CDF from which each $Z_i$ is drawn (i.e., Gaussian) and $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Z_i \le x\}$, use the answer to problem A.6 above to choose $n$ large enough such that, for all $x \in \mathbb{R}$, $\sqrt{E[(\hat{F}_n(x) - F(x))^2]} \le 0.0025$, and plot $\hat{F}_n(x)$ from $-3$ to $3$.
(Hint: use Z=numpy.random.randn(n) to generate the random variables, and import matplotlib.pyplot as plt; plt.step(sorted(Z), np.arange(1,n+1)/float(n)) to plot).
b. For each $k \in \{1, 8, 64, 512\}$ generate $n$ independent copies $Y^{(k)}$ and plot their empirical CDF on the same plot as part a.
(Hint: np.sum(np.sign(np.random.randn(n, k))*np.sqrt(1./k), axis=1) generates n of the Y^(k) random variables.)
Be sure to always label your axes. Your plot should look something like the following (Tip: check out seaborn for instantly better looking plots.)
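The two parts can be organized as a pair of helpers plus one plotting function. This is a sketch rather than a reference solution: `ecdf_points`, `sample_Y`, and `plot_cdfs` are hypothetical names, the axis labels are placeholder choices, and `sample_Y` swaps the hint's sign trick for a Binomial draw (the sum of $k$ independent $\pm 1$ values equals $2\,\mathrm{Binomial}(k, 1/2) - k$), which avoids allocating an $n \times k$ array. The value of `n` is left as an argument so that you can pass the one you derive in part a.

```python
import numpy as np


def ecdf_points(samples):
    """Sorted samples and the matching empirical CDF heights 1/n, ..., 1."""
    xs = np.sort(np.asarray(samples, dtype=float))
    return xs, np.arange(1, xs.size + 1) / xs.size


def sample_Y(n, k, rng):
    """n draws of Y^(k) = (1/sqrt(k)) * sum_{i=1}^k B_i with B_i = +/-1."""
    heads = rng.binomial(k, 0.5, size=n)   # number of +1 values among k
    return (2.0 * heads - k) / np.sqrt(k)  # (+1 count) - (-1 count), rescaled


def plot_cdfs(n, ks=(1, 8, 64, 512), seed=0):
    """Overlay the Gaussian empirical CDF with those of Y^(k)."""
    import matplotlib.pyplot as plt  # imported here so the helpers work without it

    rng = np.random.default_rng(seed)
    xs, heights = ecdf_points(rng.standard_normal(n))
    plt.step(xs, heights, label="Gaussian")
    for k in ks:
        xs, heights = ecdf_points(sample_Y(n, k, rng))
        plt.step(xs, heights, label=f"k = {k}")
    plt.xlim(-3, 3)
    plt.xlabel("x")
    plt.ylabel("Empirical CDF")
    plt.legend()
    plt.show()
```

Calling `plot_cdfs(n)` with your chosen `n` draws all five curves on one set of axes; the curves for larger `k` should move toward the Gaussian one.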