Homework set #3: Solution

1.  As we have encountered, and will continue to encounter, Jensen’s inequality and the geometric-mean arithmetic-mean (GM-AM) inequality in our readings, we will work through the details of both in this homework problem.

 

 

Preliminary: A subset $D$ of a real vector space (e.g., $\mathbb{R}^d$) is convex if every convex linear combination of a pair of points of $D$ is in $D$, i.e., if $x, y \in D$ and $0 < \alpha < 1$ imply that $\alpha x + (1 - \alpha) y \in D$. A function $f : D \to \mathbb{R}$ is similarly said to be convex (concave) if $f(\alpha x + (1 - \alpha) y) \le (\ge)\ \alpha f(x) + (1 - \alpha) f(y)$ for all such $x$, $y$, and $\alpha$. These notions can be extended to linear combinations of any finite number of points, with nonnegative scalings $\alpha_i$ such that $\sum_i \alpha_i = 1$.

 

 

Prove  the following.

Jensen’s inequality: Suppose the function $f : D \to \mathbb{R}$ is a concave function. Assume $x_1, x_2, \ldots, x_n \in D$ and $0 < \alpha_i < 1$ for $i = 1, 2, \ldots, n$ with $\sum_i \alpha_i = 1$. Then

$$\sum_{i=1}^{n} \alpha_i f(x_i) \le f\left( \sum_{i=1}^{n} \alpha_i x_i \right).$$

 

Hints: First note that for the case $n = 1$ there is nothing to prove, and for $n = 2$ the statement follows immediately from the definitions. So consider $n \ge 3$ and an induction argument. That is, assume the statement is true for some $n \ge 2$, and show it holds for $n + 1$.
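One way the induction step in the hint might be organized is sketched below. It is an outline under the stated hypotheses, not a complete solution. The symbols $\lambda$, $\beta_i$, and $y$ are introduced here purely for illustration and do not appear in the problem statement.

```latex
% Assume the inequality holds for n points (n >= 2); take n+1 points and weights.
\begin{align*}
\lambda &:= \sum_{i=1}^{n} \alpha_i = 1 - \alpha_{n+1} \in (0,1), \qquad
\beta_i := \frac{\alpha_i}{\lambda}, \qquad
y := \sum_{i=1}^{n} \beta_i x_i \in D, \\
\sum_{i=1}^{n+1} \alpha_i f(x_i)
  &= \lambda \sum_{i=1}^{n} \beta_i f(x_i) + (1-\lambda) f(x_{n+1}) \\
  &\le \lambda f(y) + (1-\lambda) f(x_{n+1})
     && \text{(induction hypothesis, since } \textstyle\sum_i \beta_i = 1\text{)} \\
  &\le f\bigl(\lambda y + (1-\lambda) x_{n+1}\bigr)
     && \text{(concavity of } f\text{, the case } n = 2\text{)} \\
  &= f\Bigl(\sum_{i=1}^{n+1} \alpha_i x_i\Bigr).
\end{align*}
```

Each $\beta_i$ lies in $(0, 1)$ because $n \ge 2$ and every $\alpha_i$ is strictly positive, so the induction hypothesis applies to the weights $\beta_i$.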

 

 

**When will equality hold?**

 

 

2.  Now, using Jensen’s inequality, show that the GM-AM inequality holds:

Let $\{x_i\}$, $i = 1, 2, \ldots, n$, be a set of $n$ non-negative real numbers. Show that the following inequality holds:

$$\left( \prod_{i=1}^{n} x_i \right)^{1/n} \le \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)$$


Hint:  note that  the function  f (x) = log x is concave on (0, ∞).
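Before writing the proof, it can help to check both inequalities numerically. The sketch below draws random positive points and weights (hypothetical values, not course data) and evaluates Jensen's inequality for $f(x) = \log x$ and the GM-AM inequality. It is a sanity check only and does not replace the proof.

```python
import math
import random

random.seed(0)
n = 6

# Hypothetical test points, strictly positive so that log is defined.
x = [random.uniform(0.1, 10.0) for _ in range(n)]

# Hypothetical weights, normalized so that they are positive and sum to 1.
raw = [random.uniform(0.1, 1.0) for _ in range(n)]
alpha = [r / sum(raw) for r in raw]

# Jensen for the concave function log: sum_i alpha_i log(x_i) <= log(sum_i alpha_i x_i)
jensen_lhs = sum(a * math.log(xi) for a, xi in zip(alpha, x))
jensen_rhs = math.log(sum(a * xi for a, xi in zip(alpha, x)))

# GM-AM is the special case alpha_i = 1/n, after exponentiating both sides.
geometric_mean = math.exp(sum(math.log(xi) for xi in x) / n)
arithmetic_mean = sum(x) / n

print(jensen_lhs <= jensen_rhs, geometric_mean <= arithmetic_mean)
```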

 

 

3.  (Prob. 29 in Ross text) The regression model $Y = \beta x + e$, for $e \sim N(0, \sigma^2)$, is called regression through the origin, as it presupposes that the expected response corresponding to the input level $x = 0$ is 0.

Suppose that $(x_i, Y_i)$, $i = 1, \ldots, n$, is a data set from this model.

(a) Determine the least squares estimator $\hat{\beta}$ of $\beta$.

(b) What is the distribution of $\hat{\beta}$?

(c) Write an expression for the resulting sum-of-square-error criterion.

(d) Construct a hypothesis test framework for: $H_0 : \beta = \beta_0$ versus $H_a : \beta \ne \beta_0$.
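A derived answer to parts (a) through (d) can be checked numerically with a sketch like the one below. It assumes the standard results for this model: $\hat{\beta} = \sum_i x_i Y_i / \sum_i x_i^2$, the error variance estimated by $SSE/(n-1)$, and a $t$ statistic with $n - 1$ degrees of freedom. The function name `fit_through_origin` and the synthetic data are hypothetical and are not taken from the Ross text.

```python
import math
import random


def fit_through_origin(x, y, beta0):
    """Least squares fit of Y = beta * x + e, plus a t statistic for H0: beta = beta0."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Sum of squared errors around the fitted line through the origin.
    sse = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    # One parameter is estimated, so n - 1 degrees of freedom remain.
    sigma2_hat = sse / (n - 1)
    t_stat = (beta_hat - beta0) * math.sqrt(sxx / sigma2_hat)
    return beta_hat, sse, t_stat, n - 1


# Hypothetical synthetic data generated with true beta = 2 and sigma = 1.
random.seed(1)
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

beta_hat, sse, t_stat, dof = fit_through_origin(x, y, beta0=2.0)
print(beta_hat, sse, t_stat, dof)
```

Under $H_0$, `t_stat` would be compared with the quantiles of a $t$ distribution with `dof` degrees of freedom, rejecting for large absolute values.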

 

 

 

4.  (Prob. 46 in Ross text) The following data resulted from a series of Stanford heart transplants. The data relate the survival time (in days) of heart transplant recipients to their age at the time of transplant and to a so-called mismatch score that supposedly indicates the fit of donor and recipient.

 

 

Survival time (days)    Mismatch score    Age
624                     1.32              51.0
46                      0.61              42.5
64                      1.89              54.6
1,350                   0.87              54.1
280                     1.12              49.5
10                      2.76              55.3
1,024                   1.13              43.4
39                      1.38              42.8
730                     0.96              58.4
136                     1.62              52.0
836                     1.58              45.0
60                      0.60              64.5

(a)  Let the  dependent variable  be the  logarithm  of Survival  time.  Fit  a multiple  linear  regression on the independent variables  of Mismatch  score and Age.

(b) Compute  an estimate  of the variance  of the error term.
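A minimal sketch of parts (a) and (b) in Python with NumPy follows. It assumes the natural logarithm is intended and that the model includes an intercept. The variable names are illustrative, and the variance estimate uses the usual $SSE/(n - p)$ with $p = 3$ fitted coefficients.

```python
import numpy as np

# Data from the table above.
survival_days = np.array([624, 46, 64, 1350, 280, 10, 1024, 39, 730, 136, 836, 60], dtype=float)
mismatch = np.array([1.32, 0.61, 1.89, 0.87, 1.12, 2.76, 1.13, 1.38, 0.96, 1.62, 1.58, 0.60])
age = np.array([51.0, 42.5, 54.6, 54.1, 49.5, 55.3, 43.4, 42.8, 58.4, 52.0, 45.0, 64.5])

# Response: natural log of survival time (assumed; the problem says only "logarithm").
y = np.log(survival_days)

# Design matrix with an intercept column (assumed), mismatch score, and age.
X = np.column_stack([np.ones_like(age), mismatch, age])

# Least squares estimate of (intercept, mismatch coefficient, age coefficient).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unbiased estimate of the error variance: SSE / (n - p).
residuals = y - X @ coef
n, p = X.shape
sigma2_hat = residuals @ residuals / (n - p)

print(coef, sigma2_hat)
```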

 

 

 

5.  (Prob. 58 in Ross text) Twelve first-time heart attack patients were given a test that measures "internal anger". The following data relate their scores to whether they had a second heart attack within 5 years.

 

 

Anger Score    Second Heart Attack
80             Yes
77             Yes
70             No
68             Yes
64             No
60             Yes
50             Yes
46             No
40             Yes
35             No
30             No
25             Yes

(a) Explain how the relationship between a second heart attack and one’s anger score can be analyzed via a logistic regression model.

(b) Using a software package of your choice, estimate the parameters for this model (for example, in Matlab, consider the command 'glmfit' to fit a logistic model).

(c) Estimate the probability that a heart  attack patient with an anger score of 55 will have a second heart  attack within  5 years.
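For readers working in Python rather than Matlab, parts (b) and (c) can be sketched without a statistics package by maximizing the logistic log-likelihood with Newton's method. The model assumed here is $P(\text{second attack} \mid x) = 1 / (1 + e^{-(a + b x)})$ with anger score $x$. The names `fit_logistic`, `a`, and `b` are illustrative, and a packaged routine such as 'glmfit' should give matching estimates.

```python
import math

# Data from the table above; 1 = Yes (second heart attack), 0 = No.
scores = [80, 77, 70, 68, 64, 60, 50, 46, 40, 35, 30, 25]
second_attack = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Newton's method for the two-parameter logistic model p = sigmoid(a + b * x)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 information matrix X^T W X, with W = diag(p * (1 - p)).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton update: (a, b) += (X^T W X)^{-1} * gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a, b = fit_logistic(scores, second_attack)
p55 = sigmoid(a + b * 55)  # estimated probability for an anger score of 55
print(a, b, p55)
```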

 

 

 

6.  On the course website you will find a data file called PCAdata.mat (Matlab format) or PCAdata.csv (Python format).

 

(a)  For  this  data  set compute  the  SVD (singular  value decomposition) of the  original matrix,  and using this SVD discuss the expected  results  of performing  a PCA  on this data.

(b)  Compute  the  PCA:  First  compute  the  mean(s)  for the  data,   and  subtract from the  original data;  second compute  the covariance  matrix  including  the scaling 1/(n − 1); third  compute  an eigenvalue decomposition  and sort both  the eigenvalues and eigenvectors  in descending order.

(c) Plot  and  discuss the  principal  components.   Discuss how this  process and  results  might  differ from a direct  SVD of the de-biased,  scaled data.
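The steps in part (b), and the comparison asked for in part (c), can be sketched as follows. Since the contents and layout of PCAdata.csv are not described here, the sketch uses a synthetic array as a stand-in and assumes that rows are observations and columns are variables. With the real file, `data` would be replaced by the loaded matrix.

```python
import numpy as np

# Hypothetical stand-in for the course data: 200 observations (rows), 3 variables (columns).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
data = rng.normal(size=(200, 3)) @ mixing + np.array([5.0, -2.0, 1.0])

n = data.shape[0]

# Step 1: subtract the column means.
centered = data - data.mean(axis=0)

# Step 2: covariance matrix with the 1/(n - 1) scaling.
cov = centered.T @ centered / (n - 1)

# Step 3: eigendecomposition of the symmetric matrix, sorted in descending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Comparison: direct SVD of the centered data scaled by 1/sqrt(n - 1).
U, s, Vt = np.linalg.svd(centered / np.sqrt(n - 1), full_matrices=False)

# Squared singular values match the eigenvalues; rows of Vt match the
# eigenvectors (columns of eigvecs) up to sign.
print(eigvals, s ** 2)
```

The SVD route avoids forming the covariance matrix explicitly and returns the components already sorted. The two routes agree up to the sign of each component.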
