Assignment 4 Solution

Starting from:

~~$30~~

$24

Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF les only).

1. Consider the model

Yi = i + "i (i = 1; ; n)

where f"ig is a sequence of random variables with mean 0 and nite variance representing

noise.
We will assume that 1; ; n are dependent or \smooth" in the sense that the
absolute di erences fj i
i
1jg are small for most values of i. Rather than penalizing a lack
of smoothness by
(
2
1 + i
2)2 (as in Assignment #2), we will consider estimating
f

g

f
g
minimizingi

i

given data

yPii byi

n

n

(yi
i)2 +
j i i 1j
(1)

Xi

X

=1

i=2

n
i 1j represents the total variation of f ig.
where 0 is a tuning parameter and X j i
i=2

The resulting estimates b1; ; bn, are sometimes called fusion estimates and are useful if f ig contain \jumps", that is, i = g(i=n) where g is a smooth function with a small number of discontinuities (i.e. jumps).

The non-di erentiable part of the objective function in (1) can be made separable by de ning i = i i 1 for i = 2; ; n and then minimizing

n
n

(yi i)2 +
j ij
(2)
Xi
X

=1
i=2

where now each i (for i = 2; ; n) will be a function of 1; 2; ; i. The representation of the objective function in (2) can be used to compute the parameter estimates using coordinate descent although there is must faster algorithm. However, (2) is useful for deriving properties of the estimates.

Show that k = 1 + Pki=2 i for k 2.

Show that if b1; ; bn minimize (1) (or (2)) then

n

X

(yi bi) = 0:

i=1

(Hint: Use the representation (2) and compute its partial derivative with respect to 1.)

1

(c) Show that jyi bijfor all i.n
(Hint: Show that
@
( j i i 1j) [ 2 ; 2 ]n

Xi

=2

for any 1; ; n.)

(d) For su ciently large, we will have b1 = = bn = y or equivalently b2 = = bn = 0. How large must be in order to have b1 = = bn = y? (Hint: Look at the sub-gradient of (2) with respect to ( 1; 2; ; n); when is (0; 0; ; 0) an element of this sub-gradient at (y; 0; ; 0)?)

Note: An R function tvsmooth is available on Quercus in a le tvsmooth.txt. You may nd it useful to simulate data from a discontinous function with additive noise and estimate the function using tvsmooth to gain some insight into this method.

2. Suppose that X1; ; Xn are sampled from the following truncated Poisson distribution:

P (Xi = x) =
exp(
) x

for x = r + 1; r + 2;

x! (r)
for some integer r 0 where

(r) =
1

exp(
) x
= 1
r
exp( ) x :

X

X

x=r+1

x!

x=0

x!

Such a sample might arise if we were sampling from a Poisson population but were unable to observe data less than or equal to r.

The EM algorithm can be employed to estimate from the observed X1; ; Xn. The key is to think of the observed data as a subset of some larger (\complete") data set X1; ; Xn; Xn+1; ; Xn+M where M 0 is a random variable and Xn+1; ; Xn+M r; given M = m, this complete data set is now assumed to be m + n independent observations from a Poisson distribution with mean . The log-likelihood for the complete data is

n+m

ln L( ) = ln( )
Xi

xi
(n + m) ;

=1

n+m

which depends on two unknowns
i=X
xi and m. To use the EM algorithm, we need to

n+1

estimate these two unknowns.

(a) The probability distribution of M is
(r))m (r)n for m = 0; 1; 2;
P (M = m) = n + m
1
!(1
m

2

Show that E (M) = n(1
(r))= (r).

(b) Show that

E 0 n+M
Xi

X1
= x1;

; Xn = xn
1 = E (M)E (Xi Xi

r):
i=n+1

j

X

A

@

(Hint: Note that (a) Xn+1; ; Xn+M are independent of X1; ; Xn and (b) Xn+1; ; Xn+M r.)

Consider the data given in the table below. They represent the accident claims submitted to La Royale Belge Insurance Company during a single year. A crude model for the number of claims submitted for a given policy is Poisson. However, the data below does not provide the number of policies for which no claims were submitted. We want to estimate as well as to impute (estimate) the number M of policies with no claims.

Number of claims
1
2
3
4
5
6
7

Number of policies
1317
239
42
14
4
4
1

Assume a truncated Poisson model for these data taking r = 0 and estimate as well as M using the EM algorithm (which in this case has a particularly simple form). Do you think the truncated Poisson model is useful for these data? (For example, do you think your estimate of M is reasonable?)

3