2008 hw3 key

From Btry4790

Problem 1:
a)
mu0
-0.994372 -1.117302

sigma0
          [,1]      [,2]
[1,] 0.3081188 0.2855377
[2,] 0.2855377 0.8134664

mu1
1.0492281 0.9808596

sigma1
          [,1]      [,2]
[1,] 0.7782789 0.1968357
[2,] 0.1968357 0.2499694

pi
0.57

b) (see attached)

phi
0.60

lambda
0.93

c) (see attached)

   i  P(y_i=1|x_i.)  P(y_i=1|x_i.)>0.5
 [1,] 1.000000e+00    1
 [2,] 1.000000e+00    1
 [3,] 1.106875e-11    0
 [4,] 1.000000e+00    1
 [5,] 1.795992e-16    0
 [6,] 1.000000e+00    1
 [7,] 1.000000e+00    1
 [8,] 1.000000e+00    1
 [9,] 1.000000e+00    1
[10,] 1.000000e+00    1
[11,] 4.113176e-11    0
[12,] 2.272105e-09    0
[13,] 4.737765e-15    0
[14,] 1.000000e+00    1
[15,] 1.000000e+00    1
[16,] 6.630721e-12    0
[17,] 1.953301e-14    0
[18,] 1.000000e+00    1
[19,] 1.000000e+00    1
[20,] 1.000000e+00    1
[21,] 2.586914e-16    0
[22,] 1.000000e+00    1
[23,] 1.000000e+00    1
[24,] 2.658593e-11    0
[25,] 1.000000e+00    1
[26,] 5.717125e-10    0
[27,] 8.619959e-15    0
[28,] 3.680921e-15    0
[29,] 1.000000e+00    1
[30,] 1.037743e-12    0
[31,] 1.000000e+00    1
[32,] 1.000000e+00    1
[33,] 1.000000e+00    1
[34,] 1.165631e-11    0
[35,] 1.000000e+00    1
[36,] 2.031293e-12    0
[37,] 1.000000e+00    1
[38,] 9.353792e-14    0
[39,] 1.799203e-13    0
[40,] 1.000000e+00    1
[41,] 1.142122e-14    0
[42,] 1.000000e+00    1
[43,] 1.000000e+00    1
[44,] 1.453942e-14    0
[45,] 1.000000e+00    1
[46,] 8.284799e-10    0
[47,] 1.000000e+00    1
[48,] 4.287281e-14    0
[49,] 1.000000e+00    1
[50,] 9.219260e-05    0
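
As a rough illustration of how posteriors like those in the table can be computed, the sketch below evaluates P(y_i=1|x_i.) in Python. It assumes a model structure that this key does not spell out: precinct i has a latent class y_i with P(y_i=1) = phi, each respondent in the precinct matches the precinct class with probability lambda, and a respondent's data are bivariate normal with parameters (mu0, sigma0) or (mu1, sigma1) according to the respondent's own class. The function names and the example precincts are hypothetical; the parameter values are the estimates from parts a) and b).

```python
import math

# Estimates reported in Problem 1 a) and b).
MU0 = (-0.994372, -1.117302)
SIGMA0 = ((0.3081188, 0.2855377), (0.2855377, 0.8134664))
MU1 = (1.0492281, 0.9808596)
SIGMA1 = ((0.7782789, 0.1968357), (0.1968357, 0.2499694))
PHI, LAM = 0.60, 0.93


def log_gauss2(x, mu, sigma):
    """Log density of a bivariate normal at the point x."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu) using the 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def precinct_posterior(xs, phi=PHI, lam=LAM):
    """P(y_i = 1 | x_i.) under the ASSUMED structure described above."""
    log1 = math.log(phi)        # running log joint for y_i = 1
    log0 = math.log(1 - phi)    # running log joint for y_i = 0
    for x in xs:
        l0 = log_gauss2(x, MU0, SIGMA0)
        l1 = log_gauss2(x, MU1, SIGMA1)
        # Each respondent agrees with the precinct class with probability lam.
        log1 += logaddexp(math.log(lam) + l1, math.log(1 - lam) + l0)
        log0 += logaddexp(math.log(lam) + l0, math.log(1 - lam) + l1)
    # 1 / (1 + exp(log0 - log1)), computed without overflow.
    return math.exp(-logaddexp(0.0, log0 - log1))


# Hypothetical precincts (made-up points, not the homework data).
near_class1 = [(1.0, 1.0)] * 10
near_class0 = [(-1.0, -1.0)] * 10
print(precinct_posterior(near_class1), precinct_posterior(near_class0))
```

Working in log space avoids underflow when a precinct has many respondents, which matters when the posteriors sit this close to 0 or 1.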

Problem 2:
a) (see attached)

mu0
-1.046320 -1.028186 

sigma0
          [,1]      [,2]
[1,] 0.3592381 0.3072740
[2,] 0.3072740 0.7519296

mu1
0.9873772 0.9965616 

sigma1
          [,1]      [,2]
[1,] 0.7194950 0.1436320
[2,] 0.1436320 0.3085462

pi 
0.5865971

The parameter estimates from EM give a better fit than the estimates from Problem 1 (a 
log-likelihood of -2571.97 as opposed to -2608.54).  Unsurprisingly, there are multiple local 
maxima, corresponding to the two possible ways of splitting the data into a mixture of two 
Gaussians.  The first and last initializations converge to the mixture estimated from the original 
data subset; the second initialization converges to the alternate way of defining the mixture.  
It is also apparent that the farther the starting point is from a maximum, the longer the algorithm 
takes to converge, and that the rate of convergence reflects the local structure of the marginal 
likelihood surface (the steeper the surface, the faster the convergence).
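
If the "alternate way of defining the mixture" is a relabeling of the two components (as the discussion in part c) suggests), then the two solutions describe the same distribution and have the same likelihood. The Python sketch below checks this numerically with the estimates from part a). It assumes that pi is the weight of component 1; the data points are made up for illustration and the function names are hypothetical.

```python
import math

# EM estimates reported in Problem 2 a).
MU0 = (-1.046320, -1.028186)
SIGMA0 = ((0.3592381, 0.3072740), (0.3072740, 0.7519296))
MU1 = (0.9873772, 0.9965616)
SIGMA1 = ((0.7194950, 0.1436320), (0.1436320, 0.3085462))
PI = 0.5865971  # ASSUMED to be the weight of component 1


def log_gauss2(x, mu, sigma):
    """Log density of a bivariate normal at the point x."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def mixture_loglik(xs, weight1, comp0, comp1):
    """Log-likelihood of a two-component Gaussian mixture."""
    total = 0.0
    for x in xs:
        a = math.log(1 - weight1) + log_gauss2(x, *comp0)
        b = math.log(weight1) + log_gauss2(x, *comp1)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


# Made-up points, not the homework data.
points = [(-1.2, -0.8), (0.9, 1.1), (0.1, -0.3), (1.5, 0.7)]

original = mixture_loglik(points, PI, (MU0, SIGMA0), (MU1, SIGMA1))
# Swap the component labels and replace pi by 1 - pi.
swapped = mixture_loglik(points, 1 - PI, (MU1, SIGMA1), (MU0, SIGMA0))
print(original, swapped)
```

The two values agree up to floating-point rounding, so EM has no reason to prefer one labeling over the other; which one it reaches depends on the initialization.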

b) (see attached)

c) 
mu0
-0.9937212 -0.9347667

sigma0
          [,1]      [,2]
[1,] 0.4077858 0.3670646
[2,] 0.3670646 0.8593127

mu1
1.044355 1.020972

sigma1
          [,1]      [,2]
[1,] 0.6616135 0.1175831
[2,] 0.1175831 0.2945752

lambda
0.876958

phi
0.5600008

The parameter estimates from EM for model B give a better fit than the estimates from Problem 1 
(a log-likelihood of -2348.566 as opposed to -2407.701).  As in the first model, there are 
multiple ways to define the mixture that yield equivalent distributions.  We see this because 
one of the random initializations converges to the alternate definition of the mixture for the 
lambdas (i.e. exchanging the roles of lambda and 1-lambda).  

d) (see attached)

   i  P(y_i=1|x_i.)  P(y_i=1|x_i.)>0.5
 [1,] 1.000000e+00    1
 [2,] 1.000000e+00    1
 [3,] 5.820044e-10    0
 [4,] 1.000000e+00    1
 [5,] 8.265355e-13    0
 [6,] 1.000000e+00    1
 [7,] 1.000000e+00    1
 [8,] 1.000000e+00    1
 [9,] 1.000000e+00    1
[10,] 1.000000e+00    1
[11,] 1.493134e-09    0
[12,] 7.169165e-09    0
[13,] 3.154954e-12    0
[14,] 1.000000e+00    1
[15,] 9.999999e-01    1
[16,] 1.318969e-10    0
[17,] 5.924521e-12    0
[18,] 1.000000e+00    1
[19,] 1.000000e+00    1
[20,] 1.000000e+00    1
[21,] 3.321282e-13    0
[22,] 1.000000e+00    1
[23,] 9.999999e-01    1
[24,] 1.373224e-09    0
[25,] 1.000000e+00    1
[26,] 5.630016e-09    0
[27,] 5.358217e-13    0
[28,] 7.736868e-12    0
[29,] 1.000000e+00    1
[30,] 7.704926e-11    0
[31,] 1.000000e+00    1
[32,] 1.000000e+00    1
[33,] 1.000000e+00    1
[34,] 8.445471e-10    0
[35,] 1.000000e+00    1
[36,] 8.327434e-11    0
[37,] 1.000000e+00    1
[38,] 2.731032e-11    0
[39,] 3.240978e-11    0
[40,] 1.000000e+00    1
[41,] 5.551307e-12    0
[42,] 1.000000e+00    1
[43,] 1.000000e+00    1
[44,] 5.380022e-12    0
[45,] 1.000000e+00    1
[46,] 4.025614e-09    0
[47,] 1.000000e+00    1
[48,] 3.270664e-11    0
[49,] 1.000000e+00    1
[50,] 3.679760e-05    0

Problem 3:
a) Model one has 11 free parameters, so AIC1 = 22 + 2*(2571.973) = 5165.946.

Model two has 12 free parameters, so AIC2 = 24 + 2*(2348.566) = 4721.132.

The AIC drops by about 445, so model two improves the likelihood by far more than enough to 
warrant the one additional parameter.
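
The arithmetic can be checked with a few lines of Python, using AIC = 2k - 2*logL and the log-likelihoods reported in Problem 2. The breakdown of the parameter counts in the comments is an assumption that reproduces the totals of 11 and 12; the variable names are hypothetical.

```python
def aic(num_params, log_lik):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * num_params - 2 * log_lik


# ASSUMED breakdown: two 2-d mean vectors, two symmetric 2x2 covariance
# matrices (3 free entries each), and the mixing weight pi.
k1 = 2 * 2 + 2 * 3 + 1
# ASSUMED: model two replaces pi with the two parameters phi and lambda.
k2 = k1 - 1 + 2

aic1 = aic(k1, -2571.973)
aic2 = aic(k2, -2348.566)
print(k1, k2, round(aic1, 3), round(aic2, 3), round(aic1 - aic2, 3))
```

The model with the lower AIC is preferred, so the comparison favors model two.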

b) Model two explicitly models the overall behavior of each precinct.  If voter preference 
is closely related to the behavior of the voter's precinct, this model will reduce the bias 
in the estimate of voter preference, even though the parameter estimates will have greater variance 
(since more parameters are being estimated).  Conversely, if voter preference is not related 
to the behavior of the precinct, the additional model complexity will only increase the variance 
of the parameter estimates without improving the model fit (i.e. without reducing the bias in the 
estimated individual voter preferences).

AS comment: I would put this slightly differently.  Model 2 allows
information about party preference to be pooled within precincts, which
will tend to reduce the variance in the estimates of the zij (we will tend
to believe more strongly that each zij is 0 or 1), but possibly at the cost
of a bias if the model is misspecified (we will be more certain but also
more wrong!).  This phenomenon is sometimes called "shrinkage".

It is true that the additional parameter will tend to increase variance in
both parameter estimates and in the zijs, especially if the model does not
fit the data well, but I think this will have a secondary effect on the
zijs compared with shrinkage.

One way to think about shrinkage in this case is to consider respondents
near the boundary between the two classes.  Under model 2, these
individuals will tend to be "pulled" into the class in which most others
in their precinct fall, even if they would fall in the other class under
model 1.  If precinct "coherence" is high, most of the time this will be
the right thing to do, but if coherence is low it may be the wrong thing to
do.
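
The "pull" on a boundary respondent can be illustrated with a small Python sketch. It rests on assumptions that go beyond what is stated here: that under model 2 a respondent matches the precinct class with probability lambda, and that the precinct class can be treated as known (reasonable when the precinct posterior is near 0 or 1). To isolate the effect of the prior weight, the same Gaussian components (the Problem 2 c) estimates) are used for both models, with pi taken from Problem 2 a). The point x and the function names are hypothetical.

```python
import math

# Gaussian components from Problem 2 c); PI from Problem 2 a).
MU0 = (-0.9937212, -0.9347667)
SIGMA0 = ((0.4077858, 0.3670646), (0.3670646, 0.8593127))
MU1 = (1.044355, 1.020972)
SIGMA1 = ((0.6616135, 0.1175831), (0.1175831, 0.2945752))
PI, LAM = 0.5865971, 0.876958


def log_gauss2(x, mu, sigma):
    """Log density of a bivariate normal at the point x."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def prob_class1(x, prior1):
    """P(z = 1 | x) when class 1 has prior weight prior1."""
    a = math.log(prior1) + log_gauss2(x, MU1, SIGMA1)
    b = math.log(1 - prior1) + log_gauss2(x, MU0, SIGMA0)
    return 1.0 / (1.0 + math.exp(b - a))


x = (0.0, 0.0)  # a made-up respondent between the two class means

p_model1 = prob_class1(x, PI)           # model 1: one global weight
p_in_class1 = prob_class1(x, LAM)       # model 2, precinct of class 1
p_in_class0 = prob_class1(x, 1 - LAM)   # model 2, precinct of class 0
print(p_model1, p_in_class1, p_in_class0)
```

Under these assumptions the same respondent is assigned to class 0 by the model 1 style calculation but to class 1 when the surrounding precinct is of class 1, which is the shrinkage effect described above.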

