Question 1. [50 points]
Blood-oxygen-level-dependent (BOLD) responses of a neural population in human visual cortex are provided in the file hw3_data2.mat. This file contains a variable Yn that represents 1000 response samples. Another variable, Xn, represents 100 regressors that may explain the responses. For all parts, the proportion of explained variance ($R^2$) should be calculated as the square of Pearson's correlation coefficient between measured and predicted responses. Answer the questions below.
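As a rough illustration of the explained-variance definition used in this question, the sketch below computes the squared Pearson correlation between measured and predicted responses with NumPy. The function name `r_squared` is hypothetical and not part of the assignment. The sketch assumes the inputs can be flattened to one-dimensional arrays of equal length.

```python
import numpy as np

def r_squared(y_measured, y_predicted):
    """Squared Pearson correlation between measured and predicted responses."""
    # Flatten so that column vectors (e.g. shape (n, 1)) are handled too.
    y_measured = np.asarray(y_measured, dtype=float).ravel()
    y_predicted = np.asarray(y_predicted, dtype=float).ravel()
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal term.
    r = np.corrcoef(y_measured, y_predicted)[0, 1]
    return r ** 2
```

Because the correlation is squared, this measure is insensitive to the scale and the sign of the predictions: a prediction that is perfectly anti-correlated with the data also yields a value of 1.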
a) Use the ridge regression method to fit regularized linear models to predict noisy BOLD responses as a weighted sum of the given regressors. Perform 10-fold cross-validation to tune the ridge parameter ($\lambda \in [0, 10^{12}]$) based on model performance. (Hint: Vary the ridge parameter logarithmically.) Note that for $\lambda = 0$, the model obtained with ridge regression is equivalent to the OLS solution. For each cross-validation fold, do a three-way split of the data: select a validation set of 100 contiguous samples, a testing set of 100 samples (that immediately precede the validation set, assuming circular symmetry), and a training set of 800 samples. Fit a separate model for each $\lambda$ using the training set. Find $R^2$ of each model on the testing set. Separately estimate $R^2$ of each model on the validation set. Plot the average $R^2$ across cross-validation folds, measured on the testing set, as a function of $\lambda$. Find the optimal ridge parameter $\lambda_{opt}$ that maximizes average $R^2$. Find the model performance by calculating the average $R^2$ across cross-validation folds, measured on the validation set for $\lambda_{opt}$. Plot $R^2$ curves obtained on testing and validation data for all $\lambda$ values. Interpret your results.
b) Determine confidence intervals for parameters of the OLS model from part a (i.e., the model obtained for $\lambda = 0$). Generate bootstrap samples from the 1000 samples in the original data (resample both the regressors and the responses the same way). Perform 500 bootstrap iterations, and refit a separate model at each iteration. Plot the mean and 95% confidence intervals of the parameters in the same graph. Identify and label on your plots the model regressors which have weights that are significantly different from 0 (at a significance level of $p < 0.05$).
c) Determine confidence intervals for parameters of the regularized linear model from part a (i.e., the model obtained for $\lambda_{opt}$). Generate bootstrap samples from the 1000 samples in the original data (resample both the regressors and the responses the same way). Perform 500 bootstrap iterations, and refit a separate model at each iteration using $\lambda_{opt}$ found in part a. Plot the mean and 95% confidence intervals of the parameters in the same graph. Identify and label on your plots the model regressors which have weights that are significantly different from 0 (at a significance level of $p < 0.05$). Compare the results to those in part b.
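The procedure described in parts a to c can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution. All function names (`ridge_fit`, `circular_split`, `cross_validate`, `bootstrap_weights`) are hypothetical. The model is assumed to have no intercept term, the ridge weights are computed from the closed-form normal equations, and a weight is flagged as significant when its 95% percentile interval excludes zero, which is only one of several possible significance criteria. The arrays `X` (samples by regressors) and `y` (one-dimensional) are assumed to have been loaded from the data file beforehand.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights: solve (X'X + lam*I) w = X'y (no intercept)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def r2(y, y_hat):
    """Squared Pearson correlation between measured and predicted responses."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

def circular_split(n, fold, block=100):
    """Validation block for this fold, test block just before it (wrapping
    around the end of the data), and all remaining samples for training."""
    val = (fold * block + np.arange(block)) % n
    test = (fold * block - block + np.arange(block)) % n
    train = np.setdiff1d(np.arange(n), np.concatenate([val, test]))
    return train, test, val

def cross_validate(X, y, lambdas, n_folds=10, block=100):
    """Return two (n_folds, n_lambdas) arrays: R^2 on test and validation sets."""
    y = np.asarray(y, dtype=float).ravel()
    r2_test = np.zeros((n_folds, len(lambdas)))
    r2_val = np.zeros_like(r2_test)
    for f in range(n_folds):
        train, test, val = circular_split(len(y), f, block)
        for j, lam in enumerate(lambdas):
            w = ridge_fit(X[train], y[train], lam)
            r2_test[f, j] = r2(y[test], X[test] @ w)
            r2_val[f, j] = r2(y[val], X[val] @ w)
    return r2_test, r2_val

def bootstrap_weights(X, y, lam, n_boot=500, seed=0):
    """Bootstrap mean and 95% percentile interval of the ridge weights."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).ravel()
    n = len(y)
    W = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same rows for regressors and responses
        W[b] = ridge_fit(X[idx], y[idx], lam)
    lo, hi = np.percentile(W, [2.5, 97.5], axis=0)
    # Assumption: "significant" means the 95% interval does not contain zero.
    significant = (lo > 0) | (hi < 0)
    return W.mean(axis=0), lo, hi, significant
```

One possible grid is `np.concatenate([[0.0], np.logspace(0, 12, 25)])`, which includes the OLS case and spaces the remaining values logarithmically; the number of grid points is an arbitrary choice. Following the assignment's naming, the optimal parameter would be taken as `lambdas[np.argmax(r2_test.mean(axis=0))]`, and the reported performance read from the corresponding column of `r2_val`.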
Question 2. [50 points]
A series of neural response measurements are provided in the file hw3_data3.mat. Answer the questions below to examine the relationship between these measurements. Provide plots whenever possible.
a) Responses from two separate populations of neurons are stored in the variables pop1 and pop2. We would like to examine whether the mean responses of the two populations are significantly different. The first population contains 7 neurons, whereas the second population contains 5 neurons. Using the bootstrap technique (10000 iterations), find the two-tailed p-value for the null hypothesis that the two datasets follow the same distribution. (Hint: If the two datasets come from a common distribution, is there any need to separate them?)
b) BOLD responses recorded in two voxels in the human brain are stored in the variables vox1 and vox2. We would like to examine whether the voxel responses are similar to each other by calculating their correlation. Using the bootstrap technique (10000 iterations), find the mean and 95% confidence interval of the correlation. Find the percentile of the bootstrap distribution corresponding to a correlation value of 0. (Hint: Should you resample vox1 and vox2 independently or identically?)
c) Note that estimation of confidence intervals and hypothesis testing are dual problems. For the dataset examined in part b, use bootstrapping (10000 iterations) to simulate the distribution of the null hypothesis that two voxel responses have zero correlation. Find the one-tailed p-value for the two voxel responses having zero or negative correlation. Compare this to the result in part b. (Hint: Resample the datasets to break apart the correlation between them.)
d) The average BOLD responses in a face-selective region of the human brain have been recorded in two separate experiments. The responses of this region to building images (1st experiment) and face images (2nd experiment) are stored in the variables building and face for 20 subjects. Assume that the same subject population was recruited in both experiments. Use bootstrapping (10000 iterations) to calculate the two-tailed p-value for the null hypothesis that there is no difference between the building and face responses.
e) Repeat the exercise in part d, but this time assuming that the subject populations recruited for the two experiments are distinct. Use bootstrapping (10000 iterations) to calculate the two-tailed p-value for the null hypothesis that there is no difference between the building and face responses.
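The resampling schemes that parts a to e hint at can be sketched as follows. This is a sketch under stated assumptions rather than a reference solution, and all function names are hypothetical. It assumes the p-values are taken as empirical tail proportions of the simulated null distribution (a normal approximation based on the bootstrap standard deviation would be another option). For the paired case it assumes the null is imposed by centering the per-subject differences at zero, which is one of several reasonable choices. Inputs are assumed to be one-dimensional arrays.

```python
import numpy as np

def pooled_mean_diff_test(a, b, n_boot=10000, seed=0):
    """Parts a and e: two-tailed p-value, resampling both groups from pooled data."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float).ravel(), np.asarray(b, float).ravel()
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    # Under the null, both groups are draws from one common distribution.
    a_star = rng.choice(pooled, size=(n_boot, a.size), replace=True)
    b_star = rng.choice(pooled, size=(n_boot, b.size), replace=True)
    null = a_star.mean(axis=1) - b_star.mean(axis=1)
    return np.mean(np.abs(null) >= abs(observed))

def bootstrap_correlation(x, y, n_boot=10000, seed=0):
    """Part b: mean, 95% interval, and percentile of zero for the correlation."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float).ravel(), np.asarray(y, float).ravel()
    corrs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, x.size, size=x.size)  # identical indices keep pairs intact
        corrs[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.percentile(corrs, [2.5, 97.5])
    pct_zero = 100.0 * np.mean(corrs <= 0)
    return corrs.mean(), (lo, hi), pct_zero

def null_correlation_pvalue(x, y, n_boot=10000, seed=0):
    """Part c: one-tailed p-value from a null built by independent resampling."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float).ravel(), np.asarray(y, float).ravel()
    observed = np.corrcoef(x, y)[0, 1]
    null = np.empty(n_boot)
    for i in range(n_boot):
        ix = rng.integers(0, x.size, size=x.size)
        iy = rng.integers(0, y.size, size=y.size)  # separate draw breaks the pairing
        null[i] = np.corrcoef(x[ix], y[iy])[0, 1]
    return np.mean(null >= observed)

def paired_mean_diff_test(a, b, n_boot=10000, seed=0):
    """Part d: two-tailed p-value using per-subject differences centered at zero."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a, float).ravel() - np.asarray(b, float).ravel()
    observed = d.mean()
    centered = d - observed  # assumption: impose the null by removing the mean
    idx = rng.integers(0, d.size, size=(n_boot, d.size))
    null = centered[idx].mean(axis=1)
    return np.mean(np.abs(null) >= abs(observed))
```

With an empirical tail proportion, the smallest nonzero p-value that 10000 iterations can resolve is 1/10000, so a returned value of 0 is better read as "below 0.0001". The paired test exploits the within-subject pairing, whereas the pooled test ignores it, which is why parts d and e can give different p-values for the same numbers.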