Use the data in the file problem_set_V.mat. It contains three variables: y_train, x_train, and x_test. The first is a vector of outcomes (annual earnings) of length 6,390; the second and third are matrices of dimension 6,390 × 26 and 12,780 × 26, respectively.
The goal is to predict the outcomes for the 12,780 units in the test sample, using the information in y_train, x_train, and x_test. The main item to hand in is a MATLAB data file with three sets of predictions for the 12,780 test observations. You may search the web for software implementing these methods and are free to use any software.
The first set of predictions should be based on a linear model, possibly with some regularization. Call these predictions y_test_lin.
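One possible way to approach the regularized linear model is ridge regression with the penalty chosen by K-fold cross-validation. The sketch below is only an illustration under stated assumptions: it uses NumPy, the data are a small synthetic stand-in (not the course data), the underscore variable names are assumed because MATLAB names cannot contain spaces, and the helper names ridge_fit, ridge_predict, and cv_mse as well as the penalty grid are hypothetical choices.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on standardized columns; the intercept is not penalized."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    Z = (X - mu) / sd
    y_bar = y.mean()
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y - y_bar))
    return mu, sd, y_bar, beta

def ridge_predict(model, X):
    mu, sd, y_bar, beta = model
    return y_bar + ((X - mu) / sd) @ beta

def cv_mse(X, y, lam, n_folds=5, seed=0):
    """Average out-of-fold mean squared error for a given penalty."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errs = []
    for k in range(n_folds):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = ridge_fit(X[trn], y[trn], lam)
        errs.append(np.mean((y[val] - ridge_predict(model, X[val])) ** 2))
    return float(np.mean(errs))

# Synthetic stand-in for the course data (assumption: 26 regressors, as in the file).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(600, 26))
y_train = x_train @ rng.normal(size=26) + rng.normal(size=600)
x_test = rng.normal(size=(1200, 26))

# Hypothetical penalty grid; the same folds are used for every penalty.
lambdas = np.logspace(-2, 4, 13)
scores = [cv_mse(x_train, y_train, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]

# Refit on the full training sample with the selected penalty.
y_test_lin = ridge_predict(ridge_fit(x_train, y_train, best_lam), x_test)
```

Reporting the cross-validated error along the whole grid, rather than only the selected value, is one way to motivate the choice of penalty.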
The second set of predictions should be based on a tree or forest approach. Call these predictions y_test_tree.
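For the tree or forest approach, one option is a random forest whose tuning parameters are compared by out-of-bag fit. The following is a minimal sketch, assuming scikit-learn is available; the data are synthetic stand-ins, the underscore variable names are assumed, and the grid over min_samples_leaf and max_features is a hypothetical choice rather than a recommendation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the course data, with a nonlinear signal.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(600, 26))
y_train = (np.sin(x_train[:, 0]) + x_train[:, 1] * x_train[:, 2]
           + 0.3 * rng.normal(size=600))
x_test = rng.normal(size=(1200, 26))

def make_forest(min_leaf, max_feat):
    # oob_score=True stores the out-of-bag R^2 in forest.oob_score_ after fitting.
    return RandomForestRegressor(
        n_estimators=100,
        min_samples_leaf=min_leaf,
        max_features=max_feat,  # a float is read as a fraction of the features
        oob_score=True,
        random_state=0,
    )

# Hypothetical grid: leaf size controls tree depth, max_features controls
# how decorrelated the trees are.
results = {}
for min_leaf in (1, 5, 20):
    for max_feat in (1.0, 0.33):
        forest = make_forest(min_leaf, max_feat).fit(x_train, y_train)
        results[(min_leaf, max_feat)] = forest.oob_score_

# Select the setting with the highest out-of-bag R^2 and refit.
best = max(results, key=results.get)
y_test_tree = make_forest(*best).fit(x_train, y_train).predict(x_test)
```

The out-of-bag score uses, for each observation, only the trees that did not see it, so it can stand in for a separate validation sample when comparing settings.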
The third set of predictions should be based on a deep learning / neural net method. Call these predictions y_test_deep. You can choose the number of layers and the number of hidden units.
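For the neural net, the number of layers and hidden units can be compared on a held-out part of the training sample. The sketch below is one possible setup, assuming scikit-learn's MLPRegressor; the data are synthetic stand-ins on a rough earnings scale, the underscore variable names are assumed, and the helper make_net and the three candidate architectures are hypothetical. Inputs and outcomes are standardized because outcomes measured in dollars are far from the scale on which such networks train well.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the course data (outcome on a dollar-like scale).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(600, 26))
y_train = (30000 + 5000 * (np.sin(x_train[:, 0]) + x_train[:, 1] * x_train[:, 2])
           + 1000 * rng.normal(size=600))
x_test = rng.normal(size=(1200, 26))

def make_net(hidden):
    """Standardize inputs and outcome, then fit a feed-forward net."""
    net = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=hidden, alpha=1e-3,
                     early_stopping=True, max_iter=500, random_state=0),
    )
    return TransformedTargetRegressor(regressor=net, transformer=StandardScaler())

# Hold out a quarter of the training sample to compare architectures.
x_fit, x_val, y_fit, y_val = train_test_split(
    x_train, y_train, test_size=0.25, random_state=0)

val_mse = {}
for hidden in [(16,), (32, 16), (64, 32, 16)]:
    model = make_net(hidden).fit(x_fit, y_fit)
    pred = np.ravel(model.predict(x_val))
    val_mse[hidden] = float(np.mean((y_val - pred) ** 2))

# Refit the architecture with the lowest validation error on all training data.
best_arch = min(val_mse, key=val_mse.get)
y_test_deep = np.ravel(make_net(best_arch).fit(x_train, y_train).predict(x_test))
```

The validation errors in val_mse give a direct way to justify the chosen depth and width; the weight penalty alpha could be added to the comparison in the same way.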
In all cases, motivate your choices of tuning parameters.
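If the work is done outside MATLAB, the data file can still be read and the hand-in file written in MATLAB format. The sketch below assumes SciPy's scipy.io module and assumes the file and variable names use underscores (problem_set_V.mat, y_train, and so on). The output name predictions.mat is hypothetical, and a synthetic file is written first so that the example runs without the course data. The training-mean predictions are placeholders only.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "problem_set_V.mat")  # assumed file name

# Write a synthetic stand-in with the dimensions given in the problem.
rng = np.random.default_rng(0)
savemat(data_path, {
    "y_train": rng.normal(size=(6390, 1)),
    "x_train": rng.normal(size=(6390, 26)),
    "x_test": rng.normal(size=(12780, 26)),
})

# loadmat returns a dict of 2-D arrays; flatten the outcome to a vector.
data = loadmat(data_path)
y_train = data["y_train"].ravel()
x_train = data["x_train"]
x_test = data["x_test"]

# Placeholder predictions (training mean); replace with the three model outputs.
n_test = x_test.shape[0]
placeholder = np.full(n_test, y_train.mean())

# Save each prediction vector as an n-by-1 column under the requested names.
out_path = os.path.join(tmp_dir, "predictions.mat")  # hypothetical file name
savemat(out_path, {
    "y_test_lin": placeholder.reshape(-1, 1),
    "y_test_tree": placeholder.reshape(-1, 1),
    "y_test_deep": placeholder.reshape(-1, 1),
})

check = loadmat(out_path)
```

Reloading the saved file, as in the last line, is a quick check that all three vectors are present and have 12,780 rows.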