Problem Set 12 (PyTorch Introduction) Solution

Goals. The goal of this exercise is to (i) introduce you to the PyTorch platform and (ii) discuss the vanishing gradient problem.




EPFL
School of Computer and Communication Sciences
Martin Jaggi & Rüdiger Urbanke
mlo.epfl.ch/page-157255-en-html/
epfmlcourse@gmail.com

Problem 1 (PyTorch Getting Started):










Tutorials. Installation instructions:




pytorch.org




We recommend using the following online tutorial:




pytorch.org/tutorials/beginner/pytorch_with_examples.html







Setup, data, and sample code. Obtain the folder labs/ex12 of the course github repository




github.com/epfml/ML_course







Exercise 1: Torch

Familiarize yourself with the basics of PyTorch through the tutorial.
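If you want a quick sanity check before (or after) the tutorial, the following minimal autograd example uses only standard PyTorch calls (nothing from the course code) and shows the create-tensor / build-graph / backward pattern that the exercises below rely on:

```python
import torch

# A tensor that records operations so gradients can be computed.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Build a tiny computation graph: y = sum(x_i^2).
y = (x ** 2).sum()

# Backpropagate; this fills x.grad with dy/dx = 2 * x.
y.backward()

print(x.grad)  # tensor([2., 4., 6.])
```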




Exercise 2: Basic Linear Regression




Implement prediction and loss computation for linear regression in the MyLinearRegression class. Implement the gradient descent steps in the train function.




HINT: don't forget to clear the gradients computed at previous steps.




Your output should be similar to that of Fig. 1, left.
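The class and function skeletons are given in the sample code; the sketch below is only one plausible way to fill them in (the constructor signature, learning rate, and number of epochs are assumptions, not part of the provided template):

```python
import torch

class MyLinearRegression:
    """Linear model y_hat = x @ w + b, with parameters kept as plain tensors."""

    def __init__(self, input_dim):
        self.w = torch.zeros(input_dim, 1, requires_grad=True)
        self.b = torch.zeros(1, requires_grad=True)

    def forward(self, x):
        # Prediction for a batch of inputs of shape (n, input_dim).
        return x @ self.w + self.b

    def loss(self, y_hat, y):
        # Mean squared error.
        return ((y_hat - y) ** 2).mean()

    def parameters(self):
        return [self.w, self.b]


def train(model, x, y, num_epochs=200, lr=0.1):
    for _ in range(num_epochs):
        out = model.forward(x)
        loss = model.loss(out, y)
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad   # gradient descent step
                p.grad.zero_()     # clear gradients; backward() accumulates into .grad
```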




Exercise 3: NN package




Re-implement linear regression using the routines from the nn package for defining parameters and loss, in the NNLinearRegression class. Does the result that you obtain differ from the previous one? If so, why?
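For reference, a minimal sketch of what the nn-package version could look like (the class name comes from the exercise; the layer size, the choice of nn.MSELoss, and the optimizer settings are assumptions):

```python
import torch
from torch import nn

class NNLinearRegression(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # nn.Linear creates and registers the weight and bias parameters.
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        return self.linear(x)

# Loss and optimizer also come from the library.
model = NNLinearRegression(input_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```

Note that nn.Linear initializes its weights randomly rather than at zero, which is one reason the trained result can differ slightly from the hand-written version.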




Combine two linear layers and a non-linearity (sigmoid or ReLU) to build a Multi-Layer Perceptron (MLP) with one hidden layer, in the MLP class. Find the optimal hyper-parameters for training it.




Your prediction using the MLP should be non-linear, and for a hidden size of 2 might look like Fig. 1, right.
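One way to assemble the MLP (hidden size, non-linearity, and learning rate below are placeholders; the exercise asks you to tune them) is to chain the layers with nn.Sequential:

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_size=2):
        super().__init__()
        # Two linear layers with a non-linearity in between.
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_size),
            nn.ReLU(),                  # or nn.Sigmoid()
            nn.Linear(hidden_size, 1),
        )

    def forward(self, x):
        return self.net(x)

# Example training setup; treat the hyper-parameters only as a starting point.
model = MLP(input_dim=1, hidden_size=2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```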



Figure 1: Predictions made by various trained models. (a) Prediction with linear regression; (b) Prediction with MLP.







Problem 2 (Vanishing Gradient Problem):







Over the past few years it has become more and more popular to use "deep" neural networks, i.e., networks with a large number of layers, sometimes counting in the hundreds. Empirically such networks perform very well, but they pose new challenges, in particular in the training phase. One of the most challenging aspects is what is called the "vanishing gradient" problem. This refers to the fact that the gradient with respect to the parameters of the network tends to zero, typically exponentially fast in the number of layers. This is a simple consequence of the chain rule, which says that for a composite function $f(x) = g_n(g_{n-1}(\cdots g_1(x) \cdots))$ the derivative has the form




\[
f'(x) = g_n'(\,\cdot\,)\, g_{n-1}'(\,\cdot\,) \cdots g_1'(x).
\]




The aim of this exercise is to explore this problem.
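Before the pen-and-paper part below, you can also observe the effect numerically. The toy script below (our own construction, not part of the provided code) stacks sigmoid layers of width 3 and prints the gradient norm at the first layer; with randomly initialized weights the exact numbers vary, but the norm shrinks rapidly as the depth grows:

```python
import torch
from torch import nn

torch.manual_seed(0)

for depth in [2, 5, 10, 20]:
    # A narrow network: `depth` blocks of Linear(3, 3) + Sigmoid, then a linear output.
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(3, 3), nn.Sigmoid()]
    layers.append(nn.Linear(3, 1))
    net = nn.Sequential(*layers)

    x = torch.randn(1, 3)
    net(x).sum().backward()

    # Gradient of the output with respect to the first layer's weights.
    print(depth, net[0].weight.grad.norm().item())
```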




Consider a neural net as introduced in the course with $L$ layers, $K = 3$, and the sigmoid function as activation. Assume that all weights $w_{i,j}^{(l)}$ are bounded, let's say $|w_{i,j}^{(l)}| \le 1$. Consider a regression task where the output layer has a single node representing a simple linear function with some bounded weights $c_i$, let's say $|c_i| \le 1$. Hence the overall function, call it $f$, is a scalar function on $\mathbb{R}^D$. Show that the derivative of this function with respect to the weight $w_{1,1}^{(1)}$ vanishes exponentially fast as a function of $L$, at a rate of at least $(\frac{3}{4})^L$.
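A sketch of the argument, in our own notation and assuming $K$ denotes the number of units per hidden layer and the input is bounded: write $x^{(l)}_j = \sigma(z^{(l)}_j)$ for the activation of unit $j$ in layer $l$ and set $g^{(l)}_j = \partial x^{(l)}_j / \partial w^{(1)}_{1,1}$. Since $|\sigma'| \le \frac14$ and $|w^{(l)}_{i,j}| \le 1$, the chain rule gives

\[
\big|g^{(l)}_j\big|
  = \Big|\sigma'\big(z^{(l)}_j\big)\sum_{i=1}^{K} w^{(l)}_{i,j}\, g^{(l-1)}_i\Big|
  \;\le\; \frac14 \cdot K \cdot \max_i \big|g^{(l-1)}_i\big|
  \;=\; \frac34 \max_i \big|g^{(l-1)}_i\big|,
\]

and therefore

\[
\Big|\frac{\partial f}{\partial w^{(1)}_{1,1}}\Big|
  = \Big|\sum_{i=1}^{K} c_i\, g^{(L)}_i\Big|
  \;\le\; K \Big(\frac34\Big)^{L-1} \big|g^{(1)}_1\big|
  = O\!\Big(\big(\tfrac34\big)^{L}\Big),
\]

since $g^{(1)}_1 = \sigma'(z^{(1)}_1)\,x_1$ does not depend on $L$.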


























































































