This assignment focuses on convolutional neural networks (CNNs). You will implement CNN models for two tasks: document classification and sentiment analysis.
Document Classification (50 points)
Use the same datasets as in Assignment 1. Classify text paragraphs by author into three categories (Fyodor Dostoyevsky, Arthur Conan Doyle, and Jane Austen) by building your own classifiers. The data provided is from Project Gutenberg.
(10 pts) Preprocess the data: build the vocabulary, tokenize, etc. Divide the data into training, validation, and test sets.
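As a starting point, the preprocessing steps might look like the sketch below. It is only an illustration in plain Python: the regular-expression tokenizer, the <pad> and <unk> tokens, the helper names, and the 80/10/10 split are all assumptions, not requirements of the assignment.

```python
import random
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens; id 0 is reserved for padding


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(token_lists, min_count=1):
    """Map every token seen at least min_count times to an integer id."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent tokens get the smallest ids; ties are broken alphabetically.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace tokens by ids, sending out-of-vocabulary tokens to UNK."""
    return [vocab.get(t, vocab[UNK]) for t in tokens]


def split_data(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_test = int(len(examples) * test_frac)
    n_val = int(len(examples) * val_frac)
    test = examples[:n_test]
    val = examples[n_test:n_test + n_val]
    train = examples[n_test + n_val:]
    return train, val, test
```

One reasonable design choice is to split first and build the vocabulary from the training portion only, so that validation and test words unseen in training map to <unk>.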
(10 pts) Initialize the parameters of the model and implement its forward pass. Use an embedding layer as the first layer of your network (e.g. tf.nn.embedding_lookup). Zero-pad the input matrix. Use at least two convolutional layers (each layer includes convolution, activation, and max-pooling).
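To make the pieces of the forward pass concrete, here is a framework-free sketch of zero-padding, embedding lookup, and one convolution + ReLU + max-pooling layer for a single example. It is not the TensorFlow implementation the assignment expects: the function names, the "valid" convolution, and the non-overlapping pooling window are assumptions chosen for illustration.

```python
def pad_batch(sequences, max_len, pad_id=0):
    """Truncate or right-pad every id sequence with pad_id to exactly max_len."""
    return [seq[:max_len] + [pad_id] * (max_len - len(seq[:max_len]))
            for seq in sequences]


def embed(ids, table):
    """Embedding lookup: replace each token id by its row of the table."""
    return [table[i] for i in ids]


def conv_block(x, filters, bias, pool=2):
    """One layer: 'valid' 1-D convolution over time, ReLU, then max-pooling.

    x:       list of T vectors, each of length D
    filters: list of F filters, each a list of K vectors of length D
    bias:    list of F floats
    Returns a list of pooled time steps, each a vector of length F.
    """
    k = len(filters[0])
    dim = len(x[0])
    conv = []
    for t in range(len(x) - k + 1):
        row = []
        for f, b in zip(filters, bias):
            s = b + sum(f[j][d] * x[t + j][d] for j in range(k) for d in range(dim))
            row.append(max(0.0, s))  # ReLU activation
        conv.append(row)
    # Non-overlapping max-pooling over the time axis.
    pooled = []
    for t in range(0, len(conv) - pool + 1, pool):
        window = conv[t:t + pool]
        pooled.append([max(col) for col in zip(*window)])
    return pooled
```

Because the output of conv_block has the same layout as its input (time steps by channels), a second layer can be stacked by calling it again with filters whose vector length equals the number of filters in the first layer.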
(10 pts) Choose and report the number of filters and the filter size for your CNN.
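When choosing filter sizes it helps to check how each layer shrinks the sequence and how many weights it adds. The helper names and the example sizes below (sequence length 100, embedding size 128, 64 filters of width 3, pooling window 2) are hypothetical and serve only to illustrate the arithmetic.

```python
def conv_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution along the time axis."""
    return (length + 2 * padding - kernel) // stride + 1


def pool_out_len(length, pool):
    """Output length of non-overlapping max-pooling with window size pool."""
    return length // pool


def conv_params(kernel, in_channels, n_filters):
    """Number of trainable weights plus biases in one 1-D convolution layer."""
    return kernel * in_channels * n_filters + n_filters


# Trace a padded sequence of 100 tokens through two conv + pool layers.
length = 100
for _ in range(2):
    length = pool_out_len(conv_out_len(length, kernel=3), pool=2)
print(length)  # 100 -> 98 -> 49 -> 47 -> 23
```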
(10 pts) Calculate the loss of the model (cross-entropy loss is suggested). Set up the training step: use a learning rate of 1e-3 and an Adam optimizer.
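For reference, the two ingredients of the training step can be written out by hand as below. This is a sketch of softmax cross-entropy and the standard Adam update rule in plain Python, not a replacement for the framework's optimizer. The learning rate of 1e-3 comes from the assignment; the beta and epsilon defaults are the commonly used values and are an assumption here.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example; label is the index of the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


class Adam:
    """Adam update for a flat list of float parameters."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of the gradients
        self.v = [0.0] * n_params  # running mean of the squared gradients
        self.t = 0                 # number of steps taken so far

    def step(self, params, grads):
        """Return the updated parameters given the current gradients."""
        self.t += 1
        new = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new
```

With uniform logits over three classes the loss equals log(3), which is a useful sanity check for the first training iterations.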
(10 pts) Train your model and report the recall and precision of each class on the test data. Tune the parameters to achieve the best performance you can.
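Per-class precision and recall can be computed directly from the true and predicted labels, as in the sketch below. The function name is hypothetical, and returning 0.0 when a class has no predictions (or no true examples) is a convention chosen here, not something the assignment specifies.

```python
def precision_recall(y_true, y_pred, classes):
    """Return {class: (precision, recall)} computed one class at a time."""
    report = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[c] = (precision, recall)
    return report
```

For example, with true labels [austen, austen, doyle, doyle, dostoyevsky] and predictions [austen, doyle, doyle, doyle, austen], the class austen has precision 0.5 and recall 0.5, while doyle has precision 2/3 and recall 1.0.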
Sentiment Analysis (50 points)
This is a multi-domain sentiment dataset with positive or negative sentiment annotations. We only use the book reviews for this assignment. There are 1000 positive book reviews and 1000 negative book reviews.
(10 pts) Preprocess the data: extract the review text from the <review_text> tags, build the vocabulary, tokenize, etc. Divide the data into training, validation, and test sets.
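One way to pull out the review bodies is a non-greedy regular expression, sketched below. It assumes that each review body is wrapped in a matching <review_text> ... </review_text> pair and that the file is not strictly valid XML, so a regex is used rather than an XML parser. Check both assumptions against the actual data files.

```python
import re

# Non-greedy match so each review body is captured separately;
# DOTALL lets "." span the line breaks inside a review.
REVIEW_TEXT = re.compile(r"<review_text>(.*?)</review_text>", re.DOTALL)


def extract_reviews(raw):
    """Return the stripped body of every <review_text> element in raw."""
    return [body.strip() for body in REVIEW_TEXT.findall(raw)]
```

The positive and negative files can be processed separately, labeling each extracted review by the file it came from.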
(10 pts) Initialize the parameters of the model and implement its forward pass. Use an embedding layer as the first layer of your network (e.g. tf.nn.embedding_lookup). Zero-pad the input matrix. Use at least two convolutional layers (each layer includes convolution, activation, and max-pooling).
(10 pts) Choose and report the number of filters and the filter size for your CNN.
(10 pts) Calculate the loss of the model (binary cross-entropy loss is suggested). Choose an appropriate output function. Set up the training step, including the learning rate and optimizer.
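A sigmoid output paired with binary cross-entropy is one common choice for a two-class problem. The sketch below computes the loss from the raw logit in a numerically stable way. The function names are hypothetical and this is an illustration rather than the required implementation.

```python
import math


def sigmoid(z):
    """Logistic function, written so that exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy_with_logits(z, y):
    """Loss -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z), computed from z.

    Uses the identity max(z, 0) - z*y + log(1 + exp(-|z|)),
    which avoids taking log(0) for large |z|.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
```

At z = 0 the predicted probability is 0.5 and the loss is log(2) for either label, which is the value to expect from an untrained model.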
(10 pts) Train your model and report the accuracy of each class and the total accuracy on the test data. Tune the parameters to achieve the best performance you can.
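The sketch below reads "accuracy of each class" as the fraction of that class's test examples that are predicted correctly, which is an interpretation rather than something the assignment states. The function name and the 0/1 label encoding are likewise assumptions.

```python
def accuracy_report(y_true, y_pred, classes=(0, 1)):
    """Return ({class: accuracy on that class}, overall accuracy)."""
    per_class = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class[c] = correct / len(idx) if idx else 0.0
    total = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return per_class, total
```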
Submission Instructions
You shall submit a zip file named Assignment4_LastName_FirstName.zip containing the items below. (Submissions that do not follow this naming policy will receive penalty points.)
Python files (.py) including all the code, comments, and results. You need to provide detailed comments in English.
A report (.pdf) for each task. Describe your model: the sizes of the training and validation sets, the model parameters, the number of filters and the filter size of your CNN, the loss function, learning rate, optimizer, etc. Plot the training and validation loss. Report recall and precision for Task 1 and accuracy for Task 2 on the test data.
Further Reading:
Yoon Kim. Convolutional Neural Networks for Sentence Classification. EMNLP 2014. arXiv:1408.5882
Ye Zhang, Byron Wallace. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification. arXiv:1510.03820