You may conduct these exercises using just the first 20,000 reviews from the corpus. Use the first half for training and the rest for testing. Process reviews without capitalization or punctuation (and without using stemming or removing stopwords).
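For concreteness, a minimal preprocessing sketch in Python (the exact tokenization regex, and the `reviews` variable holding the loaded corpus, are assumptions rather than part of the assignment):

```python
import re

def tokenize(text):
    # Lowercase and strip punctuation; no stemming, no stopword removal,
    # matching the instructions above. The regex is one reasonable choice.
    return re.findall(r"[a-z0-9']+", text.lower())

# `reviews` is assumed to be the loaded corpus, in order.
reviews = reviews[:20000]                                  # first 20,000 reviews
train_reviews, test_reviews = reviews[:10000], reviews[10000:]
```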
1. Build a sentiment analysis model that estimates star ratings from a 1,000-dimensional bag-of-words representation (based on the most popular words). Compare models based on:
(a) the 1,000 most common unigrams;
(b) the 1,000 most common bigrams;
(c) a model which uses a combination of unigrams and bigrams (i.e., a bigram is included whenever it is more popular than some unigram, but the model dimensionality is still 1,000).
You may use a Ridge regression model (sklearn.linear_model.Ridge) with a regularization coefficient of α = 1. Report the MSE on the test set for each of the three variants, along with the five most positive and most negative tokens for each variant (2 marks). A minimal sketch of variant (a) follows.
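The sketch below assumes `train_texts`/`test_texts` hold the preprocessed review strings and `y_train`/`y_test` the star ratings; all of these are placeholder names:

```python
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def top_tokens(texts, n, use_bigrams=False):
    # Count unigrams or bigrams and return the n most popular.
    counts = Counter()
    for t in texts:
        w = t.split()
        counts.update([' '.join(p) for p in zip(w, w[1:])] if use_bigrams else w)
    return [g for g, _ in counts.most_common(n)]

vocab = top_tokens(train_texts, 1000)      # variant (a); for (b) pass
                                           # use_bigrams=True and set
                                           # ngram_range=(2, 2); for (c) merge
                                           # both Counters before taking the top 1,000
# token_pattern=r"\S+" makes the vectorizer tokenize like str.split() above
vec = CountVectorizer(vocabulary=vocab, token_pattern=r"\S+", ngram_range=(1, 1))
X_train = vec.transform(train_texts)
X_test = vec.transform(test_texts)

model = Ridge(alpha=1.0)                   # regularization coefficient of 1
model.fit(X_train, y_train)
print('test MSE:', mean_squared_error(y_test, model.predict(X_test)))

# Five most negative / positive tokens by learned weight:
ranked = sorted(zip(model.coef_, vocab))
print('most negative:', ranked[:5])
print('most positive:', ranked[-5:])
```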
2. Which review has the highest cosine similarity to the first review in the dataset, in terms of their tf-idf representations (using only the training set, and considering unigrams only)? Provide the ID and text of that review (2 marks).
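One possible approach with sklearn, again with placeholder names (`train_texts`, `train_ids`):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Fit tf-idf on the training texts only, unigrams only;
# token_pattern=r"\S+" tokenizes like str.split().
tfidf = TfidfVectorizer(token_pattern=r"\S+", ngram_range=(1, 1))
X = tfidf.fit_transform(train_texts)

sims = cosine_similarity(X[0], X).ravel()
sims[0] = -1                               # exclude the first review itself
best = sims.argmax()
print(train_ids[best], train_texts[best])  # ID and text of the most similar review
```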
Tasks: Content, Structure, and Sequences
For these tasks, consider the complete dataset, but discard any users who have fewer than three interactions. Build a training set by considering all but the last interaction from each user (in order of timestamp). The test set will then contain each user’s most recent interaction.
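A sketch of this split, assuming each interaction is a dict with 'user' and 'timestamp' fields (the field names are placeholders for whatever the dataset actually provides):

```python
from collections import defaultdict

# Group interactions by user.
by_user = defaultdict(list)
for d in interactions:
    by_user[d['user']].append(d)

train, test = [], []
for u, ds in by_user.items():
    if len(ds) < 3:                        # discard users with < 3 interactions
        continue
    ds.sort(key=lambda d: d['timestamp'])
    train.extend(ds[:-1])                  # all but the most recent interaction
    test.append(ds[-1])                    # the most recent interaction
```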
3. Using the fastFM library [1], compare three model variants to predict ratings: [2]
(a) A regular latent-factor model, i.e., one which includes a user term and an item term (f(u, i));
(b) A non-personalized Markov Chain model, i.e., one which combines a one-hot encoding of the previous item with a one-hot encoding of the current item (f(i, i_prev));
(c) A model which includes the user, the item, and the previous item (f(u, i, i_prev)).
Report the training and test MSE for each variant; you may follow the same hyperparameter settings as in the starter code (3 marks). [3] See the sketch after the footnotes below.
[1] Or adapting the FPMC TensorFlow code from Chapter 7.
[2] You may also perform ranking if you prefer, depending on which code you're adapting; e.g., the TensorFlow code implements a Bayesian Personalized Ranking-style model. If using the TensorFlow code, you can estimate the AUC by measuring how often the 'predict' function returns a positive score (i.e., the positive item has a higher score than the negative item). Please just specify which library you used and how you approached the problem.
[3] Don't be too surprised if the Markov Chain model doesn't work well for this task; not every dataset is 'Markovian'!
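A sketch of variant (c) with fastFM's ALS solver; variants (a) and (b) simply drop the corresponding blocks of columns. The counts `n_users`/`n_items`, the id/rating field names, and the hyperparameter values are assumptions about the preprocessed data, not the starter code's settings:

```python
import numpy as np
from scipy.sparse import lil_matrix
from fastFM import als
from sklearn.metrics import mean_squared_error

def encode(data):
    # Design matrix: [one-hot user | one-hot item | one-hot previous item].
    X = lil_matrix((len(data), n_users + 2 * n_items))
    y = np.zeros(len(data))
    for row, d in enumerate(data):
        X[row, d['user_id']] = 1
        X[row, n_users + d['item_id']] = 1
        X[row, n_users + n_items + d['prev_item_id']] = 1
        y[row] = d['rating']
    return X.tocsc(), y                    # fastFM expects csc-format input

X_train, y_train = encode(train)
X_test, y_test = encode(test)

# Illustrative hyperparameters; substitute the starter code's settings.
fm = als.FMRegression(n_iter=100, rank=5, l2_reg_w=0.1, l2_reg_V=0.1)
fm.fit(X_train, y_train)
print('train MSE:', mean_squared_error(y_train, fm.predict(X_train)))
print('test MSE:', mean_squared_error(y_test, fm.predict(X_test)))
```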
4. Experiment with the factorization machine code to see whether performance of the above models can be improved by incorporating any other features from the data (1 mark); a sketch follows footnote [4] below. If you can't get any FM library working, you may skip this question and simply describe the features you might try.
[4] Though ideally, you should avoid using the review text, which wouldn't be available at test time in a real application!
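One hedged way to incorporate an extra feature is to append a column to the design matrices from the previous sketch; the normalized-timestamp feature here is purely illustrative, not prescribed by the assignment:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Scale the interaction timestamps to [0, 1] and append them as one column.
ts = np.array([d['timestamp'] for d in train], dtype=float)
ts = (ts - ts.min()) / (ts.max() - ts.min() + 1e-9)
X_train_plus = hstack([X_train, csr_matrix(ts.reshape(-1, 1))]).tocsc()
# Repeat for the test split, then refit the FM on X_train_plus and compare MSEs.
```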